blake2: avoid AVX/AVX2/AVX512 when CPU has high vector inst latency

Description

* cipher/blake2.c (blake2b_init_ctx, blake2s_init_ctx): Disable
AVX/AVX2/AVX512 implementation if x86 CPU prefers GPR implementation
over scalar integer vector.
* src/hwf-common.h (hwf_x86_cpu_details)
(_gcry_hwf_x86_cpu_details): New.
* src/hwf-x86.c (x86_cpu_details, x86_hw_features)
(x86_detect_done, _gcry_hwf_x86_cpu_details): New.
(detect_x86_gnuc): Detect Zen5 and add 'cpu_details'.
(_gcry_hwf_detect_x86): Add 'x86_cpu_details' setup.
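A CPU generation such as Zen5 is typically recognised from the family field of CPUID leaf 1. The following is only a sketch of that decoding, not the code in src/hwf-x86.c: the function names are hypothetical, the vendor check is left out, and the family value 0x1A for Zen5 is an assumption that should be checked against AMD's documentation.

```c
#include <stdint.h>

/* Decode the display family from the EAX value of CPUID leaf 1.
   Bits 11:8 hold the base family, bits 27:20 the extended family. */
static unsigned int
sketch_x86_display_family (uint32_t eax)
{
  unsigned int base_family = (eax >> 8) & 0xf;
  unsigned int ext_family = (eax >> 20) & 0xff;

  /* The extended family is only added when the base family is 0xF. */
  return base_family == 0xf ? base_family + ext_family : base_family;
}

/* Hypothetical helper.  Assumption: Zen5 reports display family 0x1A.
   A real check would also confirm that the vendor is AMD. */
static int
sketch_is_zen5_family (uint32_t eax)
{
  return sketch_x86_display_family (eax) == 0x1a;
}
```

The helpers take the raw EAX value as an argument instead of executing CPUID themselves, so the decoding can be exercised on any machine.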

The Blake2s/Blake2b AVX/AVX2/AVX512 implementations are slower than
the generic C implementation if the CPU has an integer vector latency
higher than 1 cycle (for example, AMD Zen5 has an int-vector latency
of 2 cycles) and powerful GPR execution. Therefore, use the generic C
implementation for Blake2 on Zen5.
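The selection described above could look roughly like the sketch below. This is not the code in cipher/blake2.c: every identifier (the SKETCH_* feature bits, both structs and the function) is a hypothetical stand-in, and the sketch only assumes that the context init function sees the detected feature bits plus a per-CPU "prefers GPR" hint.

```c
/* Hypothetical feature bits; the real HWF flags are defined elsewhere
   in libgcrypt and may differ. */
#define SKETCH_HWF_AVX2    (1u << 0)
#define SKETCH_HWF_AVX512  (1u << 1)

/* Hypothetical CPU details filled in by hardware feature detection. */
struct sketch_cpu_details
{
  unsigned int prefer_gpr : 1;  /* Set for CPUs such as Zen5.  */
};

/* Hypothetical subset of a Blake2 context. */
struct sketch_blake2_ctx
{
  unsigned int use_avx2 : 1;
  unsigned int use_avx512 : 1;
};

static void
sketch_blake2_select_impl (struct sketch_blake2_ctx *ctx,
                           unsigned int features,
                           const struct sketch_cpu_details *details)
{
  /* Start from what the CPU advertises. */
  ctx->use_avx2 = !!(features & SKETCH_HWF_AVX2);
  ctx->use_avx512 = !!(features & SKETCH_HWF_AVX512);

  if (details->prefer_gpr)
    {
      /* Vector code would be slower here; fall back to generic C. */
      ctx->use_avx2 = 0;
      ctx->use_avx512 = 0;
    }
}
```

Keeping the hint separate from the feature bits means other ciphers can still use AVX/AVX2/AVX512 on the same CPU; only the Blake2 dispatch is affected.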

Generic C with AMD Zen5:

                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 BLAKE2B_512    |     0.473 ns/B      2016 MiB/s      2.72 c/B      5750
 BLAKE2S_256    |     0.798 ns/B      1195 MiB/s      4.59 c/B      5750

AVX512 with AMD Zen5:

                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 BLAKE2B_512    |     0.923 ns/B      1033 MiB/s      5.31 c/B      5750
 BLAKE2S_256    |      1.42 ns/B     672.4 MiB/s      8.15 c/B      5749
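Taken from the figures above, AVX512 is about 1.95x slower than generic C for BLAKE2B_512 (0.923 / 0.473) and about 1.78x slower for BLAKE2S_256 (1.42 / 0.798). The small sketch below shows how the columns relate; the function names are illustrative and not part of the benchmark tool.

```c
/* Convert nanoseconds/byte to cycles/byte at a clock given in MHz:
   cycles/byte = ns/byte * (MHz / 1000). */
static double
sketch_cycles_per_byte (double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* Ratio of two nanoseconds/byte figures; a value above 1.0 means the
   first implementation is slower than the second. */
static double
sketch_slowdown (double slow_ns_per_byte, double fast_ns_per_byte)
{
  return slow_ns_per_byte / fast_ns_per_byte;
}
```

For example, sketch_cycles_per_byte (0.473, 5750) gives roughly 2.72, matching the cycles/byte column for generic C BLAKE2B_512.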

  • Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Sun, Dec 21, 5:15 PM
Parents
rC8b538a8c7669: camellia-gfni-avx512: add 1-block constant-time implementation