sha3: Add x86-64 AVX512 accelerated implementation

Description

* LICENSES: Add 'cipher/keccak-amd64-avx512.S'.
* configure.ac: Add 'keccak-amd64-avx512.lo'.
* cipher/Makefile.am: Add 'keccak-amd64-avx512.S'.
* cipher/keccak-amd64-avx512.S: New.
* cipher/keccak.c (USE_64BIT_AVX512, ASM_FUNC_ABI): New.
[USE_64BIT_AVX512] (_gcry_keccak_f1600_state_permute64_avx512)
(_gcry_keccak_absorb_blocks_avx512, keccak_f1600_state_permute64_avx512)
(keccak_absorb_lanes64_avx512, keccak_avx512_64_ops): New.
(keccak_init) [USE_64BIT_AVX512]: Enable x86-64 AVX512 implementation
if supported by HW features.
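The last entry describes run-time selection: the AVX512 code path is enabled only when the CPU reports the feature. A minimal sketch of that dispatch pattern follows. It is not the code from cipher/keccak.c; the feature bits, the `keccak_ops_t` table, the stub permute functions and `select_keccak_ops` are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the real library defines its own flags. */
#define HWF_BMI2   (1u << 0)
#define HWF_AVX512 (1u << 1)

/* Hypothetical ops table: one instance per implementation. */
typedef struct
{
  const char *name;
  void (*permute) (uint64_t state[25]);
} keccak_ops_t;

/* Placeholder stubs standing in for the real permutation routines;
   they intentionally do nothing. */
static void permute_generic (uint64_t state[25]) { (void) state; }
static void permute_bmi2 (uint64_t state[25]) { (void) state; }
static void permute_avx512 (uint64_t state[25]) { (void) state; }

static const keccak_ops_t ops_generic = { "generic64", permute_generic };
static const keccak_ops_t ops_bmi2 = { "bmi2", permute_bmi2 };
static const keccak_ops_t ops_avx512 = { "avx512", permute_avx512 };

/* Pick the fastest implementation that the reported features allow.
   Later checks override earlier ones, so AVX512 wins over BMI2. */
const keccak_ops_t *
select_keccak_ops (unsigned int features)
{
  const keccak_ops_t *ops = &ops_generic;

  if (features & HWF_BMI2)
    ops = &ops_bmi2;
  if (features & HWF_AVX512)
    ops = &ops_avx512;

  assert (ops != NULL && ops->permute != NULL);
  return ops;
}
```

Calling `select_keccak_ops (HWF_BMI2 | HWF_AVX512)` in this sketch returns the AVX512 table, and a feature mask of 0 falls back to the generic one.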

Benchmark on Intel Core i3-1115G4 (tigerlake):

Before (BMI2 instructions):

         |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
SHA3-224 |      1.77 ns/B     540.3 MiB/s      7.22 c/B      4088
SHA3-256 |      1.86 ns/B     514.0 MiB/s      7.59 c/B      4089
SHA3-384 |      2.43 ns/B     393.1 MiB/s      9.92 c/B      4089
SHA3-512 |      3.49 ns/B     273.2 MiB/s     14.27 c/B      4088
SHAKE128 |      1.52 ns/B     629.1 MiB/s      6.20 c/B      4089
SHAKE256 |      1.86 ns/B     511.6 MiB/s      7.62 c/B      4089

After (~33% faster):

         |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
SHA3-224 |      1.32 ns/B     721.8 MiB/s      5.40 c/B      4089
SHA3-256 |      1.40 ns/B     681.7 MiB/s      5.72 c/B      4089
SHA3-384 |      1.83 ns/B     522.5 MiB/s      7.46 c/B      4089
SHA3-512 |      2.63 ns/B     362.1 MiB/s     10.77 c/B      4088
SHAKE128 |      1.13 ns/B     840.4 MiB/s      4.64 c/B      4089
SHAKE256 |      1.40 ns/B     682.1 MiB/s      5.72 c/B      4089
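
The "~33% faster" figure can be checked from the cycles/byte columns: the throughput gain is before/after minus one. The helper names below are hypothetical and only illustrate the arithmetic. With the table values, SHA3-224 works out to roughly 33.7% and SHA3-512 to roughly 32.5%. The second helper shows how cycles/byte relates to the other columns (ns/B times MHz, divided by 1000); small differences from the reported values come from rounding in the printed figures.

```c
#include <assert.h>

/* Throughput gain in percent, from cycles/byte before and after. */
double
speedup_percent (double before_cpb, double after_cpb)
{
  assert (before_cpb > 0.0 && after_cpb > 0.0);
  return (before_cpb / after_cpb - 1.0) * 100.0;
}

/* Cycles per byte from nanoseconds per byte and clock speed in MHz:
   1 MHz is 1e6 cycles/s and 1 ns is 1e-9 s, hence the factor 1/1000. */
double
cycles_per_byte (double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}
```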

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Jul 21 2022, 10:14 AM
Parents
rCdca0bd133dd0: sm4-arm-sve-ce: use 32 parallel blocks for XTS and CTR32LE