ghash|polyval: add x86_64 VPCLMUL/AVX512 accelerated implementation

* cipher/cipher-gcm-intel-pclmul.c (GCM_INTEL_USE_VPCLMUL_AVX512)
(GCM_INTEL_AGGR32_TABLE_INITIALIZED): New.
(ghash_setup_aggr16_avx2): Store H16 for aggr32 setup.
[GCM_USE_INTEL_VPCLMUL_AVX512] (GFMUL_AGGR32_ASM_VPCMUL_AVX512)
(gfmul_vpclmul_avx512_aggr32, gfmul_vpclmul_avx512_aggr32_le)
(gfmul_pclmul_avx512, gcm_lsh_avx512, load_h1h4_to_zmm1)
(ghash_setup_aggr8_avx512, ghash_setup_aggr16_avx512)
(ghash_setup_aggr32_avx512, swap128b_perm): New.
(_gcry_ghash_setup_intel_pclmul) [GCM_USE_INTEL_VPCLMUL_AVX512]: Enable
AVX512 implementation based on HW features.
(_gcry_ghash_intel_pclmul, _gcry_polyval_intel_pclmul): Add
VPCLMUL/AVX512 code path; Small tweaks to VPCLMUL/AVX2 code path;
Tweaks on register clearing.

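The aggr8/aggr16/aggr32 setup functions in the ChangeLog appear to follow the common aggregated-GHASH technique: precompute the powers H^1..H^n once, then fold n input blocks per step, so that the n multiplications in a step no longer depend on each other and can be spread over wide (V)PCLMULQDQ lanes. The portable C sketch below illustrates only that algebra, under stated assumptions: `gf128`, `gf_mul`, `ghash_serial`, `ghash_aggr` and the width `AGGR` are hypothetical names, not libgcrypt internals, and the multiply is the slow bitwise Algorithm 1 from NIST SP 800-38D rather than a carry-less-multiply instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: GHASH element in GCM bit order. hi holds bytes 0..7 and
 * lo holds bytes 8..15, both big-endian; the MSB of hi is the x^0 term. */
typedef struct { uint64_t hi, lo; } gf128;

gf128 gf128_xor(gf128 a, gf128 b)
{
  gf128 r = { a.hi ^ b.hi, a.lo ^ b.lo };
  return r;
}

/* Bitwise GF(2^128) multiply, NIST SP 800-38D Algorithm 1 (slow, portable). */
gf128 gf_mul(gf128 x, gf128 y)
{
  gf128 z = { 0, 0 }, v = y;
  for (int i = 0; i < 128; i++)
    {
      uint64_t bit = i < 64 ? (x.hi >> (63 - i)) & 1 : (x.lo >> (127 - i)) & 1;
      if (bit)
        z = gf128_xor(z, v);
      uint64_t lsb = v.lo & 1;           /* V = V >> 1, reduced by R if needed */
      v.lo = (v.lo >> 1) | (v.hi << 63);
      v.hi >>= 1;
      if (lsb)
        v.hi ^= 0xe100000000000000ULL;   /* R = 11100001 || 0^120 */
    }
  return z;
}

/* Serial GHASH: Y_i = (Y_{i-1} ^ X_i) * H, one dependent multiply per block. */
gf128 ghash_serial(gf128 h, const gf128 *x, size_t n)
{
  gf128 y = { 0, 0 };
  for (size_t i = 0; i < n; i++)
    y = gf_mul(gf128_xor(y, x[i]), h);
  return y;
}

#define AGGR 4 /* hypothetical width; the patch's AVX512 path goes up to 32 */

/* Aggregated GHASH: AGGR blocks per step using precomputed H^1..H^AGGR, so the
 * AGGR multiplies inside one step are independent of each other. */
gf128 ghash_aggr(gf128 h, const gf128 *x, size_t n)
{
  gf128 hpow[AGGR];                      /* hpow[k] = H^(k+1) */
  size_t i = 0;
  assert(x != NULL || n == 0);
  hpow[0] = h;
  for (int k = 1; k < AGGR; k++)
    hpow[k] = gf_mul(hpow[k - 1], h);

  gf128 y = { 0, 0 };
  for (; i + AGGR <= n; i += AGGR)
    {
      /* Y' = (Y ^ X_1)*H^AGGR ^ X_2*H^(AGGR-1) ^ ... ^ X_AGGR*H */
      gf128 acc = gf_mul(gf128_xor(y, x[i]), hpow[AGGR - 1]);
      for (int k = 1; k < AGGR; k++)
        acc = gf128_xor(acc, gf_mul(x[i + k], hpow[AGGR - 1 - k]));
      y = acc;
    }
  for (; i < n; i++)                     /* leftover blocks, one at a time */
    y = gf_mul(gf128_xor(y, x[i]), h);
  return y;
}
```

For any input, `ghash_serial` and `ghash_aggr` produce the same value, because expanding the serial recurrence over AGGR blocks gives exactly the sum of X_j * H^(AGGR-j+1) terms. Accelerated implementations typically go one step further and XOR the unreduced carry-less products of a step together, reducing only once per step; this sketch reduces after every multiply for clarity.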
Patch adds VPCLMUL/AVX512 accelerated implementation for GHASH (GCM) and
POLYVAL (GCM-SIV).
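POLYVAL (RFC 8452) works in the bit-reflected field x^128 + x^127 + x^126 + x^121 + 1 with little-endian byte order, and RFC 8452 states that it can be computed with GHASH: POLYVAL(H, X_1, ..., X_n) = ByteReverse(GHASH(mulX_GHASH(ByteReverse(H)), ByteReverse(X_1), ..., ByteReverse(X_n))). That identity is presumably what lets one PCLMUL code base serve both GCM and GCM-SIV; reading the `_le` suffix in the ChangeLog as a byte-order variant is an assumption. The C sketch below computes POLYVAL both directly from its definition and through a GHASH multiply so the two can be compared; all helper names (`u128`, `load_be`, `polyval_dot`, `polyval_via_ghash`, ...) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } u128; /* hypothetical 128-bit helper */
/* GHASH layout: big-endian words; the MSB of byte 0 is the x^0 coefficient. */
u128 load_be(const uint8_t *b)
{
  u128 v = { 0, 0 };
  for (int i = 0; i < 8; i++)
    { v.hi = (v.hi << 8) | b[i]; v.lo = (v.lo << 8) | b[8 + i]; }
  return v;
}
void store_be(uint8_t *b, u128 v)
{
  for (int i = 0; i < 8; i++)
    { b[i] = (uint8_t)(v.hi >> (56 - 8 * i)); b[8 + i] = (uint8_t)(v.lo >> (56 - 8 * i)); }
}
/* POLYVAL layout: little-endian; bit 0 of byte 0 is the x^0 coefficient. */
u128 load_le(const uint8_t *b)
{
  u128 v = { 0, 0 };
  for (int i = 7; i >= 0; i--)
    { v.lo = (v.lo << 8) | b[i]; v.hi = (v.hi << 8) | b[8 + i]; }
  return v;
}
void byte_reverse(uint8_t out[16], const uint8_t in[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = in[15 - i];
}
/* mulX_GHASH: multiply by x in GHASH's field (a right shift in its bit order). */
u128 ghash_mulx(u128 v)
{
  u128 r = { (v.hi >> 1) ^ ((v.lo & 1) ? 0xe100000000000000ULL : 0), (v.lo >> 1) | (v.hi << 63) };
  return r;
}
/* GHASH multiply, NIST SP 800-38D Algorithm 1 (V = V * x after each bit). */
u128 ghash_mul(u128 x, u128 v)
{
  u128 z = { 0, 0 };
  for (int i = 0; i < 128; i++, v = ghash_mulx(v))
    if (i < 64 ? (x.hi >> (63 - i)) & 1 : (x.lo >> (127 - i)) & 1)
      { z.hi ^= v.hi; z.lo ^= v.lo; }
  return z;
}

/* POLYVAL dot(a, b) = a * b * x^-128 mod P, P = x^128+x^127+x^126+x^121+1. */
u128 polyval_dot(u128 a, u128 b)
{
  const uint64_t P_HI = 0xc200000000000000ULL; /* x^127 + x^126 + x^121 */
  u128 z = { 0, 0 };
  for (int i = 0; i < 128; i++)
    {
      if (i < 64 ? (a.lo >> i) & 1 : (a.hi >> (i - 64)) & 1)
        { z.hi ^= b.hi; z.lo ^= b.lo; }
      uint64_t top = b.hi >> 63;                      /* b = b * x mod P */
      b.hi = ((b.hi << 1) | (b.lo >> 63)) ^ (top ? P_HI : 0);
      b.lo = (b.lo << 1) ^ top;
    }
  for (int i = 0; i < 128; i++)                       /* z = z * x^-1, 128 times */
    {
      uint64_t low = z.lo & 1;                        /* if odd, add P first */
      if (low) { z.hi ^= P_HI; z.lo ^= 1; }
      z.lo = (z.lo >> 1) | (z.hi << 63);
      z.hi = (z.hi >> 1) | (low << 63);               /* x^128 / x = x^127 */
    }
  return z;
}
/* Direct RFC 8452 POLYVAL: S_0 = 0, S_j = dot(S_{j-1} + X_j, H). */
u128 polyval(const uint8_t h[16], const uint8_t *x, size_t n)
{
  u128 hv = load_le(h), s = { 0, 0 };
  for (size_t i = 0; i < n; i++)
    {
      u128 xi = load_le(x + 16 * i);
      s = polyval_dot((u128){ s.hi ^ xi.hi, s.lo ^ xi.lo }, hv);
    }
  return s;
}
/* ByteReverse(GHASH(mulX_GHASH(ByteReverse(H)), ByteReverse(X_1), ...)). */
void polyval_via_ghash(uint8_t out[16], const uint8_t h[16], const uint8_t *x, size_t n)
{
  uint8_t tmp[16];
  assert(x != NULL || n == 0);
  byte_reverse(tmp, h);
  u128 hg = ghash_mulx(load_be(tmp)), y = { 0, 0 };
  for (size_t i = 0; i < n; i++)
    {
      byte_reverse(tmp, x + 16 * i);
      u128 xi = load_be(tmp);
      y = ghash_mul((u128){ y.hi ^ xi.hi, y.lo ^ xi.lo }, hg);
    }
  store_be(tmp, y);
  byte_reverse(out, tmp);
}
```

Only H needs the extra mulX_GHASH step; the data blocks are merely byte-reversed, which is cheap compared to the field multiplications themselves.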
Benchmark on Intel Core i3-1115G4:
Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
       GCM auth |     0.063 ns/B     15200 MiB/s     0.257 c/B      4090
   GCM-SIV auth |     0.061 ns/B     15704 MiB/s     0.248 c/B      4090

After (ghash ~41% faster, polyval ~34% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
       GCM auth |     0.044 ns/B     21614 MiB/s     0.181 c/B    4096±3
   GCM-SIV auth |     0.045 ns/B     21108 MiB/s     0.185 c/B    4097±3

AES128-GCM / AES128-GCM-SIV encryption:
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        GCM enc |     0.084 ns/B     11306 MiB/s     0.346 c/B    4097±3
    GCM-SIV enc |     0.086 ns/B     11026 MiB/s     0.354 c/B    4096±3

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>