ghash|polyval: add x86_64 VPCLMUL/AVX2 accelerated implementation
* cipher/cipher-gcm-intel-pclmul.c (GCM_INTEL_USE_VPCLMUL_AVX2) (GCM_INTEL_AGGR8_TABLE_INITIALIZED) (GCM_INTEL_AGGR16_TABLE_INITIALIZED): New. (gfmul_pclmul): Fixes to comments. [GCM_USE_INTEL_VPCLMUL_AVX2] (GFMUL_AGGR16_ASM_VPCMUL_AVX2) (gfmul_vpclmul_avx2_aggr16, gfmul_vpclmul_avx2_aggr16_le) (gfmul_pclmul_avx2, gcm_lsh_avx2, load_h1h2_to_ymm1) (ghash_setup_aggr8_avx2, ghash_setup_aggr16_avx2): New. (_gcry_ghash_setup_intel_pclmul): Add 'hw_features' parameter; Setup ghash and polyval function pointers for context; Add VPCLMUL/AVX2 code path; Defer aggr8 and aggr16 table initialization to until first use in '_gcry_ghash_intel_pclmul' or '_gcry_polyval_intel_pclmul'. [__x86_64__] (ghash_setup_aggr8): New. (_gcry_ghash_intel_pclmul): Add VPCLMUL/AVX2 code path; Add call for aggr8 table initialization. (_gcry_polyval_intel_pclmul): Add VPCLMUL/AVX2 code path; Add call for aggr8 table initialization. * cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL] (_gcry_ghash_intel_pclmul) (_gcry_polyval_intel_pclmul): Remove. [GCM_USE_INTEL_PCLMUL] (_gcry_ghash_setup_intel_pclmul): Add 'hw_features' parameter. (setupM) [GCM_USE_INTEL_PCLMUL]: Pass HW features to '_gcry_ghash_setup_intel_pclmul'; Let '_gcry_ghash_setup_intel_pclmul' setup function pointers. * cipher/cipher-internal.h (GCM_USE_INTEL_VPCLMUL_AVX2): New. (gcry_cipher_handle): Add member 'gcm.hw_impl_flags'.
Patch adds VPCLMUL/AVX2 accelerated implementation for GHASH (GCM) and
POLYVAL (GCM-SIV).
Benchmark on AMD Ryzen 5800X (zen3):
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz GCM auth | 0.088 ns/B 10825 MiB/s 0.427 c/B 4850 GCM-SIV auth | 0.083 ns/B 11472 MiB/s 0.403 c/B 4850
After: (~1.93x faster)
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz GCM auth | 0.045 ns/B 21098 MiB/s 0.219 c/B 4850 GCM-SIV auth | 0.043 ns/B 22181 MiB/s 0.209 c/B 4850
AES128-GCM / AES128-GCM-SIV encryption:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz GCM enc | 0.079 ns/B 12073 MiB/s 0.383 c/B 4850 GCM-SIV enc | 0.076 ns/B 12500 MiB/s 0.370 c/B 4850
Benchmark on Intel Core i3-1115G4 (tigerlake):
Before:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz GCM auth | 0.080 ns/B 11919 MiB/s 0.327 c/B 4090 GCM-SIV auth | 0.075 ns/B 12643 MiB/s 0.309 c/B 4090
After: (~1.28x faster)
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz GCM auth | 0.062 ns/B 15348 MiB/s 0.254 c/B 4090 GCM-SIV auth | 0.058 ns/B 16381 MiB/s 0.238 c/B 4090
AES128-GCM / AES128-GCM-SIV encryption:
| nanosecs/byte mebibytes/sec cycles/byte auto Mhz GCM enc | 0.101 ns/B 9441 MiB/s 0.413 c/B 4090 GCM-SIV enc | 0.098 ns/B 9692 MiB/s 0.402 c/B 4089
- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>