crc-intel-pclmul: add AVX2 and AVX512 code paths

* cipher/crc-intel-pclmul.c (crc32_consts_s, crc32_consts)
(crc24rfc2440_consts): Add k_ymm and k_zmm.
(crc32_reflected_bulk, crc32_bulk): Add VPCLMUL+AVX2 and
VAES_VPCLMUL+AVX512 code paths; Add 'hwfeatures' parameter.
(_gcry_crc32_intel_pclmul, _gcry_crc24rfc2440_intel_pclmul): Add
'hwfeatures' parameter.
* cipher/crc.c (CRC_CONTEXT) [USE_INTEL_PCLMUL]: Add 'hwfeatures'.
(_gcry_crc32_intel_pclmul, _gcry_crc24rfc2440_intel_pclmul): Add
'hwfeatures' parameter.
(crc32_init, crc32rfc1510_init, crc24rfc2440_init)
[USE_INTEL_PCLMUL]: Store HW features to context.
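
The new 'hwfeatures' parameter lets the bulk functions pick a code path
at run time. A minimal sketch of that kind of selection follows. The
SKETCH_* names, the feature bit values and the fallback order are
assumptions made for illustration; they are not the identifiers or the
exact conditions used in cipher/crc-intel-pclmul.c.

```c
/* Hypothetical feature bits for illustration only; these are not
   libgcrypt's real HWF_* constants. */
#define SKETCH_HWF_AVX2      (1u << 0)
#define SKETCH_HWF_AVX512    (1u << 1)
#define SKETCH_HWF_VPCLMUL   (1u << 2)

/* Hypothetical labels for the three vector widths. */
enum sketch_crc_path
{
  SKETCH_PATH_XMM,  /* baseline 128-bit PCLMUL path */
  SKETCH_PATH_YMM,  /* 256-bit path (VPCLMUL + AVX2) */
  SKETCH_PATH_ZMM   /* 512-bit path (VPCLMUL + AVX512) */
};

/* Return nonzero when every bit in 'mask' is set in 'hwfeatures'. */
static int
sketch_has_all (unsigned int hwfeatures, unsigned int mask)
{
  return (hwfeatures & mask) == mask;
}

/* Pick the widest path that the feature bits allow, falling back to
   the 128-bit path when a wider one is not fully supported. */
static enum sketch_crc_path
sketch_select_path (unsigned int hwfeatures)
{
  if (sketch_has_all (hwfeatures, SKETCH_HWF_VPCLMUL | SKETCH_HWF_AVX512))
    return SKETCH_PATH_ZMM;
  if (sketch_has_all (hwfeatures, SKETCH_HWF_VPCLMUL | SKETCH_HWF_AVX2))
    return SKETCH_PATH_YMM;
  return SKETCH_PATH_XMM;
}
```

Storing the feature bits in the context at init time, as this commit
does, means a check like this only needs the saved value and does not
have to query the hardware again on every call.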

Benchmark on Zen4:

Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 CRC32          |     0.046 ns/B     20861 MiB/s     0.248 c/B    5421±1
 CRC32RFC1510   |     0.046 ns/B     20809 MiB/s     0.250 c/B   5463±14
 CRC24RFC2440   |     0.046 ns/B     20934 MiB/s     0.251 c/B    5504±2

After AVX2:
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 CRC32          |     0.023 ns/B     42277 MiB/s     0.123 c/B    5440±6
 CRC32RFC1510   |     0.022 ns/B     42949 MiB/s     0.121 c/B   5454±16
 CRC24RFC2440   |     0.023 ns/B     41955 MiB/s     0.124 c/B   5439±13

After AVX512:
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 CRC32          |     0.011 ns/B     85877 MiB/s     0.061 c/B      5500
 CRC32RFC1510   |     0.011 ns/B     83898 MiB/s     0.063 c/B      5500
 CRC24RFC2440   |     0.012 ns/B     80590 MiB/s     0.065 c/B      5500
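
The relative gain can be read off the cycles/byte column by dividing
the "Before" figure by the "After" figure. The helper below is only a
sketch of that arithmetic; the sketch_* names are hypothetical, and the
array holds the CRC32 rows copied from the tables above.

```c
/* CRC32 cycles/byte from the three tables above:
   before, after AVX2, after AVX512. */
static const double sketch_crc32_cpb[3] = { 0.248, 0.123, 0.061 };

/* Speedup factor of 'after_cpb' relative to 'before_cpb'.  Both are
   costs in cycles/byte, so lower is better and the ratio is
   before/after. */
static double
sketch_speedup (double before_cpb, double after_cpb)
{
  return before_cpb / after_cpb;
}
```

For example, sketch_speedup (sketch_crc32_cpb[0], sketch_crc32_cpb[1])
gives the AVX2 gain for CRC32, and using sketch_crc32_cpb[2] as the
second argument gives the AVX512 gain.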

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>