poly1305: add AVX512 implementation
* LICENSES: Add 3-clause BSD license for poly1305-amd64-avx512.S.
* cipher/Makefile.am: Add 'poly1305-amd64-avx512.S'.
* cipher/poly1305-amd64-avx512.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_AVX512): New.
(poly1305_context_s): Add 'use_avx512'.
* cipher/poly1305.c (ASM_FUNC_ABI, ASM_FUNC_WRAPPER_ATTR): New.
[POLY1305_USE_AVX512] (_gcry_poly1305_amd64_avx512_blocks)
(poly1305_amd64_avx512_blocks): New.
(poly1305_init): Use AVX512 if HW feature available (set use_avx512).
[USE_MPI_64BIT] (poly1305_blocks): Rename to ...
[USE_MPI_64BIT] (poly1305_blocks_generic): ... this.
[USE_MPI_64BIT] (poly1305_blocks): New.
This patch adds an AMD64 AVX512-FMA52 implementation of Poly1305.
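The dispatch described in the ChangeLog entries (a 'use_avx512' flag set at init time, and a new poly1305_blocks that forwards either to the AVX512 routine or to the renamed generic one) might look roughly like the sketch below. All names prefixed with sketch_ are hypothetical stand-ins, the block routines only count 16-byte blocks instead of doing Poly1305 arithmetic, and the hardware check is reduced to a plain parameter; this is an illustration under those assumptions, not the code in cipher/poly1305.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the Poly1305 context: only the dispatch
 * flag matters here, the counters exist purely for illustration. */
typedef struct
{
  unsigned int use_avx512;   /* decided once at init time */
  size_t blocks_generic;     /* 16-byte blocks sent to the generic path */
  size_t blocks_avx512;      /* 16-byte blocks sent to the AVX512 path */
} sketch_ctx_t;

/* Stand-in for the generic implementation (assumption: no real math). */
static unsigned int
sketch_blocks_generic (sketch_ctx_t *ctx, const uint8_t *buf, size_t len)
{
  (void)buf;
  ctx->blocks_generic += len / 16;
  return 0;
}

/* Stand-in for the wrapper around the assembly routine. */
static unsigned int
sketch_blocks_avx512 (sketch_ctx_t *ctx, const uint8_t *buf, size_t len)
{
  (void)buf;
  ctx->blocks_avx512 += len / 16;
  return 0;
}

/* Init: record whether the (hypothetical) HW feature check succeeded. */
static void
sketch_init (sketch_ctx_t *ctx, int hw_has_avx512)
{
  assert (ctx != NULL);
  ctx->use_avx512 = hw_has_avx512 ? 1 : 0;
  ctx->blocks_generic = 0;
  ctx->blocks_avx512 = 0;
}

/* Dispatcher: same entry point for callers, implementation chosen
 * from the flag that was set at init time. */
static unsigned int
sketch_blocks (sketch_ctx_t *ctx, const uint8_t *buf, size_t len)
{
  if (ctx->use_avx512)
    return sketch_blocks_avx512 (ctx, buf, len);

  return sketch_blocks_generic (ctx, buf, len);
}
```

Keeping the decision in the context means the feature check runs once per init rather than once per call, and callers of the block function do not need to know which implementation is active.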
Benchmark on Intel Core i3-1115G4 (tigerlake):
Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 POLY1305       |     0.306 ns/B      3117 MiB/s      1.25 c/B      4090

After (5.0x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 POLY1305       |     0.061 ns/B     15699 MiB/s     0.249 c/B    4095±3
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>