serpent: add x86/AVX512 implementation

* cipher/Makefile.am: Add `serpent-avx512-x86.c`; Add extra CFLAG handling
for `serpent-avx512-x86.o` and `serpent-avx512-x86.lo`.
* cipher/serpent-avx512-x86.c: New.
* cipher/serpent.c (USE_AVX512): New.
(serpent_context_t): Add `use_avx512`.
[USE_AVX512] (_gcry_serpent_avx512_cbc_dec)
(_gcry_serpent_avx512_cfb_dec, _gcry_serpent_avx512_ctr_enc)
(_gcry_serpent_avx512_ocb_crypt, _gcry_serpent_avx512_blk32): New.
(serpent_setkey_internal) [USE_AVX512]: Set `use_avx512` if AVX512 HW is
available.
(_gcry_serpent_ctr_enc) [USE_AVX512]: New.
(_gcry_serpent_cbc_dec) [USE_AVX512]: New.
(_gcry_serpent_cfb_dec) [USE_AVX512]: New.
(_gcry_serpent_ocb_crypt) [USE_AVX512]: New.
(serpent_crypt_blk1_16): Rename to...
(serpent_crypt_blk1_32): ... this; Add AVX512 code-path; Adjust for
increase from max 16 blocks to max 32 blocks.
(serpent_encrypt_blk1_16): Rename to ...
(serpent_encrypt_blk1_32): ... this.
(serpent_decrypt_blk1_16): Rename to ...
(serpent_decrypt_blk1_32): ... this.
(_gcry_serpent_xts_crypt, _gcry_serpent_ecb_crypt): Increase bulk block
count from 16 to 32.
* configure.ac (gcry_cv_cc_x86_avx512_intrinsics)
(ENABLE_X86_AVX512_INTRINSICS_EXTRA_CFLAGS): New.
(GCRYPT_ASM_CIPHERS): Add `serpent-avx512-x86.lo`.
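The bulk paths touched above share one idea: give the widest chunk of blocks to the widest kernel available, then fall back for the remainder. Below is a minimal sketch of that dispatch. It assumes a hypothetical `sketch_ctx_t` and stand-in kernels `crypt_32`/`crypt_16`/`crypt_1` that only count calls (they are not libgcrypt functions such as `_gcry_serpent_avx512_blk32`), and it assumes a 16-block tier below the new 32-block AVX512 tier; it is not a line-for-line model of serpent.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context: the flags echo the idea of `use_avx512` in
   serpent_context_t; the counters exist only to make the sketch testable. */
typedef struct {
  int use_avx512;   /* assumed: set at key setup when AVX512 is usable */
  int use_avx2;     /* assumed: 16-block tier below the 32-block one */
  size_t calls_32, calls_16, calls_1;
} sketch_ctx_t;

/* Stand-ins for real SIMD/generic kernels: they only count invocations. */
static void crypt_32 (sketch_ctx_t *ctx) { ctx->calls_32++; }
static void crypt_16 (sketch_ctx_t *ctx) { ctx->calls_16++; }
static void crypt_1 (sketch_ctx_t *ctx) { ctx->calls_1++; }

/* Process NBLOCKS cipher blocks, using the widest enabled path first. */
static size_t bulk_crypt (sketch_ctx_t *ctx, size_t nblocks)
{
  size_t done = 0;

  if (ctx->use_avx512)
    for (; nblocks - done >= 32; done += 32)
      crypt_32 (ctx);

  if (ctx->use_avx2)
    for (; nblocks - done >= 16; done += 16)
      crypt_16 (ctx);

  /* Whatever is left (fewer than 16 blocks, or no SIMD) goes one by one. */
  for (; done < nblocks; done++)
    crypt_1 (ctx);

  assert (done == nblocks);
  return done;
}
```

With both flags set, 50 blocks become one 32-block call, one 16-block call and two single-block calls; with only the 16-block tier enabled, the same input takes three 16-block calls and two single-block calls.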
Benchmark on AMD Ryzen 9 7900X:
Before:
Cipher:
     SERPENT128 | nanosecs/byte   mebibytes/sec  cycles/byte  auto Mhz
        ECB enc |     1.52 ns/B     626.2 MiB/s     8.26 c/B      5425
        ECB dec |     1.48 ns/B     645.5 MiB/s     8.01 c/B      5425
        CBC enc |     5.81 ns/B     164.2 MiB/s    31.94 c/B      5500
        CBC dec |    0.722 ns/B      1322 MiB/s     3.91 c/B      5425
        CFB enc |     5.88 ns/B     162.3 MiB/s    32.31 c/B      5500
        CFB dec |    0.735 ns/B      1297 MiB/s     3.99 c/B      5424
        OFB enc |     5.77 ns/B     165.3 MiB/s    31.72 c/B      5500
        OFB dec |     5.77 ns/B     165.4 MiB/s    31.72 c/B      5500
        CTR enc |    0.756 ns/B      1262 MiB/s     4.10 c/B      5425
        CTR dec |    0.776 ns/B      1228 MiB/s     4.21 c/B      5424
        XTS enc |     1.68 ns/B     568.3 MiB/s     9.10 c/B      5424
        XTS dec |     1.58 ns/B     604.2 MiB/s     8.56 c/B      5425
        CCM enc |     6.60 ns/B     144.5 MiB/s    36.30 c/B      5500
        CCM dec |     6.60 ns/B     144.5 MiB/s    36.30 c/B      5500
       CCM auth |     5.86 ns/B     162.6 MiB/s    32.25 c/B      5500
        EAX enc |     6.54 ns/B     145.8 MiB/s    35.98 c/B      5500
        EAX dec |     6.54 ns/B     145.8 MiB/s    35.98 c/B      5500
       EAX auth |     5.81 ns/B     164.2 MiB/s    31.94 c/B      5500
        GCM enc |    0.787 ns/B      1212 MiB/s     4.27 c/B      5425
        GCM dec |    0.788 ns/B      1211 MiB/s     4.27 c/B      5425
       GCM auth |    0.038 ns/B     24932 MiB/s    0.210 c/B      5500
        OCB enc |    0.750 ns/B      1272 MiB/s     4.07 c/B      5424
        OCB dec |    0.743 ns/B      1284 MiB/s     4.03 c/B      5425
       OCB auth |    0.749 ns/B      1274 MiB/s     4.06 c/B      5425
        SIV enc |     6.54 ns/B     145.8 MiB/s    35.99 c/B      5500
        SIV dec |     6.55 ns/B     145.7 MiB/s    36.01 c/B      5500
       SIV auth |     5.81 ns/B     164.2 MiB/s    31.94 c/B      5500
    GCM-SIV enc |     5.63 ns/B     169.4 MiB/s    30.97 c/B      5500
    GCM-SIV dec |     5.64 ns/B     169.2 MiB/s    31.00 c/B      5500
   GCM-SIV auth |    0.038 ns/B     25201 MiB/s    0.208 c/B      5500
After:
     SERPENT128 | nanosecs/byte   mebibytes/sec  cycles/byte  auto Mhz
        ECB enc |    0.578 ns/B      1649 MiB/s     3.14 c/B      5425
        ECB dec |    0.505 ns/B      1889 MiB/s     2.74 c/B      5424
        CBC enc |     5.81 ns/B     164.1 MiB/s    31.96 c/B      5500
        CBC dec |    0.527 ns/B      1810 MiB/s     2.86 c/B      5424
        CFB enc |     5.88 ns/B     162.3 MiB/s    32.31 c/B      5500
        CFB dec |    0.471 ns/B      2026 MiB/s     2.55 c/B      5425
        OFB enc |     5.77 ns/B     165.3 MiB/s    31.72 c/B      5500
        OFB dec |     5.77 ns/B     165.3 MiB/s    31.73 c/B      5501
        CTR enc |    0.464 ns/B      2053 MiB/s     2.52 c/B      5425
        CTR dec |    0.464 ns/B      2057 MiB/s     2.51 c/B      5425
        XTS enc |    0.551 ns/B      1732 MiB/s     2.99 c/B      5424
        XTS dec |    0.527 ns/B      1809 MiB/s     2.86 c/B      5424
        CCM enc |     6.32 ns/B     150.8 MiB/s    34.78 c/B      5501
        CCM dec |     6.32 ns/B     150.9 MiB/s    34.77 c/B      5500
       CCM auth |     5.86 ns/B     162.6 MiB/s    32.25 c/B      5500
        EAX enc |     6.26 ns/B     152.2 MiB/s    34.46 c/B      5500
        EAX dec |     6.27 ns/B     152.2 MiB/s    34.46 c/B      5500
       EAX auth |     5.81 ns/B     164.2 MiB/s    31.94 c/B      5500
        GCM enc |    0.497 ns/B      1917 MiB/s     2.70 c/B      5425
        GCM dec |    0.499 ns/B      1913 MiB/s     2.70 c/B      5425
       GCM auth |    0.031 ns/B     30709 MiB/s    0.171 c/B      5500
        OCB enc |    0.482 ns/B      1979 MiB/s     2.61 c/B      5424
        OCB dec |    0.475 ns/B      2007 MiB/s     2.58 c/B      5424
       OCB auth |    0.748 ns/B      1274 MiB/s     4.06 c/B      5424
        SIV enc |     6.27 ns/B     152.0 MiB/s    34.50 c/B      5500
        SIV dec |     6.27 ns/B     152.1 MiB/s    34.48 c/B      5500
       SIV auth |     5.81 ns/B     164.2 MiB/s    31.94 c/B      5500
    GCM-SIV enc |     5.63 ns/B     169.5 MiB/s    30.95 c/B      5500
    GCM-SIV dec |     5.63 ns/B     169.3 MiB/s    30.98 c/B      5500
   GCM-SIV auth |    0.034 ns/B     28060 MiB/s    0.187 c/B      5500
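To compare the two tables, the speedup for a mode is simply before ns/B divided by after ns/B, and MiB/s is 1e9 / (ns/B) / 1048576. A small sketch of that arithmetic follows; the struct and function names are made up for illustration, and the rows are copied from the SERPENT128 tables above (the printed ns/B values are rounded, so recomputed MiB/s differs slightly from the table).

```c
#include <assert.h>

/* Illustrative rows copied from the SERPENT128 tables (ns/B, lower is better). */
struct bench_row {
  const char *mode;
  double before_ns_per_byte;
  double after_ns_per_byte;
};

static const struct bench_row serpent128_rows[] = {
  { "ECB enc",  1.52,  0.578 },
  { "CBC enc",  5.81,  5.81  },
  { "CBC dec",  0.722, 0.527 },
  { "CTR enc",  0.756, 0.464 },
  { "XTS enc",  1.68,  0.551 },
  { "OCB auth", 0.749, 0.748 },
};

/* Speedup factor: how many times faster the "after" run is. */
static double speedup (const struct bench_row *r)
{
  assert (r->after_ns_per_byte > 0.0);
  return r->before_ns_per_byte / r->after_ns_per_byte;
}

/* Convert ns/B to MiB/s: 1e9 ns per second, 1048576 bytes per MiB. */
static double ns_per_byte_to_mib_per_s (double ns_per_byte)
{
  return 1e9 / ns_per_byte / 1048576.0;
}
```

For example, ECB enc comes out at roughly 2.63x and XTS enc at roughly 3.05x, while CBC enc stays at 1.0x: CBC encryption chains every block to the previous ciphertext, so it cannot be spread across parallel block lanes.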
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>