serpent: add x86/AVX512 implementation

Description

* cipher/Makefile.am: Add `serpent-avx512-x86.c`; Add extra CFLAG
handling for `serpent-avx512-x86.o` and `serpent-avx512-x86.lo`.
* cipher/serpent-avx512-x86.c: New.
* cipher/serpent.c (USE_AVX512): New.
(serpent_context_t): Add `use_avx512`.
[USE_AVX512] (_gcry_serpent_avx512_cbc_dec)
(_gcry_serpent_avx512_cfb_dec, _gcry_serpent_avx512_ctr_enc)
(_gcry_serpent_avx512_ocb_crypt, _gcry_serpent_avx512_blk32): New.
(serpent_setkey_internal) [USE_AVX512]: Set `use_avx512` if
AVX512 HW available.
(_gcry_serpent_ctr_enc) [USE_AVX512]: New.
(_gcry_serpent_cbc_dec) [USE_AVX512]: New.
(_gcry_serpent_cfb_dec) [USE_AVX512]: New.
(_gcry_serpent_ocb_crypt) [USE_AVX512]: New.
(serpent_crypt_blk1_16): Rename to...
(serpent_crypt_blk1_32): ... this; Add AVX512 code-path; Adjust for
increase from max 16 blocks to max 32 blocks.
(serpent_encrypt_blk1_16): Rename to ...
(serpent_encrypt_blk1_32): ... this.
(serpent_decrypt_blk1_16): Rename to ...
(serpent_decrypt_blk1_32): ... this.
(_gcry_serpent_xts_crypt, _gcry_serpent_ecb_crypt): Increase bulk
block count from 16 to 32.
* configure.ac (gcry_cv_cc_x86_avx512_intrinsics)
(ENABLE_X86_AVX512_INTRINSICS_EXTRA_CFLAGS): New.
(GCRYPT_ASM_CIPHERS): Add `serpent-avx512-x86.lo`.
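
The ChangeLog above describes two ideas that a short sketch may make concrete: a per-context flag that is set when AVX512 hardware is available, and bulk processing that now handles up to 32 blocks per call before falling back to narrower paths. The code below is only an illustration under stated assumptions, not the code added by this commit. Every name in it (`sketch_ctx`, `sketch_setkey`, `sketch_bulk_crypt`, the `SKETCH_HWF_*` bits) is hypothetical, the call counters exist purely so the dispatch can be observed, and the "cipher" is a toy XOR standing in for the real 32-block, 16-block and single-block primitives.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BLOCK_SIZE 16   /* Serpent operates on 128-bit blocks */

/* Hypothetical feature bits, invented for this sketch only. */
#define SKETCH_HWF_AVX2    (1u << 0)
#define SKETCH_HWF_AVX512  (1u << 1)

struct sketch_ctx
{
  int use_avx2;
  int use_avx512;
  /* Illustration only: how many times each code path was taken. */
  unsigned int calls_blk32;
  unsigned int calls_blk16;
  unsigned int calls_blk1;
};

/* Toy stand-in for the cipher core: XORs each byte with a constant.
   This is NOT Serpent; it only makes the data flow visible. */
static void
sketch_toy_blocks (unsigned char *out, const unsigned char *in, size_t nblks)
{
  size_t i;

  for (i = 0; i < nblks * SKETCH_BLOCK_SIZE; i++)
    out[i] = in[i] ^ 0xA5;
}

/* Mirrors the idea "set use_avx512 if AVX512 HW is available". */
static void
sketch_setkey (struct sketch_ctx *ctx, unsigned int hw_features)
{
  assert (ctx != NULL);
  ctx->use_avx2 = !!(hw_features & SKETCH_HWF_AVX2);
  ctx->use_avx512 = !!(hw_features & SKETCH_HWF_AVX512);
  ctx->calls_blk32 = ctx->calls_blk16 = ctx->calls_blk1 = 0;
}

/* Process NBLOCKS blocks, preferring the widest enabled path. */
static void
sketch_bulk_crypt (struct sketch_ctx *ctx, unsigned char *out,
                   const unsigned char *in, size_t nblocks)
{
  assert (ctx != NULL);

  /* Widest path: 32 blocks per call (assumed AVX512 path). */
  while (ctx->use_avx512 && nblocks >= 32)
    {
      sketch_toy_blocks (out, in, 32);
      ctx->calls_blk32++;
      out += 32 * SKETCH_BLOCK_SIZE;
      in += 32 * SKETCH_BLOCK_SIZE;
      nblocks -= 32;
    }

  /* Next: 16 blocks per call (assumed AVX2 path). */
  while (ctx->use_avx2 && nblocks >= 16)
    {
      sketch_toy_blocks (out, in, 16);
      ctx->calls_blk16++;
      out += 16 * SKETCH_BLOCK_SIZE;
      in += 16 * SKETCH_BLOCK_SIZE;
      nblocks -= 16;
    }

  /* Generic fallback: one block at a time. */
  while (nblocks > 0)
    {
      sketch_toy_blocks (out, in, 1);
      ctx->calls_blk1++;
      out += SKETCH_BLOCK_SIZE;
      in += SKETCH_BLOCK_SIZE;
      nblocks--;
    }
}
```

With both hypothetical feature bits set, a request for 83 blocks is split into two 32-block calls, one 16-block call and three single-block calls. With no feature bits set, everything goes through the single-block fallback.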

Benchmark on AMD Ryzen 9 7900X:

Before:
Cipher:
SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

     ECB enc |      1.52 ns/B     626.2 MiB/s      8.26 c/B      5425
     ECB dec |      1.48 ns/B     645.5 MiB/s      8.01 c/B      5425
     CBC enc |      5.81 ns/B     164.2 MiB/s     31.94 c/B      5500
     CBC dec |     0.722 ns/B      1322 MiB/s      3.91 c/B      5425
     CFB enc |      5.88 ns/B     162.3 MiB/s     32.31 c/B      5500
     CFB dec |     0.735 ns/B      1297 MiB/s      3.99 c/B      5424
     OFB enc |      5.77 ns/B     165.3 MiB/s     31.72 c/B      5500
     OFB dec |      5.77 ns/B     165.4 MiB/s     31.72 c/B      5500
     CTR enc |     0.756 ns/B      1262 MiB/s      4.10 c/B      5425
     CTR dec |     0.776 ns/B      1228 MiB/s      4.21 c/B      5424
     XTS enc |      1.68 ns/B     568.3 MiB/s      9.10 c/B      5424
     XTS dec |      1.58 ns/B     604.2 MiB/s      8.56 c/B      5425
     CCM enc |      6.60 ns/B     144.5 MiB/s     36.30 c/B      5500
     CCM dec |      6.60 ns/B     144.5 MiB/s     36.30 c/B      5500
    CCM auth |      5.86 ns/B     162.6 MiB/s     32.25 c/B      5500
     EAX enc |      6.54 ns/B     145.8 MiB/s     35.98 c/B      5500
     EAX dec |      6.54 ns/B     145.8 MiB/s     35.98 c/B      5500
    EAX auth |      5.81 ns/B     164.2 MiB/s     31.94 c/B      5500
     GCM enc |     0.787 ns/B      1212 MiB/s      4.27 c/B      5425
     GCM dec |     0.788 ns/B      1211 MiB/s      4.27 c/B      5425
    GCM auth |     0.038 ns/B     24932 MiB/s     0.210 c/B      5500
     OCB enc |     0.750 ns/B      1272 MiB/s      4.07 c/B      5424
     OCB dec |     0.743 ns/B      1284 MiB/s      4.03 c/B      5425
    OCB auth |     0.749 ns/B      1274 MiB/s      4.06 c/B      5425
     SIV enc |      6.54 ns/B     145.8 MiB/s     35.99 c/B      5500
     SIV dec |      6.55 ns/B     145.7 MiB/s     36.01 c/B      5500
    SIV auth |      5.81 ns/B     164.2 MiB/s     31.94 c/B      5500
 GCM-SIV enc |      5.63 ns/B     169.4 MiB/s     30.97 c/B      5500
 GCM-SIV dec |      5.64 ns/B     169.2 MiB/s     31.00 c/B      5500
GCM-SIV auth |     0.038 ns/B     25201 MiB/s     0.208 c/B      5500

After:
SERPENT128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

     ECB enc |     0.578 ns/B      1649 MiB/s      3.14 c/B      5425
     ECB dec |     0.505 ns/B      1889 MiB/s      2.74 c/B      5424
     CBC enc |      5.81 ns/B     164.1 MiB/s     31.96 c/B      5500
     CBC dec |     0.527 ns/B      1810 MiB/s      2.86 c/B      5424
     CFB enc |      5.88 ns/B     162.3 MiB/s     32.31 c/B      5500
     CFB dec |     0.471 ns/B      2026 MiB/s      2.55 c/B      5425
     OFB enc |      5.77 ns/B     165.3 MiB/s     31.72 c/B      5500
     OFB dec |      5.77 ns/B     165.3 MiB/s     31.73 c/B      5501
     CTR enc |     0.464 ns/B      2053 MiB/s      2.52 c/B      5425
     CTR dec |     0.464 ns/B      2057 MiB/s      2.51 c/B      5425
     XTS enc |     0.551 ns/B      1732 MiB/s      2.99 c/B      5424
     XTS dec |     0.527 ns/B      1809 MiB/s      2.86 c/B      5424
     CCM enc |      6.32 ns/B     150.8 MiB/s     34.78 c/B      5501
     CCM dec |      6.32 ns/B     150.9 MiB/s     34.77 c/B      5500
    CCM auth |      5.86 ns/B     162.6 MiB/s     32.25 c/B      5500
     EAX enc |      6.26 ns/B     152.2 MiB/s     34.46 c/B      5500
     EAX dec |      6.27 ns/B     152.2 MiB/s     34.46 c/B      5500
    EAX auth |      5.81 ns/B     164.2 MiB/s     31.94 c/B      5500
     GCM enc |     0.497 ns/B      1917 MiB/s      2.70 c/B      5425
     GCM dec |     0.499 ns/B      1913 MiB/s      2.70 c/B      5425
    GCM auth |     0.031 ns/B     30709 MiB/s     0.171 c/B      5500
     OCB enc |     0.482 ns/B      1979 MiB/s      2.61 c/B      5424
     OCB dec |     0.475 ns/B      2007 MiB/s      2.58 c/B      5424
    OCB auth |     0.748 ns/B      1274 MiB/s      4.06 c/B      5424
     SIV enc |      6.27 ns/B     152.0 MiB/s     34.50 c/B      5500
     SIV dec |      6.27 ns/B     152.1 MiB/s     34.48 c/B      5500
    SIV auth |      5.81 ns/B     164.2 MiB/s     31.94 c/B      5500
 GCM-SIV enc |      5.63 ns/B     169.5 MiB/s     30.95 c/B      5500
 GCM-SIV dec |      5.63 ns/B     169.3 MiB/s     30.98 c/B      5500
GCM-SIV auth |     0.034 ns/B     28060 MiB/s     0.187 c/B      5500

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
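
The columns of the Before and After tables are related by simple arithmetic, so the gain for any mode can be checked by hand. The helpers below are a small sketch of that arithmetic and are not part of the commit. The function names are made up for illustration, and because the table values are rounded, recomputed figures agree with the printed ones only approximately.

```c
#include <assert.h>

/* Speedup factor: time per byte before divided by time per byte after. */
static double
speedup (double before_ns_per_byte, double after_ns_per_byte)
{
  assert (before_ns_per_byte > 0.0 && after_ns_per_byte > 0.0);
  return before_ns_per_byte / after_ns_per_byte;
}

/* cycles/byte = ns/byte * cycles/ns, where MHz / 1000 is cycles per ns. */
static double
cycles_per_byte (double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* Throughput in MiB/s: bytes per second divided by 2^20. */
static double
mib_per_sec (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}
```

For example, ECB encryption goes from 1.52 ns/B to 0.578 ns/B, so `speedup (1.52, 0.578)` gives a factor of roughly 2.6, while CTR encryption goes from 0.756 ns/B to 0.464 ns/B, a factor of roughly 1.6.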

Details

Provenance
jukivili authored on Apr 30 2023, 2:46 PM
Parents
rC01c0185e6360: build: Sync libtool from libgpg-error for 64-bit Windows.