chacha20: add AVX512 implementation

Description

* cipher/Makefile.am: Add 'chacha20-amd64-avx512.S'.
* cipher/chacha20-amd64-avx512.S: New.
* cipher/chacha20.c (USE_AVX512): New.
(CHACHA20_context_s): Add 'use_avx512'.
[USE_AVX512] (_gcry_chacha20_amd64_avx512_blocks16): New.
(chacha20_do_setkey) [USE_AVX512]: Set up 'use_avx512' based on
HW features.
(do_chacha20_encrypt_stream_tail) [USE_AVX512]: Use AVX512
implementation if supported.
(_gcry_chacha20_poly1305_encrypt) [USE_AVX512]: Disable stitched
chacha20-poly1305 implementations if AVX512 implementation is used.
(_gcry_chacha20_poly1305_decrypt) [USE_AVX512]: Disable stitched
chacha20-poly1305 implementations if AVX512 implementation is used.
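
The control flow described by the entries above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code from cipher/chacha20.c: the `sketch_*` names, the `HWF_AVX512` bit and the 16-block length threshold are assumptions made for the example, and the assembly routine is replaced by a stand-in that only counts bytes.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 64u        /* ChaCha20 block size in bytes */
#define HWF_AVX512 (1u << 0)  /* hypothetical feature bit, for illustration only */

struct chacha20_ctx_sketch
{
  unsigned int use_avx512 : 1;  /* plays the role of the 'use_avx512' flag */
};

/* Key setup: record once whether the CPU reports AVX512 support. */
static void
sketch_setkey (struct chacha20_ctx_sketch *ctx, unsigned int hw_features)
{
  ctx->use_avx512 = (hw_features & HWF_AVX512) != 0;
}

/* Stand-in for the 16-block assembly routine.  It does no encryption and
   only reports how many bytes a real routine would have processed. */
static size_t
sketch_avx512_blocks16 (size_t nblks)
{
  assert (nblks > 0 && nblks % 16 == 0);
  return nblks * BLOCK_SIZE;
}

/* Stream tail: hand the largest multiple of 16 blocks to the wide path
   (assumed threshold) and return the number of bytes it covered.  The
   remainder would be left to the other implementations. */
static size_t
sketch_stream_tail (const struct chacha20_ctx_sketch *ctx, size_t length)
{
  size_t done = 0;

  if (ctx->use_avx512 && length >= BLOCK_SIZE * 16)
    {
      size_t nblks = length / BLOCK_SIZE;
      nblks -= nblks % 16;
      done = sketch_avx512_blocks16 (nblks);
    }

  return done;
}

/* AEAD entry points: the stitched chacha20-poly1305 code is skipped
   whenever the AVX512 ChaCha20 path is in use. */
static int
sketch_use_stitched_aead (const struct chacha20_ctx_sketch *ctx)
{
  return !ctx->use_avx512;
}
```

With the flag set, a 2117-byte input (33 blocks plus 5 bytes) would send 32 blocks through the wide path in this sketch and leave the last block and the partial tail to the remaining code paths.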

Benchmark on Intel Core i3-1115G4 (tigerlake):

Before:

              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   STREAM enc |     0.276 ns/B      3451 MiB/s      1.13 c/B      4090
   STREAM dec |     0.284 ns/B      3359 MiB/s      1.16 c/B      4090
 POLY1305 enc |     0.411 ns/B      2320 MiB/s      1.68 c/B      4098±3
 POLY1305 dec |     0.408 ns/B      2338 MiB/s      1.67 c/B      4091±1
POLY1305 auth |     0.060 ns/B     15785 MiB/s     0.247 c/B      4090±1

After (stream 1.7x faster, poly1305-aead 1.8x faster):

              |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
   STREAM enc |     0.162 ns/B      5869 MiB/s     0.665 c/B      4092±1
   STREAM dec |     0.162 ns/B      5884 MiB/s     0.664 c/B      4096±3
 POLY1305 enc |     0.221 ns/B      4306 MiB/s     0.907 c/B      4097±3
 POLY1305 dec |     0.220 ns/B      4342 MiB/s     0.900 c/B      4096±3
POLY1305 auth |     0.060 ns/B     15797 MiB/s     0.247 c/B      4085±2

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
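
The speedup factors quoted with the benchmark tables follow from the nanosecs/byte column. A minimal sketch of that arithmetic is shown below; the helper names are made up for this example and are not part of libgcrypt or its benchmark tool. The cycles/byte relation assumes the reported clock is in MHz, and it matches the tables only approximately because the printed values are rounded.

```c
#include <assert.h>

/* Speedup factor: time per byte before divided by time per byte after. */
static double
speedup (double ns_per_byte_before, double ns_per_byte_after)
{
  assert (ns_per_byte_before > 0.0 && ns_per_byte_after > 0.0);
  return ns_per_byte_before / ns_per_byte_after;
}

/* Cycles per byte from ns/B and a clock in MHz:
   c/B = ns/B * (cycles per ns) = ns/B * MHz / 1000. */
static double
cycles_per_byte (double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}
```

For example, `speedup (0.276, 0.162)` gives about 1.70 for STREAM enc and `speedup (0.411, 0.221)` gives about 1.86 for POLY1305 enc, in line with the figures in the "After" heading.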

Details

Provenance
jukivili authored on Mar 26 2022, 6:48 PM
Parents
rCcd3ed4977076: poly1305: add AVX512 implementation