camellia: add amd64 GFNI/AVX512 implementation

Description

* cipher/Makefile.am: Add 'camellia-gfni-avx512-amd64.S'.
* cipher/bulkhelp.h (bulk_ocb_prepare_L_pointers_array_blk64): New.
* cipher/camellia-aesni-avx2-amd64.h: Rename internal functions from
"__camellia_???" to "FUNC_NAME(???)"; Minor changes to comments.
* cipher/camellia-gfni-avx512-amd64.S: New.
* cipher/camellia-glue.c (USE_GFNI_AVX512): New.
(CAMELLIA_context): Add 'use_gfni_avx512'.
(_gcry_camellia_gfni_avx512_ctr_enc, _gcry_camellia_gfni_avx512_cbc_dec)
(_gcry_camellia_gfni_avx512_cfb_dec, _gcry_camellia_gfni_avx512_ocb_enc)
(_gcry_camellia_gfni_avx512_ocb_dec)
(_gcry_camellia_gfni_avx512_enc_blk64)
(_gcry_camellia_gfni_avx512_dec_blk64, avx512_burn_stack_depth): New.
(camellia_setkey): Use GFNI/AVX512 if supported by CPU.
(camellia_encrypt_blk1_64, camellia_decrypt_blk1_64): New.
(_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec, _gcry_camellia_cfb_dec)
(_gcry_camellia_ocb_crypt) [USE_GFNI_AVX512]: Add GFNI/AVX512 code path.
(_gcry_camellia_xts_crypt): Change parallel block size from 32 to 64.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Increase test
block size.
* cipher/chacha20-amd64-avx512.S: Clear k-mask registers with xor.
* cipher/poly1305-amd64-avx512.S: Likewise.
* cipher/sha512-avx512-amd64.S: Likewise.
---
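
The bulk code paths listed above (CTR, CBC, CFB, OCB, XTS) can be thought of as handing data to the widest enabled implementation first and leaving the remainder to narrower ones. The following is a minimal sketch of that chunking idea only, not the code from this commit. The names `plan_chunks` and `struct chunk_plan` are hypothetical, and the 32-block and 16-block fallback widths are an assumption about the older AVX2-class and AVX-class paths; the real thresholds and helper structure may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: how many chunks of each width to run. */
struct chunk_plan
{
  size_t n64;  /* 64-block chunks (GFNI/AVX512 path) */
  size_t n32;  /* 32-block chunks (assumed AVX2-class fallback) */
  size_t n16;  /* 16-block chunks (assumed AVX-class fallback) */
  size_t n1;   /* leftover blocks handled one at a time */
};

/* Split NBLKS blocks greedily across the enabled widths, widest first. */
struct chunk_plan
plan_chunks (size_t nblks, int use_gfni_avx512, int use_avx2, int use_avx)
{
  struct chunk_plan plan = { 0, 0, 0, 0 };
  const size_t total = nblks;

  if (use_gfni_avx512)
    {
      plan.n64 = nblks / 64;
      nblks %= 64;
    }
  if (use_avx2)
    {
      plan.n32 = nblks / 32;
      nblks %= 32;
    }
  if (use_avx)
    {
      plan.n16 = nblks / 16;
      nblks %= 16;
    }
  plan.n1 = nblks;

  /* Every input block is accounted for exactly once. */
  assert (plan.n64 * 64 + plan.n32 * 32 + plan.n16 * 16 + plan.n1 == total);
  return plan;
}
```

With all three flags set, `plan_chunks (150, 1, 1, 1)` yields two 64-block chunks, no 32-block chunk, one 16-block chunk and six single blocks.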

Benchmark on Intel i3-1115G4 (tigerlake):

Before (GFNI/AVX2):
 CAMELLIA128  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
      CBC dec |     0.356 ns/B      2679 MiB/s      1.46 c/B      4089
      CFB dec |     0.374 ns/B      2547 MiB/s      1.53 c/B      4089
      CTR enc |     0.409 ns/B      2332 MiB/s      1.67 c/B      4089
      CTR dec |     0.406 ns/B      2347 MiB/s      1.66 c/B      4089
      XTS enc |     0.430 ns/B      2216 MiB/s      1.76 c/B      4090
      XTS dec |     0.433 ns/B      2201 MiB/s      1.77 c/B      4090
      OCB enc |     0.460 ns/B      2071 MiB/s      1.88 c/B      4089
      OCB dec |     0.492 ns/B      1939 MiB/s      2.01 c/B      4089

After (GFNI/AVX512):
 CAMELLIA128  |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
      CBC dec |     0.207 ns/B      4600 MiB/s     0.827 c/B      3989
      CFB dec |     0.207 ns/B      4610 MiB/s     0.825 c/B      3989
      CTR enc |     0.218 ns/B      4382 MiB/s     0.868 c/B      3990
      CTR dec |     0.217 ns/B      4389 MiB/s     0.867 c/B      3990
      XTS enc |     0.330 ns/B      2886 MiB/s      1.35 c/B      4097±4
      XTS dec |     0.328 ns/B      2904 MiB/s      1.35 c/B      4097±3
      OCB enc |     0.246 ns/B      3879 MiB/s     0.981 c/B      3990
      OCB dec |     0.247 ns/B      3855 MiB/s     0.987 c/B      3990

CBC dec: 70% faster
CFB dec: 80% faster
CTR: 87% faster
XTS: 31% faster
OCB: 92% faster
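
The percentages above appear to follow from the ratio of the before and after timings. A minimal sketch of that arithmetic, with hypothetical helper names (`speedup_percent`, `cycles_per_byte`) that are not part of libgcrypt or its benchmark tool:

```c
#include <assert.h>

/* Percentage speedup derived from two nanoseconds-per-byte figures:
   (before / after - 1) * 100.  */
double
speedup_percent (double ns_before, double ns_after)
{
  assert (ns_before > 0.0 && ns_after > 0.0);
  return (ns_before / ns_after - 1.0) * 100.0;
}

/* Cycles per byte from ns/B and the clock in MHz:
   ns/B * (cycles per ns), where cycles per ns = MHz / 1000.  */
double
cycles_per_byte (double ns_per_byte, double mhz)
{
  assert (ns_per_byte > 0.0 && mhz > 0.0);
  return ns_per_byte * mhz / 1000.0;
}
```

With the CTR enc rows, `speedup_percent (0.409, 0.218)` comes out at roughly 87.6, in line with the 87% listed. Recomputing cycles/byte from the rounded ns/B values, for example `cycles_per_byte (0.207, 3989)`, lands close to the table value but can differ in the last digit, presumably because the table was produced from unrounded timings.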

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on May 1 2022, 3:01 PM
Parents
rCa611e3a25d61: mpi: Fix for 64-bit for _gcry_mpih_cmp_ui.