camellia: add x86_64 VAES/AVX2 accelerated implementation

Description

* cipher/Makefile.am: Add 'camellia-aesni-avx2-amd64.h' and
'camellia-vaes-avx2-amd64.S'.
* cipher/camellia-aesni-avx2-amd64.S: New, old content moved to...
* cipher/camellia-aesni-avx2-amd64.h: ...here.
(IF_AESNI, IF_VAES, FUNC_NAME): New.
* cipher/camellia-vaes-avx2-amd64.S: New.
* cipher/camellia-glue.c (USE_VAES_AVX2): New.
(CAMELLIA_context): New member 'use_vaes_avx2'.
(_gcry_camellia_vaes_avx2_ctr_enc, _gcry_camellia_vaes_avx2_cbc_dec)
(_gcry_camellia_vaes_avx2_cfb_dec, _gcry_camellia_vaes_avx2_ocb_enc)
(_gcry_camellia_vaes_avx2_ocb_dec)
(_gcry_camellia_vaes_avx2_ocb_auth): New.
(camellia_setkey): Check for HWF_INTEL_VAES.
(_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec)
(_gcry_camellia_cfb_dec, _gcry_camellia_ocb_crypt)
(_gcry_camellia_ocb_auth): Add USE_VAES_AVX2 code.
* configure.ac: Add 'camellia-vaes-avx2-amd64.lo'.
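The setkey check and the per-mode "Add USE_VAES_AVX2 code" entries above amount to a flag-based dispatch: the context records at key setup which implementation the CPU supports, and each bulk function consults that flag. A minimal sketch of that pattern follows. All `DEMO_*`/`demo_*` names and the feature bit values are hypothetical and are not taken from libgcrypt; only the member name `use_vaes_avx2` comes from the entries above. The sketch also assumes the VAES path is enabled only when the AES-NI/AVX2 path is available.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; libgcrypt's real HWF_* values differ. */
#define DEMO_HWF_AESNI (1u << 0)
#define DEMO_HWF_AVX2  (1u << 1)
#define DEMO_HWF_VAES  (1u << 2)

typedef enum {
    DEMO_IMPL_GENERIC,
    DEMO_IMPL_AESNI_AVX2,
    DEMO_IMPL_VAES_AVX2
} demo_impl;

/* Stand-in for the cipher context; only the dispatch flags are modelled. */
typedef struct {
    unsigned int use_aesni_avx2 : 1;
    unsigned int use_vaes_avx2 : 1;  /* member name as in the commit */
} demo_camellia_ctx;

/* Key setup: record which accelerated paths the detected features allow. */
static void demo_setkey(demo_camellia_ctx *ctx, unsigned int hwf)
{
    assert(ctx != NULL);
    ctx->use_aesni_avx2 = (hwf & DEMO_HWF_AESNI) && (hwf & DEMO_HWF_AVX2);
    /* Assumption: VAES is only used on top of the AES-NI/AVX2 path. */
    ctx->use_vaes_avx2 = ctx->use_aesni_avx2 && (hwf & DEMO_HWF_VAES);
}

/* Bulk-function dispatch: prefer VAES, then AES-NI/AVX2, then generic. */
static demo_impl demo_pick_impl(const demo_camellia_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->use_vaes_avx2)
        return DEMO_IMPL_VAES_AVX2;
    if (ctx->use_aesni_avx2)
        return DEMO_IMPL_AESNI_AVX2;
    return DEMO_IMPL_GENERIC;
}
```

Deciding once at key setup keeps the per-call cost of the bulk functions to a flag test, which is presumably why the flag lives in the context.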

The Camellia AES-NI/AVX2 implementation had to split each 256-bit
vector into 128-bit parts for AES processing. With VAES, the
256-bit registers can be used directly.

Benchmarks on AMD Ryzen 5800X:

Before (AES-NI/AVX2):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

 CBC dec |     0.539 ns/B      1769 MiB/s      2.62 c/B      4852
 CFB dec |     0.528 ns/B      1806 MiB/s      2.56 c/B      4852±1
 CTR enc |     0.552 ns/B      1728 MiB/s      2.68 c/B      4850
 OCB enc |     0.550 ns/B      1734 MiB/s      2.65 c/B      4825
 OCB dec |     0.577 ns/B      1653 MiB/s      2.78 c/B      4825
OCB auth |     0.546 ns/B      1747 MiB/s      2.63 c/B      4825

After (VAES/AVX2, CBC-dec ~13%, CFB-dec/CTR/OCB ~20% faster):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

 CBC dec |     0.477 ns/B      1999 MiB/s      2.31 c/B      4850
 CFB dec |     0.433 ns/B      2201 MiB/s      2.10 c/B      4850
 CTR enc |     0.438 ns/B      2176 MiB/s      2.13 c/B      4851
 OCB enc |     0.449 ns/B      2122 MiB/s      2.18 c/B      4850
 OCB dec |     0.468 ns/B      2038 MiB/s      2.27 c/B      4850
OCB auth |     0.447 ns/B      2131 MiB/s      2.17 c/B      4850
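
The percentages quoted for the "After" run can be re-derived from the two tables. The sketch below computes the throughput gain from the MiB/s columns. The `demo_*` names are hypothetical, and the figures are copied from the tables above. Taking the ratio of MiB/s values is one reasonable reading of "faster"; the reduction in ns/B is a different figure.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark row: mode name plus throughput before and after (MiB/s). */
typedef struct {
    const char *mode;
    double before_mibs;
    double after_mibs;
} demo_bench_row;

/* Values copied from the AES-NI/AVX2 and VAES/AVX2 tables. */
static const demo_bench_row demo_rows[] = {
    { "CBC dec",  1769.0, 1999.0 },
    { "CFB dec",  1806.0, 2201.0 },
    { "CTR enc",  1728.0, 2176.0 },
    { "OCB enc",  1734.0, 2122.0 },
    { "OCB dec",  1653.0, 2038.0 },
    { "OCB auth", 1747.0, 2131.0 },
};

static const size_t demo_row_count = sizeof demo_rows / sizeof demo_rows[0];

/* Throughput gain in percent: (after / before - 1) * 100. */
static double demo_speedup_percent(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

Calling `demo_speedup_percent(demo_rows[0].before_mibs, demo_rows[0].after_mibs)` gives about 13 for CBC dec, which matches the figure in the heading.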

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Jan 10 2021, 11:56 PM
Parents
rCeb404d890453: hwf-x86: add "intel-vaes-vpclmul" HW feature