rijndael: add x86_64 VAES/AVX2 accelerated implementation

* cipher/Makefile.am: Add 'rijndael-vaes.c' and
'rijndael-vaes-avx2-amd64.S'.
* cipher/rijndael-internal.h (USE_VAES): New.
* cipher/rijndael-vaes-avx2-amd64.S: New.
* cipher/rijndael-vaes.c: New.
* cipher/rijndael.c (_gcry_aes_vaes_cfb_dec, _gcry_aes_vaes_cbc_dec)
(_gcry_aes_vaes_ctr_enc, _gcry_aes_vaes_ocb_crypt)
(_gcry_aes_vaes_xts_crypt): New.
(do_setkey) [USE_VAES]: Add detection for VAES.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128) [USE_VAES]: Increase
number of selftest blocks.
* configure.ac: Add 'rijndael-vaes.lo' and
'rijndael-vaes-avx2-amd64.lo'.

This patch adds VAES/AVX2 accelerated implementations of CBC decryption,
CFB decryption, CTR encryption, OCB en/decryption and XTS en/decryption.
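
The ChangeLog mentions that do_setkey gains detection for VAES, but the
detection code itself is not shown in this message. As a rough,
standalone illustration only (not the code added by this patch), the
CPUID feature bits involved can be queried as sketched below. The
function name cpu_reports_vaes_avx2 is hypothetical, the sketch assumes
GCC or Clang with <cpuid.h>, and it deliberately omits the check that
the OS has enabled YMM register state.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper, not part of libgcrypt.
 * Returns 1 if CPUID leaf 7, subleaf 0 reports both AVX2 (EBX bit 5)
 * and VAES (ECX bit 9), and 0 otherwise.  On non-x86 builds it always
 * returns 0.  OS support for YMM state (XGETBV) is NOT checked here. */
int cpu_reports_vaes_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

  /* __get_cpuid_count returns 0 if the requested leaf is unsupported. */
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    return 0;

  return ((ebx >> 5) & 1u) && ((ecx >> 9) & 1u);
#else
  return 0;
#endif
}
```

A caller would use the result only to decide whether a VAES code path
may be selected at all; a real implementation also has to confirm OS
support for the wider registers before using it.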
Benchmarks on AMD Ryzen 5800X:
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC dec |     0.067 ns/B     14314 MiB/s     0.323 c/B      4850
        CFB dec |     0.067 ns/B     14322 MiB/s     0.323 c/B      4850
        CTR enc |     0.066 ns/B     14429 MiB/s     0.321 c/B      4850
        CTR dec |     0.066 ns/B     14433 MiB/s     0.320 c/B      4850
        XTS enc |     0.087 ns/B     10910 MiB/s     0.424 c/B      4850
        XTS dec |     0.088 ns/B     10856 MiB/s     0.426 c/B      4850
        OCB enc |     0.070 ns/B     13633 MiB/s     0.339 c/B      4850
        OCB dec |     0.069 ns/B     13911 MiB/s     0.332 c/B      4850

After (XTS ~1.7x faster, others ~1.9x faster):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC dec |     0.034 ns/B     28159 MiB/s     0.164 c/B      4850
        CFB dec |     0.034 ns/B     27955 MiB/s     0.165 c/B      4850
        CTR enc |     0.034 ns/B     28214 MiB/s     0.164 c/B      4850
        CTR dec |     0.034 ns/B     28146 MiB/s     0.164 c/B      4850
        XTS enc |     0.051 ns/B     18539 MiB/s     0.249 c/B      4850
        XTS dec |     0.051 ns/B     18655 MiB/s     0.248 c/B      4850
       GCM auth |     0.088 ns/B     10817 MiB/s     0.428 c/B      4850
        OCB enc |     0.037 ns/B     25824 MiB/s     0.179 c/B      4850
        OCB dec |     0.038 ns/B     25359 MiB/s     0.182 c/B      4850
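
The speedup factors quoted in the "After" heading can be re-derived from
the ns/byte columns as before divided by after. The following is a small
sketch of that arithmetic; the struct and function names are made up for
illustration, and the inputs are the rounded figures from the two
tables, so the ratios are approximate. GCM auth is left out because it
has no "Before" row.

```c
#include <assert.h>
#include <stddef.h>

/* ns/byte figures copied from the "Before" and "After" tables. */
struct bench_row {
  const char *mode;
  double before_ns_per_byte;
  double after_ns_per_byte;
};

const struct bench_row bench_rows[] = {
  { "CBC dec", 0.067, 0.034 },
  { "CFB dec", 0.067, 0.034 },
  { "CTR enc", 0.066, 0.034 },
  { "CTR dec", 0.066, 0.034 },
  { "XTS enc", 0.087, 0.051 },
  { "XTS dec", 0.088, 0.051 },
  { "OCB enc", 0.070, 0.037 },
  { "OCB dec", 0.069, 0.038 },
};

const size_t bench_row_count = sizeof(bench_rows) / sizeof(bench_rows[0]);

/* Speedup factor: time per byte before divided by time per byte after. */
double speedup(double before_ns_per_byte, double after_ns_per_byte)
{
  assert(after_ns_per_byte > 0.0);
  return before_ns_per_byte / after_ns_per_byte;
}

/* Speedup for row i of bench_rows. */
double row_speedup(size_t i)
{
  assert(i < bench_row_count);
  return speedup(bench_rows[i].before_ns_per_byte,
                 bench_rows[i].after_ns_per_byte);
}
```

With these rounded inputs the ratios come out at about 1.7 for both XTS
rows and between about 1.8 and 2.0 for the other modes.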

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>