rijndael-vaes-avx2: perform checksumming inline
* cipher/rijndael-vaes-avx2-amd64.S (_gcry_vaes_avx2_ocb_checksum): Remove.
(_gcry_vaes_avx2_ocb_crypt_amd64): Add inline checksumming.
The VAES/AVX2/OCB encryption implementation had the same performance
drop with large buffers as the AES-NI/OCB implementation; see
e924ce456d5728a81c148de4a6eb23373cb70ca0 for details. This patch
changes VAES/AVX2/OCB to perform checksumming inline with encryption
and decryption instead of using a 2-pass approach. Inline checksumming
also gives a small ~6% speed boost.
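The difference between the two approaches can be sketched in plain C.
This is only an illustration, not the assembly from this patch: the
function names are hypothetical, toy_encrypt_block() is a placeholder
rather than AES, and the OCB offset handling is left out. It relies only
on the fact that the OCB checksum is the XOR of the plaintext blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Hypothetical stand-in for the block cipher; this is NOT AES. */
static void toy_encrypt_block(uint8_t out[BLK], const uint8_t in[BLK])
{
  for (size_t i = 0; i < BLK; i++)
    out[i] = (uint8_t)(in[i] ^ 0xA5);
}

/* XOR one block into the running checksum. */
static void xor_block(uint8_t acc[BLK], const uint8_t in[BLK])
{
  for (size_t i = 0; i < BLK; i++)
    acc[i] ^= in[i];
}

/* 2-pass: the whole buffer is read once for the checksum and then
 * again for encryption. With large buffers the second pass may find
 * the data already evicted from cache. */
static void crypt_two_pass(uint8_t *dst, const uint8_t *src,
                           size_t nblks, uint8_t checksum[BLK])
{
  assert(dst && src && checksum);
  for (size_t i = 0; i < nblks; i++)
    xor_block(checksum, src + i * BLK);
  for (size_t i = 0; i < nblks; i++)
    toy_encrypt_block(dst + i * BLK, src + i * BLK);
}

/* Inline: each block is loaded once, folded into the checksum and
 * encrypted before moving on to the next block. */
static void crypt_inline(uint8_t *dst, const uint8_t *src,
                         size_t nblks, uint8_t checksum[BLK])
{
  assert(dst && src && checksum);
  for (size_t i = 0; i < nblks; i++)
    {
      uint8_t tmp[BLK];
      memcpy(tmp, src + i * BLK, BLK);
      xor_block(checksum, tmp);
      toy_encrypt_block(dst + i * BLK, tmp);
    }
}
```

Both functions produce the same output and checksum; only the memory
access pattern differs. For decryption the checksum would be taken over
the output blocks instead of the input blocks.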
Benchmark on Intel Core i3-1115G4:
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        OCB enc |     0.044 ns/B     21569 MiB/s     0.181 c/B      4089
        OCB dec |     0.045 ns/B     21298 MiB/s     0.183 c/B      4089
After:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        OCB enc |     0.042 ns/B     22922 MiB/s     0.170 c/B      4089
        OCB dec |     0.042 ns/B     22676 MiB/s     0.172 c/B      4089
GnuPG-bug-id: T5875
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>