camellia-avx2: add bulk processing for XTS mode
* cipher/bulkhelp.h (bulk_xts_crypt_128): New.
* cipher/camellia-glue.c (_gcry_camellia_xts_crypt): New.
(camellia_set_key) [USE_AESNI_AVX2]: Set XTS bulk function if AVX2
implementation is available.
Benchmark on AMD Ryzen 5800X:
Before:
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      3.79 ns/B     251.8 MiB/s     18.37 c/B      4850
        XTS dec |      3.77 ns/B     253.2 MiB/s     18.27 c/B      4850
After (6.8x faster):
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     0.554 ns/B      1720 MiB/s      2.69 c/B      4850
        XTS dec |     0.541 ns/B      1762 MiB/s      2.63 c/B      4850
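
For context, a bulk XTS routine has to produce one tweak per 16-byte block before it can hand a batch of blocks to a wide cipher implementation. The sketch below illustrates that tweak schedule only; it is not the code added by this commit. The names xts_tweak_double and xts_gen_tweaks are hypothetical, and the byte-wise loop is chosen for readability rather than speed. It assumes the standard XTS convention: a little-endian 128-bit tweak multiplied by x in GF(2^128), with the reduction constant 0x87.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XTS_BLOCK 16

/* Hypothetical helper: multiply a 128-bit tweak by x in GF(2^128).
   The tweak is stored little-endian, so byte 15 holds the top bits.
   Reduction polynomial: x^128 + x^7 + x^2 + x + 1 (constant 0x87). */
static void xts_tweak_double(uint8_t t[XTS_BLOCK])
{
  unsigned int carry = t[XTS_BLOCK - 1] >> 7;  /* bit shifted out of the top */
  int i;

  /* Shift left by one bit, pulling the high bit of the lower byte upward. */
  for (i = XTS_BLOCK - 1; i > 0; i--)
    t[i] = (uint8_t)((t[i] << 1) | (t[i - 1] >> 7));

  /* Lowest byte: shift, then fold the carried-out bit back in. */
  t[0] = (uint8_t)((t[0] << 1) ^ (carry ? 0x87 : 0x00));
}

/* Hypothetical helper: store the tweaks for nblocks consecutive blocks in
   tweaks[] and leave t holding the tweak for the block that follows. */
static void xts_gen_tweaks(uint8_t t[XTS_BLOCK],
                           uint8_t (*tweaks)[XTS_BLOCK], size_t nblocks)
{
  size_t i;

  for (i = 0; i < nblocks; i++)
    {
      memcpy(tweaks[i], t, XTS_BLOCK);
      xts_tweak_double(t);
    }
}
```

With the tweaks precomputed, each block in the batch can be XORed with its tweak, passed through the cipher together with the others, and XORed with the same tweak again, which is what allows many blocks to be processed per call.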
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>