chacha20: add AVX2/AMD64 assembly implementation
* cipher/Makefile.am: Add 'chacha20-avx2-amd64.S'. * cipher/chacha20-avx2-amd64.S: New. * cipher/chacha20.c (USE_AVX2): New macro. [USE_AVX2] (_gcry_chacha20_amd64_avx2_blocks): New. (chacha20_do_setkey): Select AVX2 implementation if there is HW support. (selftest): Increase size of buf by 256. * configure.ac [host=x86-64]: Add 'chacha20-avx2-amd64.lo'.
Add AVX2 optimized implementation for ChaCha20. Based on implementation by
Andrew Moon.
SSSE3 (Intel Haswell):
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.742 ns/B 1284.8 MiB/s 2.38 c/B STREAM dec | 0.741 ns/B 1286.5 MiB/s 2.37 c/B
AVX2:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.393 ns/B 2428.0 MiB/s 1.26 c/B STREAM dec | 0.392 ns/B 2433.6 MiB/s 1.25 c/B
- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>