chacha20: add SSE2/AMD64 optimized implementation
* cipher/Makefile.am: Add 'chacha20-sse2-amd64.S'. * cipher/chacha20-sse2-amd64.S: New. * cipher/chacha20.c (USE_SSE2): New. [USE_SSE2] (_gcry_chacha20_amd64_sse2_blocks): New. (chacha20_do_setkey) [USE_SSE2]: Use SSE2 implementation for blocks function. * configure.ac [host=x86-64]: Add 'chacha20-sse2-amd64.lo'.
Add Andrew Moon's public domain SSE2 implementation of ChaCha20. Original
source is available at: https://github.com/floodyberry/chacha-opt
Benchmark on Intel i5-4570 (haswell),
with "--disable-hwf intel-avx2 --disable-hwf intel-ssse3":
Old:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 1.97 ns/B 483.8 MiB/s 6.31 c/B STREAM dec | 1.97 ns/B 483.6 MiB/s 6.31 c/B
New:
CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte
STREAM enc | 0.931 ns/B 1024.7 MiB/s 2.98 c/B STREAM dec | 0.930 ns/B 1025.0 MiB/s 2.98 c/B
- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>