Add ARMv8/AArch64 implementation of chacha20
* cipher/Makefile.am: Add 'chacha20-aarch64.S'.
* cipher/chacha20-aarch64.S: New.
* cipher/chacha20.c (USE_AARCH64_SIMD): New.
(_gcry_chacha20_aarch_blocks4): New.
(chacha20_do_setkey): Add HWF selection for AArch64 implementation.
* configure.ac: Add 'chacha20-aarch64.lo'.
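The HWF selection noted for chacha20_do_setkey can be pictured roughly as below. This is a minimal sketch, not the code from this commit: the feature bit, the enum, and the function name are all hypothetical, and the build-time gate that USE_AARCH64_SIMD presumably provides is modelled here as a plain parameter so the sketch compiles on any platform.

```c
/* Hypothetical hardware-feature bit; the real flags are defined by the
   library and are not reproduced here. */
#define SKETCH_HWF_ARM_NEON (1u << 0)

/* Hypothetical tags for the available block functions. */
typedef enum
{
  SKETCH_IMPL_GENERIC = 0,
  SKETCH_IMPL_AARCH64_SIMD = 1
} sketch_impl_t;

/* Pick an implementation at key-setup time.
   simd_built stands in for a compile-time switch: non-zero means the
   assembly implementation was compiled into this build.
   hw_features is the runtime feature mask reported for the CPU. */
static sketch_impl_t
sketch_select_impl (unsigned int hw_features, int simd_built)
{
  /* The portable C implementation is always the fallback. */
  sketch_impl_t impl = SKETCH_IMPL_GENERIC;

  /* Use the SIMD path only if it was built and the CPU reports support. */
  if (simd_built && (hw_features & SKETCH_HWF_ARM_NEON))
    impl = SKETCH_IMPL_AARCH64_SIMD;

  return impl;
}
```

The point of the two conditions is that the accelerated path needs both build-time support and a runtime feature check; if either is missing, the generic code is used.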
Benchmark on Cortex-A53 (1152 MHz):
Before:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      7.91 ns/B     120.6 MiB/s      9.11 c/B
     STREAM dec |      7.91 ns/B     120.6 MiB/s      9.11 c/B
After (1.66x faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      4.74 ns/B     201.2 MiB/s      5.46 c/B
     STREAM dec |      4.74 ns/B     201.3 MiB/s      5.46 c/B
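The three benchmark columns are related by simple arithmetic, so the figures can be cross-checked from the ns/B values and the 1152 MHz clock alone. The helpers below are an illustrative sketch with hypothetical names; they are not part of the benchmark tool.

```c
/* Throughput in MiB/s from time per byte in nanoseconds.
   1 MiB = 1024 * 1024 bytes. */
static double
sketch_mib_per_sec (double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte from time per byte and the CPU clock in MHz.
   One nanosecond at f MHz is f / 1000 cycles. */
static double
sketch_cycles_per_byte (double ns_per_byte, double clock_mhz)
{
  return ns_per_byte * clock_mhz / 1000.0;
}

/* Speedup factor from the "before" and "after" times per byte. */
static double
sketch_speedup (double ns_before, double ns_after)
{
  return ns_before / ns_after;
}
```

With the values above, 7.91 ns/B at 1152 MHz works out to about 9.11 c/B and 120.6 MiB/s, 4.74 ns/B to about 5.46 c/B and 201.2 MiB/s, and the ratio 7.91 / 4.74 is a little under 1.67.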
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>