AVX implementation of BLAKE2s
* cipher/Makefile.am: Add 'blake2s-amd64-avx.S'.
* cipher/blake2.c (USE_AVX, _gcry_blake2s_transform_amd64_avx): New.
(BLAKE2S_CONTEXT) [USE_AVX]: Add 'use_avx'.
(blake2s_transform): Rename to ...
(blake2s_transform_generic): ... this.
(blake2s_transform): New.
(blake2s_final): Pass 'ctx' pointer to transform function instead
of 'S'.
(blake2s_init_ctx): Check HW features and enable AVX implementation
if supported.
* cipher/blake2s-amd64-avx.S: New.
* configure.ac: Add 'blake2s-amd64-avx.lo'.
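The dispatch arrangement described in the entries above can be sketched roughly as follows. This is an illustrative outline only, not the code from cipher/blake2.c: the type 'sketch_ctx', the flag 'SKETCH_HWF_AVX', the call counters and every 'sketch_*' function are hypothetical stand-ins, and the two transform bodies are stubs that merely record which path was taken.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bit; the real code queries the library's own
   hardware-feature flags instead. */
#define SKETCH_HWF_AVX 1u

/* Hypothetical stand-in for BLAKE2S_CONTEXT with the new 'use_avx' flag. */
typedef struct
{
  unsigned int h[8];
  unsigned int use_avx;
} sketch_ctx;

/* Counters so the chosen code path can be observed (illustration only). */
static unsigned int sketch_generic_calls;
static unsigned int sketch_avx_calls;

/* Stub for the portable C transform (the renamed generic function). */
static void
sketch_transform_generic (sketch_ctx *c, const unsigned char *blks,
                          size_t nblks)
{
  (void)c; (void)blks; (void)nblks;
  sketch_generic_calls++;
}

/* Stub for the assembly transform that would live in the new .S file. */
static void
sketch_transform_avx (sketch_ctx *c, const unsigned char *blks,
                      size_t nblks)
{
  (void)c; (void)blks; (void)nblks;
  sketch_avx_calls++;
}

/* New wrapper: takes the whole context so it can read 'use_avx' and
   pick an implementation at run time. */
static void
sketch_transform (sketch_ctx *c, const unsigned char *blks, size_t nblks)
{
  if (c->use_avx)
    sketch_transform_avx (c, blks, nblks);
  else
    sketch_transform_generic (c, blks, nblks);
}

/* Init: clear the context and enable the AVX path only when the
   (hypothetical) feature mask reports AVX support. */
static void
sketch_init_ctx (sketch_ctx *c, unsigned int hw_features)
{
  memset (c, 0, sizeof *c);
  c->use_avx = (hw_features & SKETCH_HWF_AVX) != 0;
}
```

Passing the context pointer rather than only the hash state is what lets the wrapper see the per-context flag, which is presumably why blake2s_final changes its argument in this commit.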
Benchmark on Intel Core i7-4790K (4.0 GHz, no turbo):
Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 BLAKE2S_256    |      1.77 ns/B     538.2 MiB/s      7.09 c/B
After (~1.3x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 BLAKE2S_256    |      1.34 ns/B     711.4 MiB/s      5.36 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>