Add AVX2 implementation of BLAKE2b
* cipher/Makefile.am: Add 'blake2b-amd64-avx2.S'.
* cipher/blake2.c (USE_AVX2, ASM_FUNC_ABI, ASM_EXTRA_STACK)
(_gcry_blake2b_transform_amd64_avx2): New.
(BLAKE2B_CONTEXT) [USE_AVX2]: Add 'use_avx2'.
(blake2b_transform): Rename to ...
(blake2b_transform_generic): ... this.
(blake2b_transform): New.
(blake2b_final): Pass 'ctx' pointer to transform function instead of 'S'.
(blake2b_init_ctx): Check HW features and enable AVX2 implementation
if supported.
* cipher/blake2b-amd64-avx2.S: New.
* configure.ac: Add 'blake2b-amd64-avx2.lo'.
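The dispatch pattern described by the entries above (set a flag once at init, then branch in a thin wrapper that receives the context pointer) can be sketched roughly as follows. This is a minimal illustration, not the code in cipher/blake2.c: every name prefixed with sketch_ is hypothetical, the two transform bodies are empty stand-ins, and the compiler builtin __builtin_cpu_supports() is used here only as a substitute for libgcrypt's own HW feature detection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified context; the real BLAKE2B_CONTEXT holds
   more state than this. */
typedef struct
{
  uint64_t h[8];
  int use_avx2;   /* decided once, at init time */
  int last_path;  /* illustration only: 0 = generic, 1 = AVX2 */
} sketch_ctx;

/* Stand-in for the portable C compression function. */
static unsigned int
sketch_transform_generic (sketch_ctx *ctx, const uint8_t *blks, size_t nblks)
{
  (void)blks;
  (void)nblks;
  ctx->last_path = 0;
  return 0;
}

/* Stand-in for the AVX2 assembly routine. */
static unsigned int
sketch_transform_avx2 (sketch_ctx *ctx, const uint8_t *blks, size_t nblks)
{
  (void)blks;
  (void)nblks;
  ctx->last_path = 1;
  return 0;
}

/* Assumption: GCC/Clang builtin used in place of the library's own
   HW feature query.  Reports "no AVX2" on other compilers/targets. */
static int
sketch_cpu_has_avx2 (void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports ("avx2") ? 1 : 0;
#else
  return 0;
#endif
}

/* Check HW features once and remember the result in the context. */
void
sketch_init (sketch_ctx *ctx)
{
  size_t i;

  assert (ctx != NULL);
  for (i = 0; i < 8; i++)
    ctx->h[i] = 0;
  ctx->last_path = 0;
  ctx->use_avx2 = sketch_cpu_has_avx2 ();
}

/* Thin wrapper: takes the context pointer so it can read the flag. */
unsigned int
sketch_transform (sketch_ctx *ctx, const uint8_t *blks, size_t nblks)
{
  if (ctx->use_avx2)
    return sketch_transform_avx2 (ctx, blks, nblks);
  return sketch_transform_generic (ctx, blks, nblks);
}
```

Passing the context pointer rather than only the hash state is what lets the wrapper read the per-context flag, which matches the blake2b_final entry above.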
Benchmark on Intel Core i7-4790K (4.0 GHz, no turbo):
Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 BLAKE2B_512    |      1.07 ns/B     887.8 MiB/s      4.30 c/B

After (~1.4x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 BLAKE2B_512    |     0.771 ns/B    1236.8 MiB/s      3.08 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>