Add Serpent AVX2 implementation
* cipher/Makefile.am: Add 'serpent-avx2-amd64.S'.
* cipher/serpent-avx2-amd64.S: New file.
* cipher/serpent.c (USE_AVX2): New macro.
(serpent_context_t) [USE_AVX2]: Add 'use_avx2'.
[USE_AVX2] (_gcry_serpent_avx2_ctr_enc, _gcry_serpent_avx2_cbc_dec)
(_gcry_serpent_avx2_cfb_dec): New prototypes.
(serpent_setkey_internal) [USE_AVX2]: Check for AVX2 capable hardware
and set 'use_avx2'.
(_gcry_serpent_ctr_enc) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cbc_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cfb_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks'
so that AVX2 codepaths are tested.
* configure.ac (serpent) [avx2support]: Add 'serpent-avx2-amd64.lo'.
Add new AVX2 implementation of Serpent that processes 16 blocks in parallel.
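
As a rough illustration of the dispatch pattern described above, the sketch below shows how a bulk function might hand full 16-block chunks to a wide routine when 'use_avx2' is set and fall back to a per-block path for the remainder. It is not the code from cipher/serpent.c: apart from 'use_avx2', every name here (sketch_ctx, sketch_wide16, sketch_single, sketch_bulk, PARALLEL_BLOCKS) is hypothetical, and the stand-in routines only count calls instead of encrypting anything.

```c
#include <assert.h>
#include <stddef.h>

/* Blocks handled per wide call; 16 is the figure given in this commit. */
#define PARALLEL_BLOCKS 16

/* Hypothetical context.  Only the 'use_avx2' flag mirrors a name from
   the commit; the counters exist purely to make the sketch observable. */
struct sketch_ctx
{
  int use_avx2;          /* Nonzero if the wide path may be used.  */
  size_t wide_calls;     /* Calls made to the 16-block stand-in.  */
  size_t single_calls;   /* Calls made to the one-block stand-in.  */
};

/* Stand-in for an accelerated routine that consumes 16 blocks.  */
static void
sketch_wide16 (struct sketch_ctx *ctx)
{
  ctx->wide_calls++;
}

/* Stand-in for the generic routine that consumes one block.  */
static void
sketch_single (struct sketch_ctx *ctx)
{
  ctx->single_calls++;
}

/* Process NBLOCKS blocks: full chunks of 16 go to the wide path when
   it is enabled, whatever is left goes to the one-block fallback.  */
static void
sketch_bulk (struct sketch_ctx *ctx, size_t nblocks)
{
  assert (ctx != NULL);

  if (ctx->use_avx2)
    {
      while (nblocks >= PARALLEL_BLOCKS)
        {
          sketch_wide16 (ctx);
          nblocks -= PARALLEL_BLOCKS;
        }
    }

  while (nblocks > 0)
    {
      sketch_single (ctx);
      nblocks--;
    }
}
```

Under this pattern, an input shorter than 16 blocks never reaches the wide routine, which is consistent with the selftests needing a larger 'nblocks' to exercise the AVX2 codepaths.
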
Speed old (SSE2) vs. new (AVX2) on Intel Core i5-4570:
               ECB/Stream          CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
SERPENT128    1.00x   1.00x   1.00x   2.10x   1.00x   2.16x   1.01x   1.00x   2.16x   2.18x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>