Add AVX2/BMI2 implementation of SHA1

Description

* cipher/Makefile.am: Add 'sha1-avx2-bmi2-amd64.S'.
* cipher/hash-common.h (MD_BLOCK_CTX_BUFFER_SIZE): New.
(gcry_md_block_ctx): Change buffer length to MD_BLOCK_CTX_BUFFER_SIZE.
* cipher/sha1-avx-amd64.S: Add missing .size for transform function.
* cipher/sha1-ssse3-amd64.S: Add missing .size for transform function.
* cipher/sha1-avx-bmi2-amd64.S: Add missing .size for transform
function; tweak implementation for a small (~1%) speed increase.
* cipher/sha1-avx2-bmi2-amd64.S: New.
* cipher/sha1.c (USE_AVX2, _gcry_sha1_transform_amd64_avx2_bmi2)
(do_sha1_transform_amd64_avx2_bmi2): New.
(sha1_init) [USE_AVX2]: Enable AVX2 implementation if supported by
HW features.
(sha1_final): Merge processing of two last blocks when extra block is
needed.
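
The sha1_final change above can be pictured with a small sketch. This is not the libgcrypt code: `pad_final_blocks` and `BLOCK_SIZE` are hypothetical names, and the sketch assumes the pending message bytes sit in a buffer with room for two 64-byte blocks. It writes standard SHA-1 padding in place and reports whether one or two blocks result, so that a caller could pass both blocks to one multi-block transform call rather than calling the transform twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64 /* SHA-1 block size in bytes */

/* Hypothetical helper for illustration only.
 * buf holds `count` (< 64) pending message bytes and has room for two
 * blocks.  total_len is the full message length in bytes.  The SHA-1
 * padding (0x80, zeros, 64-bit big-endian bit length) is written in
 * place.  Returns the number of 64-byte blocks (1 or 2) to process. */
static size_t
pad_final_blocks (uint8_t buf[2 * BLOCK_SIZE], size_t count,
                  uint64_t total_len)
{
  uint64_t bits = total_len * 8;
  size_t nblks, end, i;

  assert (count < BLOCK_SIZE);

  buf[count++] = 0x80; /* mandatory padding marker */

  /* The last 8 bytes carry the length; if they no longer fit in the
   * first block, an extra block is needed. */
  nblks = (count > BLOCK_SIZE - 8) ? 2 : 1;
  end = nblks * BLOCK_SIZE;

  /* Zero everything between the marker and the length field. */
  memset (buf + count, 0, end - 8 - count);

  /* Store the message length in bits, big-endian. */
  for (i = 0; i < 8; i++)
    buf[end - 1 - i] = (uint8_t) (bits >> (8 * i));

  return nblks;
}
```

With this layout, a caller would run its transform once over `buf` with the returned block count. A 55-byte tail still fits in one block, while a tail of 56 to 63 bytes yields two.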

Benchmarks on Intel Haswell (4.0 GHz):

Before (AVX/BMI2):

                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.970 ns/B     983.2 MiB/s      3.88 c/B

After (AVX/BMI2, ~1% faster):

                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.960 ns/B     993.1 MiB/s      3.84 c/B

After (AVX2/BMI2, ~9% faster):

                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.890 ns/B      1071 MiB/s      3.56 c/B
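
The three columns and the quoted percentages are related by simple arithmetic, sketched below. The function names are hypothetical and not part of libgcrypt's benchmark tool. The sketch assumes cycles/byte is nanoseconds/byte multiplied by the clock rate in GHz, and that "faster" means the throughput ratio before/after minus one.

```c
#include <assert.h>

/* cycles/byte = (ns/byte) * (cycles/ns); a 4.0 GHz clock gives 4 cycles/ns. */
static double
cycles_per_byte (double ns_per_byte, double ghz)
{
  assert (ns_per_byte > 0.0 && ghz > 0.0);
  return ns_per_byte * ghz;
}

/* MiB/s = bytes per second divided by 2^20. */
static double
mib_per_sec (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Throughput gain in percent when going from before_ns to after_ns. */
static double
speedup_percent (double before_ns, double after_ns)
{
  assert (before_ns > 0.0 && after_ns > 0.0);
  return (before_ns / after_ns - 1.0) * 100.0;
}
```

For example, `cycles_per_byte (0.970, 4.0)` gives about 3.88, and `speedup_percent (0.970, 0.890)` gives roughly 9, matching the figures above. Recomputing MiB/s from the rounded ns/B values can differ slightly in the last digit from the reported numbers.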

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Apr 5 2019, 4:39 PM
Parents
rCced7508c857c: blowfish: add three rounds parallel handling to generic C implementation