Tune SHA-512/AVX2 and SHA-256/AVX2 implementations

Description

* cipher/sha256-avx2-bmi2-amd64.S (ONE_ROUND_PART1, ONE_ROUND_PART2)
(ONE_ROUND): New round function.
(FOUR_ROUNDS_AND_SCHED, FOUR_ROUNDS): Use new round function.
(_gcry_sha256_transform_amd64_avx2): Exit early if number of blocks is
zero; write XFER to stack earlier and handle XFER writing in
FOUR_ROUNDS_AND_SCHED.
* cipher/sha512-avx2-bmi2-amd64.S (MASK_YMM_LO, MASK_YMM_LOx): New.
(ONE_ROUND_PART1, ONE_ROUND_PART2, ONE_ROUND): New round function.
(FOUR_ROUNDS_AND_SCHED, FOUR_ROUNDS): Use new round function.
(_gcry_sha512_transform_amd64_avx2): Write XFER to stack earlier and
handle XFER writing in FOUR_ROUNDS_AND_SCHED.
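
For readers unfamiliar with the structure being tuned, the following is a minimal portable C sketch of a SHA-256 block transform whose round is split into two halves and which returns early when there are no blocks to process. It is not the AVX2/BMI2 assembly from this commit. The name sha256_transform_sketch, the split point between ROUND_PART1 and ROUND_PART2, and the reading of XFER as the precomputed sum of round constant and message-schedule word are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SHA-256 round constants K[0..63] (FIPS 180-4). */
static const uint32_t K256[64] = {
  0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
  0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
  0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
  0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
  0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
  0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
  0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
  0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

static inline uint32_t ror32(uint32_t x, unsigned n)
{
  return (x >> n) | (x << (32 - n));
}

/* Hypothetical split of one round into two halves; the real
   ONE_ROUND_PART1/ONE_ROUND_PART2 macros may divide the work differently.
   PART1: T1 = h + Sigma1(e) + Ch(e,f,g) + (K[t] + W[t]).
   PART2: T2 = Sigma0(a) + Maj(a,b,c). */
#define ROUND_PART1(e, f, g, h, kw) \
  ((h) + (ror32((e), 6) ^ ror32((e), 11) ^ ror32((e), 25)) \
       + (((e) & (f)) ^ (~(e) & (g))) + (kw))
#define ROUND_PART2(a, b, c) \
  ((ror32((a), 2) ^ ror32((a), 13) ^ ror32((a), 22)) \
       + (((a) & (b)) ^ ((a) & (c)) ^ ((b) & (c))))

/* Hypothetical name; processes nblks 64-byte blocks, updating state. */
void sha256_transform_sketch(uint32_t state[8], const uint8_t *data, size_t nblks)
{
  uint32_t w[64], xfer[64];

  assert(state != NULL);
  if (nblks == 0)
    return;                     /* exit early: nothing to hash */
  assert(data != NULL);

  for (; nblks > 0; nblks--, data += 64)
    {
      /* Message schedule: 16 big-endian input words, then 48 derived words. */
      for (int t = 0; t < 16; t++)
        w[t] = ((uint32_t)data[4 * t] << 24) | ((uint32_t)data[4 * t + 1] << 16)
             | ((uint32_t)data[4 * t + 2] << 8) | data[4 * t + 3];
      for (int t = 16; t < 64; t++)
        {
          uint32_t s0 = ror32(w[t - 15], 7) ^ ror32(w[t - 15], 18) ^ (w[t - 15] >> 3);
          uint32_t s1 = ror32(w[t - 2], 17) ^ ror32(w[t - 2], 19) ^ (w[t - 2] >> 10);
          w[t] = w[t - 16] + s0 + w[t - 7] + s1;
        }
      /* Assumed meaning of XFER: K[t] + W[t], stored before the rounds run. */
      for (int t = 0; t < 64; t++)
        xfer[t] = K256[t] + w[t];

      uint32_t a = state[0], b = state[1], c = state[2], d = state[3];
      uint32_t e = state[4], f = state[5], g = state[6], h = state[7];
      for (int t = 0; t < 64; t++)
        {
          uint32_t t1 = ROUND_PART1(e, f, g, h, xfer[t]);
          uint32_t t2 = ROUND_PART2(a, b, c);
          h = g; g = f; f = e; e = d + t1;
          d = c; c = b; b = a; a = t1 + t2;
        }
      state[0] += a; state[1] += b; state[2] += c; state[3] += d;
      state[4] += e; state[5] += f; state[6] += g; state[7] += h;
    }
}
```

Storing K[t] + W[t] ahead of the rounds keeps the additions that do not depend on the working variables off the critical path of each round, which is one plausible reason for moving the XFER write earlier.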

Benchmark on Intel Haswell (4.0 GHz):

Before:

       |  nanosecs/byte   mebibytes/sec   cycles/byte
SHA256 |      2.17 ns/B     439.0 MiB/s      8.68 c/B
SHA512 |      1.56 ns/B     612.5 MiB/s      6.23 c/B

After (~4-6% faster):

       |  nanosecs/byte   mebibytes/sec   cycles/byte
SHA256 |      2.05 ns/B     465.9 MiB/s      8.18 c/B
SHA512 |      1.49 ns/B     640.3 MiB/s      5.95 c/B
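
The "~4-6% faster" figure can be checked from the cycles/byte columns. The helpers below are a small sketch for that arithmetic; the function names speedup_percent and cycles_per_byte are hypothetical and are not part of libgcrypt or its benchmark tool.

```c
#include <assert.h>

/* Throughput gain in percent when the per-byte cost drops from
   `before` to `after` (both in the same unit, e.g. cycles/byte). */
double speedup_percent(double before, double after)
{
  assert(before > 0.0 && after > 0.0);
  return (before / after - 1.0) * 100.0;
}

/* Converts nanoseconds/byte to cycles/byte at a given clock in GHz. */
double cycles_per_byte(double ns_per_byte, double clock_ghz)
{
  return ns_per_byte * clock_ghz;
}
```

With the numbers above, speedup_percent(8.68, 8.18) comes to roughly 6.1 for SHA256 and speedup_percent(6.23, 5.95) to roughly 4.7 for SHA512. The cycles/byte column is the nanosecs/byte column multiplied by the 4.0 GHz clock; small differences in the last digit come from rounding in the printed nanosecs/byte values.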

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Apr 7 2019, 4:53 PM
Parents
rCa3683b6f6231: Add SHA512/224 and SHA512/256 algorithms