sm4-aesni-avx2: add generic 1 to 16 block bulk processing function

Description

* cipher/sm4-aesni-avx2-amd64.S: Remove unnecessary vzeroupper at
function entries.
(_gcry_sm4_aesni_avx2_crypt_blk1_16): New.
* cipher/sm4.c (_gcry_sm4_aesni_avx2_crypt_blk1_16)
(sm4_aesni_avx2_crypt_blk1_16): New.
(sm4_get_crypt_blk1_16_fn) [USE_AESNI_AVX2]: Add
'sm4_aesni_avx2_crypt_blk1_16'.
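
The selector pattern named in the entries above (one function that accepts anywhere from 1 to 16 blocks, chosen at run time) can be sketched roughly as follows. This is a minimal illustration and not libgcrypt's code: every name prefixed with `toy_`, the function-pointer signature, and the `have_fast_path` flag are assumptions made for the example, and the "cipher" is a plain XOR rather than SM4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK_SIZE   16u  /* SM4 block size in bytes */
#define MAX_BLOCKS 16u  /* most blocks one bulk call accepts */

/* Hypothetical signature: process num_blks (1..16) blocks in one call. */
typedef void (*toy_crypt_blk1_16_fn)(const uint8_t *key, uint8_t *out,
                                     const uint8_t *in,
                                     unsigned int num_blks);

/* Toy stand-in for a portable path: XOR each block with a 16-byte key.
   This is NOT SM4; it only gives the dispatcher something to call. */
static void toy_generic_blk1_16(const uint8_t *key, uint8_t *out,
                                const uint8_t *in, unsigned int num_blks)
{
  assert(num_blks >= 1 && num_blks <= MAX_BLOCKS);
  for (unsigned int b = 0; b < num_blks; b++)
    for (unsigned int i = 0; i < BLK_SIZE; i++)
      out[b * BLK_SIZE + i] = in[b * BLK_SIZE + i] ^ key[i];
}

/* Toy stand-in for an accelerated path: same result, one flat loop. */
static void toy_fast_blk1_16(const uint8_t *key, uint8_t *out,
                             const uint8_t *in, unsigned int num_blks)
{
  assert(num_blks >= 1 && num_blks <= MAX_BLOCKS);
  for (unsigned int i = 0; i < num_blks * BLK_SIZE; i++)
    out[i] = in[i] ^ key[i % BLK_SIZE];
}

/* Hypothetical selector: real code would test CPU feature flags here. */
static toy_crypt_blk1_16_fn toy_get_crypt_blk1_16_fn(int have_fast_path)
{
  return have_fast_path ? toy_fast_blk1_16 : toy_generic_blk1_16;
}

/* Feed an arbitrary block count to the selected function, at most
   MAX_BLOCKS blocks per call. */
static void toy_bulk_crypt(const uint8_t *key, uint8_t *out,
                           const uint8_t *in, size_t nblocks,
                           int have_fast_path)
{
  toy_crypt_blk1_16_fn fn = toy_get_crypt_blk1_16_fn(have_fast_path);

  while (nblocks > 0)
    {
      unsigned int cur = nblocks > MAX_BLOCKS ? MAX_BLOCKS
                                              : (unsigned int)nblocks;
      fn(key, out, in, cur);
      out += cur * BLK_SIZE;
      in += cur * BLK_SIZE;
      nblocks -= cur;
    }
}
```

In this sketch, a caller holding 21 blocks ends up making one 16-block call followed by one 5-block call, so the tail does not need a separate single-block code path.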

Benchmark on AMD Ryzen 5800X:

Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      1.48 ns/B     643.2 MiB/s      7.19 c/B      4850
        XTS dec |      1.48 ns/B     644.3 MiB/s      7.18 c/B      4850

After (1.38x faster):
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      1.07 ns/B     888.7 MiB/s      5.21 c/B      4850
        XTS dec |      1.07 ns/B     889.4 MiB/s      5.20 c/B      4850

Benchmark on Intel i5-6200U 2.30GHz:

Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      2.95 ns/B     323.0 MiB/s      8.25 c/B      2792
        XTS dec |      2.95 ns/B     323.0 MiB/s      8.24 c/B      2792

After (1.64x faster):
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      1.79 ns/B     531.4 MiB/s      5.01 c/B      2791
        XTS dec |      1.79 ns/B     531.6 MiB/s      5.01 c/B      2791
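
The columns in the tables above are related by simple arithmetic, which the following sketch spells out. It assumes the usual definitions (1 MiB = 1048576 bytes, cycles/byte = ns/byte times clock in GHz); the helper names are made up for this example. Because the printed ns/B values are rounded to two decimals, results computed from them only approximate the other columns.

```c
#include <assert.h>

/* Throughput in MiB/s from time per byte in nanoseconds. */
static double toy_mib_per_sec(double ns_per_byte)
{
  assert(ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte from time per byte and the measured clock in MHz. */
static double toy_cycles_per_byte(double ns_per_byte, double mhz)
{
  assert(ns_per_byte > 0.0 && mhz > 0.0);
  return ns_per_byte * mhz / 1000.0;
}

/* Speedup factor from two throughput figures (after / before). */
static double toy_speedup(double mib_s_before, double mib_s_after)
{
  assert(mib_s_before > 0.0);
  return mib_s_after / mib_s_before;
}
```

Calling `toy_speedup(323.0, 531.4)` with the Intel XTS enc throughput figures, for instance, gives the ratio between the "Before" and "After" rows.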

Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Apr 24 2022, 8:32 PM
Parents
rC5095d60af42d: Add SM4 x86-64/GFNI/AVX2 implementation