Add SM4 x86-64/AES-NI/AVX2 implementation
* cipher/Makefile.am: Add 'sm4-aesni-avx2-amd64.S'.
* cipher/sm4-aesni-avx2-amd64.S: New.
* cipher/sm4.c (USE_AESNI_AVX2): New.
(SM4_context) [USE_AESNI_AVX2]: Add 'use_aesni_avx2'.
[USE_AESNI_AVX2] (_gcry_sm4_aesni_avx2_ctr_enc)
(_gcry_sm4_aesni_avx2_cbc_dec, _gcry_sm4_aesni_avx2_cfb_dec)
(_gcry_sm4_aesni_avx2_ocb_enc, _gcry_sm4_aesni_avx2_ocb_dec)
(_gcry_sm4_aesni_avx2_ocb_auth): New.
(sm4_setkey): Enable AES-NI/AVX2 if supported by HW.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_AESNI_AVX2]: Add
AES-NI/AVX2 bulk functions.
* configure.ac: Add 'sm4-aesni-avx2-amd64.lo'.
This patch adds x86-64/AES-NI/AVX2 bulk encryption/decryption functions
for SM4. The bulk functions process 16 blocks in parallel.
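Below is a minimal sketch of how a 16-block bulk path can sit in front of a generic one-block path, using CTR mode as the example. It is loosely modeled on the dispatch that the ChangeLog describes for _gcry_sm4_ctr_enc. All names here are hypothetical and are not libgcrypt's API. The block cipher is a toy stand-in, not SM4, and the "parallel" step is written as an ordinary loop where the real code uses AVX2 registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16  /* SM4 block size in bytes */
#define BULK_BLOCKS 16 /* blocks handled per bulk call */

/* Toy stand-in for a one-block cipher. NOT SM4; illustration only. */
static void toy_encrypt_block(const uint8_t key[BLOCK_SIZE],
                              uint8_t out[BLOCK_SIZE],
                              const uint8_t in[BLOCK_SIZE])
{
  for (int i = 0; i < BLOCK_SIZE; i++)
    out[i] = (uint8_t)((in[i] ^ key[i]) * 31u + 7u);
}

/* Increment a 128-bit big-endian counter block. */
static void ctr_increment(uint8_t ctr[BLOCK_SIZE])
{
  for (int i = BLOCK_SIZE - 1; i >= 0; i--)
    if (++ctr[i] != 0)
      break;
}

/* Generic path: one block at a time, out = in XOR E(ctr), then ctr++. */
static void ctr_enc_generic(const uint8_t key[BLOCK_SIZE],
                            uint8_t ctr[BLOCK_SIZE], uint8_t *out,
                            const uint8_t *in, size_t nblocks)
{
  uint8_t ks[BLOCK_SIZE];

  for (size_t b = 0; b < nblocks; b++)
    {
      toy_encrypt_block(key, ks, ctr);
      ctr_increment(ctr);
      for (int i = 0; i < BLOCK_SIZE; i++)
        out[b * BLOCK_SIZE + i] = (uint8_t)(in[b * BLOCK_SIZE + i] ^ ks[i]);
    }
}

/* Hypothetical model of a 16-way bulk function: derive all 16 counter
   blocks first, then encrypt them (SIMD code would do this step in
   parallel; here it is a plain loop), then XOR the keystream in. */
static void ctr_enc_bulk16(const uint8_t key[BLOCK_SIZE],
                           uint8_t ctr[BLOCK_SIZE], uint8_t *out,
                           const uint8_t *in)
{
  uint8_t ctrs[BULK_BLOCKS][BLOCK_SIZE];
  uint8_t ks[BULK_BLOCKS][BLOCK_SIZE];

  for (int b = 0; b < BULK_BLOCKS; b++)
    {
      memcpy(ctrs[b], ctr, BLOCK_SIZE);
      ctr_increment(ctr);
    }
  for (int b = 0; b < BULK_BLOCKS; b++)
    toy_encrypt_block(key, ks[b], ctrs[b]);
  for (int b = 0; b < BULK_BLOCKS; b++)
    for (int i = 0; i < BLOCK_SIZE; i++)
      out[b * BLOCK_SIZE + i] = (uint8_t)(in[b * BLOCK_SIZE + i] ^ ks[b][i]);
}

/* Dispatcher: take the bulk path while at least 16 blocks remain,
   then finish the tail with the generic path. */
static void ctr_enc(int use_bulk, const uint8_t key[BLOCK_SIZE],
                    uint8_t ctr[BLOCK_SIZE], uint8_t *out,
                    const uint8_t *in, size_t nblocks)
{
  assert(key != NULL && ctr != NULL);

  if (use_bulk)
    while (nblocks >= BULK_BLOCKS)
      {
        ctr_enc_bulk16(key, ctr, out, in);
        out += BULK_BLOCKS * BLOCK_SIZE;
        in += BULK_BLOCKS * BLOCK_SIZE;
        nblocks -= BULK_BLOCKS;
      }
  ctr_enc_generic(key, ctr, out, in, nblocks);
}
```

CTR keystream blocks do not depend on each other, so the bulk and generic paths must give identical output from the same starting counter. The same independence holds for CBC/CFB decryption and OCB. It does not hold for CBC/CFB encryption, which is why those modes get no bulk functions here.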
Benchmark on AMD Ryzen 7 3700X:

Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC enc |      8.98 ns/B     106.2 MiB/s     38.62 c/B      4300
        CBC dec |      1.55 ns/B     613.7 MiB/s      6.64 c/B      4275
        CFB enc |      8.96 ns/B     106.4 MiB/s     38.52 c/B      4300
        CFB dec |      1.54 ns/B     617.4 MiB/s      6.60 c/B      4275
        CTR enc |      1.57 ns/B     607.8 MiB/s      6.75 c/B      4300
        CTR dec |      1.57 ns/B     608.9 MiB/s      6.74 c/B      4300
        OCB enc |      1.58 ns/B     603.8 MiB/s      6.75 c/B      4275
        OCB dec |      1.57 ns/B     605.7 MiB/s      6.73 c/B      4275
       OCB auth |      1.53 ns/B     624.5 MiB/s      6.57 c/B      4300

After (~56% faster than AES-NI/AVX impl.):
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC enc |      8.93 ns/B     106.8 MiB/s     38.61 c/B      4326
        CBC dec |     0.984 ns/B     969.5 MiB/s      4.23 c/B      4300
        CFB enc |      8.93 ns/B     106.8 MiB/s     38.62 c/B      4325
        CFB dec |     0.983 ns/B     970.3 MiB/s      4.23 c/B      4300
        CTR enc |     0.998 ns/B     955.1 MiB/s      4.29 c/B      4300
        CTR dec |     0.996 ns/B     957.4 MiB/s      4.28 c/B      4300
        OCB enc |      1.00 ns/B     951.8 MiB/s      4.31 c/B      4300
        OCB dec |      1.00 ns/B     951.8 MiB/s      4.31 c/B      4300
       OCB auth |     0.993 ns/B     960.2 MiB/s      4.28 c/B    4304±2
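The "~56%" figure can be cross-checked from the cycles/byte columns. The sketch below takes one plausible approach: it averages the per-mode before/after ratios over the seven parallelizable modes. It leaves out the CBC/CFB encryption rows, which are serial and essentially unchanged. It assumes the "Before" rows are the AES-NI/AVX implementation, as the "After" heading implies. The function and array names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* cycles/byte copied from the tables, in table order:
   CBC dec, CFB dec, CTR enc, CTR dec, OCB enc, OCB dec, OCB auth. */
static const double before_cpb[] = { 6.64, 6.60, 6.75, 6.74, 6.75, 6.73, 6.57 };
static const double after_cpb[]  = { 4.23, 4.23, 4.29, 4.28, 4.31, 4.31, 4.28 };

/* Mean of per-mode speedups (before / after); 1.0 means no change. */
static double mean_speedup(const double *before, const double *after, size_t n)
{
  double sum = 0.0;

  assert(n > 0);
  for (size_t i = 0; i < n; i++)
    sum += before[i] / after[i];
  return sum / (double)n;
}
```

With these values, `mean_speedup(before_cpb, after_cpb, 7)` comes out at about 1.56, consistent with the ~56% quoted above.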
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>