sm4: add amd64 GFNI/AVX512 implementation

Description

* cipher/Makefile.am: Add 'sm4-gfni-avx512-amd64.S'.
* cipher/sm4-gfni-avx512-amd64.S: New.
* cipher/sm4.c (USE_GFNI_AVX512): New.
(SM4_context): Add 'use_gfni_avx512' and 'crypt_blk1_16'.
(_gcry_sm4_gfni_avx512_expand_key, _gcry_sm4_gfni_avx512_ctr_enc)
(_gcry_sm4_gfni_avx512_cbc_dec, _gcry_sm4_gfni_avx512_cfb_dec)
(_gcry_sm4_gfni_avx512_ocb_enc, _gcry_sm4_gfni_avx512_ocb_dec)
(_gcry_sm4_gfni_avx512_ocb_auth, _gcry_sm4_gfni_avx512_ctr_enc_blk32)
(_gcry_sm4_gfni_avx512_cbc_dec_blk32)
(_gcry_sm4_gfni_avx512_cfb_dec_blk32)
(_gcry_sm4_gfni_avx512_ocb_enc_blk32)
(_gcry_sm4_gfni_avx512_ocb_dec_blk32)
(_gcry_sm4_gfni_avx512_crypt_blk1_16)
(_gcry_sm4_gfni_avx512_crypt_blk32, sm4_gfni_avx512_crypt_blk1_16)
(sm4_crypt_blk1_32, sm4_encrypt_blk1_32, sm4_decrypt_blk1_32): New.
(sm4_expand_key): Add GFNI/AVX512 code path.
(sm4_setkey): Use GFNI/AVX512 if supported by CPU; set up
'ctx->crypt_blk1_16'.
(sm4_encrypt, sm4_decrypt, sm4_get_crypt_blk1_16_fn, _gcry_sm4_ctr_enc)
(_gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec, _gcry_sm4_ocb_crypt)
(_gcry_sm4_ocb_auth) [USE_GFNI_AVX512]: Add GFNI/AVX512 code path.
(_gcry_sm4_xts_crypt): Change parallel block size from 16 to 32.
* configure.ac: Add 'sm4-gfni-avx512-amd64.lo'.
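
The pattern in the entries above (choose the implementation once at key setup, then let the bulk modes hand up to 32 blocks to it per call) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: every demo_* and DEMO_* name, the feature bits, and the XOR stand-in for the cipher are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU feature bits, for this sketch only. */
#define DEMO_HWF_GFNI    (1u << 0)
#define DEMO_HWF_AVX512  (1u << 1)

#define DEMO_BLOCK_BYTES 16u

struct demo_ctx;
typedef void (*demo_crypt_fn) (const struct demo_ctx *ctx, uint8_t *out,
                               const uint8_t *in, size_t nblks);

struct demo_ctx
{
  unsigned int use_gfni_avx512; /* decided once at key setup */
  size_t max_blks;              /* widest batch the chosen path accepts */
  demo_crypt_fn crypt_blks;     /* selected implementation */
  uint8_t key_byte;             /* toy key material, NOT SM4 */
};

/* Toy transform standing in for the cipher: XOR with one key byte. */
static void
demo_xor_blocks (const struct demo_ctx *ctx, uint8_t *out,
                 const uint8_t *in, size_t nblks)
{
  size_t i;

  for (i = 0; i < nblks * DEMO_BLOCK_BYTES; i++)
    out[i] = (uint8_t) (in[i] ^ ctx->key_byte);
}

/* Both paths must give identical output; only the batch width differs. */
static void
demo_crypt_generic (const struct demo_ctx *ctx, uint8_t *out,
                    const uint8_t *in, size_t nblks)
{
  demo_xor_blocks (ctx, out, in, nblks);
}

static void
demo_crypt_wide (const struct demo_ctx *ctx, uint8_t *out,
                 const uint8_t *in, size_t nblks)
{
  demo_xor_blocks (ctx, out, in, nblks);
}

/* Pick the wide path only when every required feature bit is present. */
void
demo_setkey (struct demo_ctx *ctx, uint8_t key_byte, unsigned int hwf)
{
  const unsigned int need = DEMO_HWF_GFNI | DEMO_HWF_AVX512;

  ctx->key_byte = key_byte;
  ctx->use_gfni_avx512 = (hwf & need) == need;
  ctx->crypt_blks = ctx->use_gfni_avx512 ? demo_crypt_wide
                                         : demo_crypt_generic;
  ctx->max_blks = ctx->use_gfni_avx512 ? 32 : 16;
}

/* Process nblks blocks in batches; returns the number of calls made. */
size_t
demo_crypt_bulk (const struct demo_ctx *ctx, uint8_t *out,
                 const uint8_t *in, size_t nblks)
{
  size_t calls = 0;

  assert (ctx->crypt_blks != NULL);
  while (nblks > 0)
    {
      size_t n = nblks < ctx->max_blks ? nblks : ctx->max_blks;

      ctx->crypt_blks (ctx, out, in, n);
      out += n * DEMO_BLOCK_BYTES;
      in += n * DEMO_BLOCK_BYTES;
      nblks -= n;
      calls++;
    }
  return calls;
}
```

With 40 blocks of input, the 16-block path in this sketch needs three calls (16 + 16 + 8) and the 32-block path two (32 + 8). Fewer, wider calls is the general idea behind raising a parallel block size, though the measured gains depend on the real assembly, not on this toy.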

Benchmark on Intel i3-1115G4 (tigerlake):

Before:
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

 CBC enc |      9.45 ns/B     101.0 MiB/s     38.63 c/B      4089
 CBC dec |     0.647 ns/B      1475 MiB/s      2.64 c/B      4089
 CFB enc |      9.43 ns/B     101.1 MiB/s     38.57 c/B      4089
 CFB dec |     0.648 ns/B      1472 MiB/s      2.65 c/B      4089
 CTR enc |     0.661 ns/B      1443 MiB/s      2.70 c/B      4089
 CTR dec |     0.661 ns/B      1444 MiB/s      2.70 c/B      4089
 XTS enc |     0.767 ns/B      1243 MiB/s      3.14 c/B      4089
 XTS dec |     0.772 ns/B      1235 MiB/s      3.16 c/B      4089
 OCB enc |     0.671 ns/B      1421 MiB/s      2.74 c/B      4089
 OCB dec |     0.676 ns/B      1410 MiB/s      2.77 c/B      4089
OCB auth |     0.668 ns/B      1428 MiB/s      2.73 c/B      4090

After:
SM4 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

      CBC enc |      7.80 ns/B     122.2 MiB/s     31.91 c/B      4090
      CBC dec |     0.293 ns/B      3258 MiB/s      1.20 c/B      4095±3
      CFB enc |      7.80 ns/B     122.2 MiB/s     31.90 c/B      4089
      CFB dec |     0.294 ns/B      3247 MiB/s      1.20 c/B      4096±3
      CTR enc |     0.306 ns/B      3120 MiB/s      1.25 c/B      4098±4
      CTR dec |     0.300 ns/B      3182 MiB/s      1.23 c/B      4103±6
      XTS enc |     0.431 ns/B      2211 MiB/s      1.77 c/B      4107±9
      XTS dec |     0.431 ns/B      2213 MiB/s      1.77 c/B      4102±6
      OCB enc |     0.324 ns/B      2946 MiB/s      1.33 c/B      4096±3
      OCB dec |     0.326 ns/B      2923 MiB/s      1.34 c/B      4093±2
     OCB auth |     0.536 ns/B      1779 MiB/s      2.19 c/B      4089

CBC/CFB enc: 1.20x faster
CBC/CFB dec: 2.20x faster
CTR: 2.18x faster
XTS: 1.78x faster
OCB enc/dec: 2.07x faster
OCB auth: 1.24x faster
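
The ratios above follow from the ns/B columns of the two tables: speedup is the "before" time divided by the "after" time, and cycles/byte is ns/B multiplied by the clock in GHz. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of libgcrypt or its benchmark tool:

```c
#include <stdio.h>

/* Speedup factor: time per byte before, divided by time per byte after. */
double
demo_speedup (double before_ns_per_byte, double after_ns_per_byte)
{
  return before_ns_per_byte / after_ns_per_byte;
}

/* cycles/byte = (ns/byte) * (cycles/ns), where cycles/ns = MHz / 1000. */
double
demo_cycles_per_byte (double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* Print the ratios for a few rows, using figures from the tables above. */
void
demo_print_speedups (void)
{
  printf ("CBC dec:  %.2fx\n", demo_speedup (0.647, 0.293));
  printf ("XTS enc:  %.2fx\n", demo_speedup (0.767, 0.431));
  printf ("OCB auth: %.2fx\n", demo_speedup (0.668, 0.536));
  printf ("CBC dec after: %.2f c/B\n", demo_cycles_per_byte (0.293, 4095.0));
}
```

Calling demo_print_speedups() from a small main() prints the ratios to two decimals. Because the table values are themselves rounded, results can differ from the summary in the last digit.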

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Jul 21 2022, 10:05 AM
Parents
rC2dc265400674: Add SM4 ARMv9 SVE CE assembly implementation