rijndael-aesni: tweak x86_64 AES-NI for better performance on AMD Zen2

Description

* cipher/rijndael-aesni.c (do_aesni_enc_vec8, do_aesni_dec_vec8): Move
first round key xoring and last round out to caller.
(do_aesni_ctr_4): Replace low 8-bit counter overflow check with 8-bit
addition to low bits and detect overflow from carry flag; Adjust
slow path to restore counter.
(do_aesni_ctr_8): Same as above; Interleave first round key xoring and
first round with CTR generation on fast path; Interleave last round
with output xoring.
(_gcry_aes_aesni_cfb_dec, _gcry_aes_aesni_cbc_dec): Add first round
key xoring; Change order of last round xoring and output xoring
(shorten the dependency path).
(_gcry_aes_aesni_ocb_auth): Add first round key xoring and last round
handling.
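
The CTR counter change can be illustrated in portable C. This is only a
minimal sketch of the idea, not the inline assembly from
cipher/rijndael-aesni.c: the names ctr_add and ctr_add_slow are
hypothetical, and the carry flag is modelled by checking whether the 8-bit
sum wrapped around. The fast path adds to the low byte of the big-endian
counter; on carry, the slow path restores the low byte and performs the
full 128-bit addition.

```c
#include <assert.h>
#include <stdint.h>

/* Slow path (hypothetical helper): full big-endian 128-bit addition,
   propagating the carry from byte 15 towards byte 0. */
static void
ctr_add_slow (uint8_t ctr[16], unsigned int n)
{
  int i;

  for (i = 15; i >= 0 && n != 0; i--)
    {
      n += ctr[i];
      ctr[i] = (uint8_t) n;   /* keep the low 8 bits */
      n >>= 8;                /* the rest is the carry into the next byte */
    }
}

/* Add N (below 256) to the counter.  Returns 1 if the slow path was
   taken, 0 if the 8-bit fast path was enough. */
int
ctr_add (uint8_t ctr[16], unsigned int n)
{
  uint8_t old = ctr[15];
  uint8_t sum;

  assert (n < 256);

  sum = (uint8_t) (old + n);  /* 8-bit addition to the low bits */
  ctr[15] = sum;
  if (sum >= old)
    return 0;                 /* no carry out of the low byte */

  /* Carry detected: restore the counter, then redo the addition with
     full carry propagation. */
  ctr[15] = old;
  ctr_add_slow (ctr, n);
  return 1;
}
```

With a counter whose low byte is 0xfb, ctr_add (ctr, 4) stays on the fast
path; a second call carries into byte 14 and takes the slow path.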

Benchmark on Ryzen 7 3700X:

Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     0.113 ns/B      8445 MiB/s     0.407 c/B
        CFB dec |     0.114 ns/B      8337 MiB/s     0.412 c/B
        CTR enc |     0.112 ns/B      8505 MiB/s     0.404 c/B
        CTR dec |     0.113 ns/B      8476 MiB/s     0.405 c/B

After (CBC-dec +21%, CFB-dec +24%, CTR +8% faster):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     0.093 ns/B     10277 MiB/s     0.334 c/B
        CFB dec |     0.092 ns/B     10372 MiB/s     0.331 c/B
        CTR enc |     0.104 ns/B      9209 MiB/s     0.373 c/B
        CTR dec |     0.104 ns/B      9192 MiB/s     0.373 c/B

Performance remains the same on Intel Skylake.
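
The quoted percentages can be re-derived from the MiB/s columns of the
Ryzen benchmark. The helper below is a hypothetical sketch for that
arithmetic only (speedup_percent is not part of libgcrypt or its
benchmark tool); the commit message appears to round the results down to
whole percent.

```c
#include <assert.h>

/* Throughput gain in percent, given "before" and "after" MiB/s figures. */
double
speedup_percent (double before_mibs, double after_mibs)
{
  assert (before_mibs > 0.0);
  return (after_mibs / before_mibs - 1.0) * 100.0;
}
```

For CBC dec, speedup_percent (8445, 10277) is roughly 21.7, matching the
+21% figure.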

  • Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Sep 17 2020, 8:30 PM
Parents
rC9cd92ebae219: build: Allow customization of the signing key