rijndael-aesni: tweak x86_64 AES-NI for better performance on AMD Zen2
* cipher/rijndael-aesni.c (do_aesni_enc_vec8, do_aesni_dec_vec8): Move
first round key xoring and last round out to caller.
(do_aesni_ctr_4): Change low 8-bit counter overflow check to 8-bit
addition to low-bits and detect overflow from carry flag; Adjust slow
path to restore counter.
(do_aesni_ctr_8): Same as above; Interleave first round key xoring and
first round with CTR generation on fast path; Interleave last round
with output xoring.
(_gcry_aes_aesni_cfb_dec, _gcry_aes_aesni_cbc_dec): Add first round key
xoring; Change order of last round xoring and output xoring (shorten
the dependency path).
(_gcry_aes_aesni_ocb_auth): Add first round key xoring and last round
handling.
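
The counter change for do_aesni_ctr_4 and do_aesni_ctr_8 can be sketched in
portable C as below. This is only an illustration of the idea, not the inline
assembly in cipher/rijndael-aesni.c: the helper names (ctr_add_low8,
ctr_add_full, ctr_advance) are hypothetical, and the carry flag is modelled by
widening the 8-bit sum. The fast path adds to the low byte only; when that
addition carries out, the slow path first restores the low byte and then
performs the full big-endian addition.

```c
#include <stdint.h>

/* Hypothetical helper: add n to the low byte of a 16-byte big-endian
   counter.  Returns 1 when the 8-bit addition carried out (this stands
   in for the CPU carry flag), 0 otherwise. */
static int ctr_add_low8(uint8_t ctr[16], uint8_t n)
{
  unsigned int sum = (unsigned int)ctr[15] + n;

  ctr[15] = (uint8_t)sum;
  return (int)(sum >> 8);
}

/* Hypothetical slow path: full 128-bit big-endian addition of n. */
static void ctr_add_full(uint8_t ctr[16], uint8_t n)
{
  unsigned int carry = n;
  int i;

  for (i = 15; i >= 0 && carry; i--)
    {
      carry += ctr[i];
      ctr[i] = (uint8_t)carry;
      carry >>= 8;
    }
}

/* Advance the counter by n blocks: try the 8-bit fast path first and
   fall back to the slow path only when the low byte overflowed. */
static void ctr_advance(uint8_t ctr[16], uint8_t n)
{
  if (ctr_add_low8(ctr, n))
    {
      /* The fast path already modified the low byte; undo that before
         running the full addition. */
      ctr[15] = (uint8_t)(ctr[15] - n);
      ctr_add_full(ctr, n);
    }
}
```

In this sketch the common case touches a single byte and needs no separate
compare against a threshold; the overflow test is a by-product of the
addition itself.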
Benchmark on Ryzen 7 3700X:
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     0.113 ns/B      8445 MiB/s     0.407 c/B
        CFB dec |     0.114 ns/B      8337 MiB/s     0.412 c/B
        CTR enc |     0.112 ns/B      8505 MiB/s     0.404 c/B
        CTR dec |     0.113 ns/B      8476 MiB/s     0.405 c/B
After (CBC-dec +21%, CFB-dec +24%, CTR +8% faster):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     0.093 ns/B     10277 MiB/s     0.334 c/B
        CFB dec |     0.092 ns/B     10372 MiB/s     0.331 c/B
        CTR enc |     0.104 ns/B      9209 MiB/s     0.373 c/B
        CTR dec |     0.104 ns/B      9192 MiB/s     0.373 c/B
Performance remains the same on Intel Skylake.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>