AES-NI improvements for AMD64

Authored by jukivili on Jan 6 2018, 5:53 PM.

Description

AES-NI improvements for AMD64

* cipher/rijndael-aesni.c [__x86_64__] (aesni_prepare_7_15_variable)
(aesni_prepare_7_15, aesni_cleanup_7_15, do_aesni_enc_vec8)
(do_aesni_dec_vec8, do_aesni_ctr_8): New.
(_gcry_aes_aesni_ctr_enc, _gcry_aes_aesni_cfb_dec)
(_gcry_aes_aesni_cbc_dec, aesni_ocb_enc, aesni_ocb_dec)
(_gcry_aes_aesni_ocb_auth) [__x86_64__]: Add 8 parallel blocks
processing.
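
The ChangeLog entry above adds code paths that handle eight blocks per iteration. As a rough illustration of why that is possible for a mode such as CBC decryption, here is a minimal sketch in portable C. It is not the libgcrypt code. Each plaintext block depends only on ciphertext blocks, so the eight block decryptions in a batch are independent of one another and can be issued back to back; an AES-NI implementation could presumably interleave them to keep the AES unit busy. All names (`toy_dec_block`, `cbc_dec_serial`, `cbc_dec_batched8`, `self_check`) are hypothetical, and the toy block function is a placeholder, not AES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Hypothetical stand-in for a block-cipher decryption. This is NOT AES;
   it only gives the sketch something deterministic to call. */
void toy_dec_block(uint8_t out[BLOCK], const uint8_t in[BLOCK],
                   const uint8_t key[BLOCK])
{
    for (int i = 0; i < BLOCK; i++)
        out[i] = (uint8_t)((in[(i + 1) % BLOCK] ^ key[i]) + i);
}

/* Reference: one block at a time. dst and src must not overlap. */
void cbc_dec_serial(uint8_t *dst, const uint8_t *src, size_t nblocks,
                    uint8_t iv[BLOCK], const uint8_t key[BLOCK])
{
    uint8_t tmp[BLOCK];

    for (size_t n = 0; n < nblocks; n++) {
        toy_dec_block(tmp, src + n * BLOCK, key);
        for (int i = 0; i < BLOCK; i++)
            dst[n * BLOCK + i] = tmp[i] ^ iv[i];
        memcpy(iv, src + n * BLOCK, BLOCK);   /* next IV = this ciphertext */
    }
}

/* Eight blocks per iteration, then a one-block loop for the tail. */
void cbc_dec_batched8(uint8_t *dst, const uint8_t *src, size_t nblocks,
                      uint8_t iv[BLOCK], const uint8_t key[BLOCK])
{
    uint8_t tmp[8][BLOCK];

    while (nblocks >= 8) {
        /* Step 1: eight decryptions with no data dependency between them. */
        for (int b = 0; b < 8; b++)
            toy_dec_block(tmp[b], src + b * BLOCK, key);

        /* Step 2: XOR each result with the preceding ciphertext block. */
        for (int b = 0; b < 8; b++) {
            const uint8_t *prev = b ? src + (b - 1) * BLOCK : iv;
            for (int i = 0; i < BLOCK; i++)
                dst[b * BLOCK + i] = tmp[b][i] ^ prev[i];
        }

        memcpy(iv, src + 7 * BLOCK, BLOCK);
        src += 8 * BLOCK;
        dst += 8 * BLOCK;
        nblocks -= 8;
    }
    cbc_dec_serial(dst, src, nblocks, iv, key);
}

/* Checks that the batched path matches the serial one on 19 blocks
   (two full batches plus a three-block tail). */
void self_check(void)
{
    enum { NBLOCKS = 19 };
    uint8_t src[NBLOCKS * BLOCK], a[NBLOCKS * BLOCK], b[NBLOCKS * BLOCK];
    uint8_t key[BLOCK], iv_a[BLOCK], iv_b[BLOCK];

    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (uint8_t)(i * 37u + 11u);
    for (int i = 0; i < BLOCK; i++) {
        key[i] = (uint8_t)(0xA5 ^ i);
        iv_a[i] = iv_b[i] = (uint8_t)(i * 3);
    }

    cbc_dec_serial(a, src, NBLOCKS, iv_a, key);
    cbc_dec_batched8(b, src, NBLOCKS, iv_b, key);
    assert(memcmp(a, b, sizeof a) == 0);
    assert(memcmp(iv_a, iv_b, BLOCK) == 0);
}
```

Calling `self_check()` from a `main` exercises both paths. The same independence argument applies to CFB decryption and CTR, where the inputs to the block cipher are known up front; CBC encryption does not batch this way because each block needs the previous ciphertext.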

Benchmarks on Intel Core i7-4790K, 4.0 GHz (no turbo, no HT):

Before:
AES | nanosecs/byte mebibytes/sec cycles/byte

 CBC dec |     0.175 ns/B    5448.7 MiB/s     0.700 c/B
 CFB dec |     0.174 ns/B    5466.2 MiB/s     0.698 c/B
 CTR enc |     0.182 ns/B    5226.0 MiB/s     0.730 c/B
 OCB enc |     0.194 ns/B    4913.9 MiB/s     0.776 c/B
 OCB dec |     0.200 ns/B    4769.2 MiB/s     0.800 c/B
OCB auth |     0.172 ns/B    5545.0 MiB/s     0.688 c/B

After (1.08x to 1.14x faster):
AES | nanosecs/byte mebibytes/sec cycles/byte

 CBC dec |     0.157 ns/B    6075.6 MiB/s     0.628 c/B
 CFB dec |     0.158 ns/B    6034.1 MiB/s     0.632 c/B
 CTR enc |     0.159 ns/B    5979.4 MiB/s     0.638 c/B
 OCB enc |     0.175 ns/B    5447.1 MiB/s     0.700 c/B
 OCB dec |     0.183 ns/B    5203.9 MiB/s     0.733 c/B
OCB auth |     0.156 ns/B    6101.3 MiB/s     0.625 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
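
The columns in the benchmark tables above are related by simple arithmetic: cycles/byte is nanoseconds/byte multiplied by the clock in GHz (4.0 here), and the quoted speedup is the ratio of the before and after figures. The following is a small sketch for recomputing those ratios from the ns/B column; the struct and function names are illustrative only and are not part of the benchmark tool.

```c
#include <assert.h>
#include <stdio.h>

struct bench_row {
    const char *mode;
    double before_ns;   /* ns/B before the change */
    double after_ns;    /* ns/B after the change */
};

/* ns/B values copied from the tables in the commit message. */
const struct bench_row bench_rows[] = {
    { "CBC dec",  0.175, 0.157 },
    { "CFB dec",  0.174, 0.158 },
    { "CTR enc",  0.182, 0.159 },
    { "OCB enc",  0.194, 0.175 },
    { "OCB dec",  0.200, 0.183 },
    { "OCB auth", 0.172, 0.156 },
};

/* cycles/byte = (ns/byte) * (cycles/ns), and cycles/ns is the clock in GHz. */
double cycles_per_byte(double ns_per_byte, double clock_ghz)
{
    return ns_per_byte * clock_ghz;
}

/* Ratio of the old time to the new time; above 1.0 means faster. */
double speedup(double before_ns, double after_ns)
{
    assert(after_ns > 0.0);
    return before_ns / after_ns;
}

void print_speedups(double clock_ghz)
{
    size_t n = sizeof bench_rows / sizeof bench_rows[0];

    for (size_t i = 0; i < n; i++) {
        const struct bench_row *r = &bench_rows[i];
        printf("%-8s  %.3f c/B -> %.3f c/B  (%.2fx)\n", r->mode,
               cycles_per_byte(r->before_ns, clock_ghz),
               cycles_per_byte(r->after_ns, clock_ghz),
               speedup(r->before_ns, r->after_ns));
    }
}
```

Calling `print_speedups(4.0)` from a `main` prints one line per mode. Because the ns/B inputs are rounded to three decimals, the recomputed c/B values can differ from the table in the last digit.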

Details

Committed
jukivili on Jan 9 2018, 5:44 PM
Parents
rCb3ec0f752c92: Add ARMv8/AArch64 implementation of chacha20