AES-NI improvements for AMD64
* cipher/rijndael-aesni.c [__x86_64__] (aesni_prepare_7_15_variable)
(aesni_prepare_7_15, aesni_cleanup_7_15, do_aesni_enc_vec8)
(do_aesni_dec_vec8, do_aesni_ctr_8): New.
(_gcry_aes_aesni_ctr_enc, _gcry_aes_aesni_cfb_dec)
(_gcry_aes_aesni_cbc_dec, aesni_ocb_enc, aesni_ocb_dec)
(_gcry_aes_aesni_ocb_auth) [__x86_64__]: Add processing of 8 blocks
in parallel.
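The idea behind the 8-block paths can be illustrated with a minimal, portable sketch of CTR mode. It assumes a hypothetical `toy_encrypt_block` standing in for a single AES block encryption (it is not AES), and none of the names below are libgcrypt functions. The point is structural: the eight counter blocks are independent, so their encryptions can be issued together, which is what lets an AES-NI implementation keep several blocks in flight at once; this scalar sketch does not attempt that interleaving itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCKSIZE 16
#define PARALLEL 8 /* blocks handled per iteration of the wide path */

/* HYPOTHETICAL stand-in for one AES block encryption. This is NOT AES;
   it is a toy keyed byte mix so the sketch compiles on its own. */
static void toy_encrypt_block(const uint8_t key[BLOCKSIZE],
                              uint8_t out[BLOCKSIZE],
                              const uint8_t in[BLOCKSIZE])
{
  for (int i = 0; i < BLOCKSIZE; i++)
    out[i] = (uint8_t)((in[i] ^ key[i]) * 167u + (unsigned)i);
}

/* Increment a 128-bit big-endian counter block. */
static void ctr_inc(uint8_t ctr[BLOCKSIZE])
{
  for (int i = BLOCKSIZE - 1; i >= 0; i--)
    if (++ctr[i] != 0)
      break;
}

/* CTR mode: consume PARALLEL blocks per iteration while enough input
   remains, then fall back to one block at a time for the tail. */
static void ctr_enc(const uint8_t key[BLOCKSIZE], uint8_t ctr[BLOCKSIZE],
                    uint8_t *dst, const uint8_t *src, size_t nblocks)
{
  uint8_t ctrs[PARALLEL][BLOCKSIZE];
  uint8_t ks[PARALLEL][BLOCKSIZE];

  assert(nblocks == 0 || (dst != NULL && src != NULL));

  for (; nblocks >= PARALLEL; nblocks -= PARALLEL)
    {
      /* Derive all counter values first; they do not depend on each
         other's encryption results. */
      for (int j = 0; j < PARALLEL; j++)
        {
          memcpy(ctrs[j], ctr, BLOCKSIZE);
          ctr_inc(ctr);
        }
      /* Independent block encryptions: a SIMD/AES-NI version can keep
         all of them in flight instead of running them back to back. */
      for (int j = 0; j < PARALLEL; j++)
        toy_encrypt_block(key, ks[j], ctrs[j]);
      for (int j = 0; j < PARALLEL * BLOCKSIZE; j++)
        dst[j] = src[j] ^ ks[j / BLOCKSIZE][j % BLOCKSIZE];
      dst += PARALLEL * BLOCKSIZE;
      src += PARALLEL * BLOCKSIZE;
    }

  /* Tail: remaining blocks, one at a time. */
  for (; nblocks > 0; nblocks--)
    {
      toy_encrypt_block(key, ks[0], ctr);
      ctr_inc(ctr);
      for (int j = 0; j < BLOCKSIZE; j++)
        dst[j] = src[j] ^ ks[0][j];
      dst += BLOCKSIZE;
      src += BLOCKSIZE;
    }
}
```

Because the wide path only reorders independent work, it must produce exactly the same output and final counter as calling the single-block path repeatedly; that equivalence is the property worth testing in any such optimization.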
Benchmarks on Intel Core i7-4790K, 4.0 GHz (no turbo, no HT):
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     0.175 ns/B    5448.7 MiB/s     0.700 c/B
        CFB dec |     0.174 ns/B    5466.2 MiB/s     0.698 c/B
        CTR enc |     0.182 ns/B    5226.0 MiB/s     0.730 c/B
        OCB enc |     0.194 ns/B    4913.9 MiB/s     0.776 c/B
        OCB dec |     0.200 ns/B    4769.2 MiB/s     0.800 c/B
       OCB auth |     0.172 ns/B    5545.0 MiB/s     0.688 c/B
After (1.08x to 1.14x faster):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     0.157 ns/B    6075.6 MiB/s     0.628 c/B
        CFB dec |     0.158 ns/B    6034.1 MiB/s     0.632 c/B
        CTR enc |     0.159 ns/B    5979.4 MiB/s     0.638 c/B
        OCB enc |     0.175 ns/B    5447.1 MiB/s     0.700 c/B
        OCB dec |     0.183 ns/B    5203.9 MiB/s     0.733 c/B
       OCB auth |     0.156 ns/B    6101.3 MiB/s     0.625 c/B
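The relationship between the three columns, and the speedup between the two tables, can be checked with a small C sketch. It assumes only the fixed 4.0 GHz clock stated above and the printed ns/B figures; the struct and function names are illustrative and are not part of libgcrypt or its benchmark tool.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_GHZ 4.0 /* fixed clock of the benchmark machine (no turbo) */

struct bench_row
{
  const char *mode;
  double before_ns_per_byte; /* "Before" table, ns/B column */
  double after_ns_per_byte;  /* "After" table, ns/B column */
};

static const struct bench_row rows[] = {
  { "CBC dec",  0.175, 0.157 },
  { "CFB dec",  0.174, 0.158 },
  { "CTR enc",  0.182, 0.159 },
  { "OCB enc",  0.194, 0.175 },
  { "OCB dec",  0.200, 0.183 },
  { "OCB auth", 0.172, 0.156 },
};
static const size_t nrows = sizeof rows / sizeof rows[0];

/* Speedup = old time per byte / new time per byte. */
static double speedup(const struct bench_row *r)
{
  assert(r->after_ns_per_byte > 0.0);
  return r->before_ns_per_byte / r->after_ns_per_byte;
}

/* Smallest and largest speedup over all modes in the table. */
static void speedup_range(double *lo, double *hi)
{
  *lo = *hi = speedup(&rows[0]);
  for (size_t i = 1; i < nrows; i++)
    {
      double s = speedup(&rows[i]);
      if (s < *lo) *lo = s;
      if (s > *hi) *hi = s;
    }
}

/* At a fixed clock, cycles/byte = ns/byte * GHz. */
static double cycles_per_byte(double ns_per_byte)
{
  return ns_per_byte * CPU_GHZ;
}

/* MiB/s = (1e9 / ns_per_byte) bytes per second, divided by 2^20. */
static double mib_per_sec(double ns_per_byte)
{
  return 1e9 / ns_per_byte / 1048576.0;
}
```

Computed from the rounded ns/B column, the ratios run from about 1.09x (OCB dec) to about 1.14x (CTR enc); because the printed figures are rounded, ratios and MiB/s values derived from them can differ slightly from those reported by the benchmark itself.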
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>