Tweak AES-NI bulk CTR mode slightly
* cipher/rijndael.c [USE_AESNI] (aesni_cleanup_2_5): Rename to...
(aesni_cleanup_2_6): ...this and clear also 'xmm6'.
[USE_AESNI && __i386__] (do_aesni_ctr, do_aesni_ctr_4): Prevent
inlining only on i386, allow on AMD64.
[USE_AESNI] (do_aesni_ctr, do_aesni_ctr_4): Use counter block from
'xmm5' and byte-swap mask from 'xmm6'.
(_gcry_aes_ctr_enc) [USE_AESNI]: Preload counter block to 'xmm5' and
byte-swap mask to 'xmm6'.
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Use
'aesni_cleanup_2_6'.
Small tweak that yields ~5% more speed on Intel Core i5-4570.
After:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CTR enc |     0.274 ns/B    3482.5 MiB/s     0.877 c/B
        CTR dec |     0.274 ns/B    3486.8 MiB/s     0.876 c/B
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CTR enc |     0.288 ns/B    3312.5 MiB/s     0.922 c/B
        CTR dec |     0.288 ns/B    3312.6 MiB/s     0.922 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>