Add carryless 8-bit addition fast-path for AES-NI CTR mode
* cipher/rijndael-aesni.c (do_aesni_ctr_4): Do addition using CTR in big-endian form, if least-significant byte does not overflow.
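The idea behind the fast path can be sketched in portable C. This is only an illustration under stated assumptions, not the SSE assembly in do_aesni_ctr_4: the helper names ctr_add_slow, ctr_next4 and ctr_selftest are hypothetical, and the sketch assumes four counter blocks are produced per call from a 16-byte big-endian counter. When the least-significant byte cannot overflow while the block is advanced by four, only that byte has to change, so no byte-swap or carry propagation is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: generic big-endian 128-bit addition with carry
 * propagation (the slow path). */
static void ctr_add_slow(uint8_t ctr[16], unsigned n)
{
    for (int i = 15; i >= 0 && n; i--) {
        n += ctr[i];
        ctr[i] = (uint8_t)n;   /* keep the low 8 bits in place */
        n >>= 8;               /* carry into the next more significant byte */
    }
}

/* Hypothetical helper: write the counter blocks ctr, ctr+1, ctr+2, ctr+3
 * to out[] and advance ctr by 4. */
static void ctr_next4(uint8_t ctr[16], uint8_t out[4][16])
{
    if (ctr[15] <= 0xff - 4) {
        /* Fast path: adding up to 4 cannot carry out of the last byte,
         * so the other 15 bytes are copied unchanged. */
        for (int i = 0; i < 4; i++) {
            memcpy(out[i], ctr, 16);
            out[i][15] = (uint8_t)(ctr[15] + i);
        }
        ctr[15] = (uint8_t)(ctr[15] + 4);
    } else {
        /* Slow path: the low byte would wrap, so add with full carry. */
        for (int i = 0; i < 4; i++) {
            memcpy(out[i], ctr, 16);
            ctr_add_slow(ctr, 1);
        }
    }
}

/* Small self-check exercising both paths. */
static void ctr_selftest(void)
{
    uint8_t out[4][16];
    uint8_t a[16] = {0};
    uint8_t b[16] = {0};

    a[15] = 0x10;              /* fast path */
    ctr_next4(a, out);
    assert(out[3][15] == 0x13 && a[15] == 0x14);

    b[15] = 0xfe;              /* slow path: carry into byte 14 */
    ctr_next4(b, out);
    assert(out[2][15] == 0x00 && out[2][14] == 0x01);
    assert(b[15] == 0x02 && b[14] == 0x01);
}
```

With a counter that starts at a random value, the slow path is taken in roughly 4 of every 256 possible low-byte values, so almost all calls avoid the carry handling.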
Patch improves AES-NI CTR throughput by about 20% (from 0.875 to 0.729 cycles/byte for encryption).
Benchmark on Intel Haswell (3.2 GHz):
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CTR enc |     0.273 ns/B    3489.8 MiB/s     0.875 c/B
        CTR dec |     0.273 ns/B    3491.0 MiB/s     0.874 c/B
After:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CTR enc |     0.228 ns/B    4190.0 MiB/s     0.729 c/B
        CTR dec |     0.228 ns/B    4190.2 MiB/s     0.729 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>