Optimize AES-NI CTR mode.

* cipher/rijndael.c [USE_AESNI] (do_aesni_ctr, do_aesni_ctr_4): Make
handling of 64-bit overflow and carry conditional. Avoid generic to
vector register passing of value '1'. Generate and use '-1' instead.

We only need to handle the 64-bit carry in a few special cases that
happen very rarely. So move carry handling to the slow path and only
detect the need for carry handling on the fast path. Also avoid moving
'1' from a general-purpose register to a vector register, as that might
be slow on some CPUs. Instead, generate '-1' with SSE2 instructions and
use subtraction instead of addition to increment the IV.
Overall, this gives a ~8% speed improvement for AES CTR mode on Intel
Sandy Bridge.
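
The idea can be modelled in portable C. This is only a scalar sketch
under stated assumptions, not the inline assembly in rijndael.c: the
type ctr128_t and the functions ctr128_inc and ctr128_carry_slow are
hypothetical names, and the counter is assumed to be held as two
host-order 64-bit halves.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 128-bit counter, split into two
   host-order 64-bit halves.  The real code keeps the counter in an
   SSE register; this only illustrates the arithmetic. */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} ctr128_t;

/* All-ones, i.e. -1.  With SSE2 such a value can be produced without
   touching a general-purpose register, by comparing a vector register
   with itself for equality (pcmpeqd). */
static const uint64_t minus_one = ~(uint64_t) 0;

/* Slow path: propagate the carry into the high half.  Reached only
   when the low half has wrapped around to zero. */
static void
ctr128_carry_slow (ctr128_t *ctr)
{
  ctr->hi -= minus_one;         /* hi + 1 (mod 2^64) */
}

/* Fast path: increment by subtracting -1 and only detect whether a
   carry is needed; the carry itself is handled out of line. */
static void
ctr128_inc (ctr128_t *ctr)
{
  ctr->lo -= minus_one;         /* lo - (-1) == lo + 1 (mod 2^64) */
  if (ctr->lo == 0)             /* low half wrapped: rare */
    ctr128_carry_slow (ctr);
}
```

In this model the carry branch is taken only once every 2^64
increments of the low half, so the common case is a single
subtraction and a compare.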
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>