Speed up AES-NI key setup
* cipher/rijndael.c [USE_AESNI] (m128i_t): Remove.
[USE_AESNI] (u128_t): New.
[USE_AESNI] (aesni_do_setkey): New.
(do_setkey) [USE_AESNI]: Move AES-NI accelerated key setup to
'aesni_do_setkey'.
(do_setkey): Call _gcry_get_hw_features only once. Clear stack after
use in generic key setup part.
(rijndael_setkey): Remove stack burning.
(prepare_decryption) [USE_AESNI]: Use 'u128_t' instead of 'm128i_t' to
avoid compiler-generated SSE2 instructions and XMM register usage.
Unroll 'aesimc' setup loop.
(prepare_decryption): Clear stack after use.
[USE_AESNI] (do_aesni_enc_aligned): Update comment about alignment.
(do_decrypt): Do not burn stack after prepare_decryption.
This patch improves the speed of AES key setup with AES-NI instructions. It
also removes the problematic use of a vector typedef, which might interfere
with XMM register usage in the AES-NI accelerated code.
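
The difference between the two kinds of typedef can be sketched as follows.
This is a minimal illustration, not the definitions from the patch: the names
'example_m128i_t', 'example_u128_t' and 'example_round_key_addr' are
hypothetical, and the vector attribute shown is the GCC extension. The point
is that a plain 16-byte struct can be used for address arithmetic over the key
schedule without introducing a vector type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: a GCC-style vector typedef.  Objects of this type are
   vector values, so the compiler may use SSE2 instructions and XMM
   registers whenever it loads, stores or copies them.  */
#if defined(__GNUC__)
typedef int example_m128i_t __attribute__ ((__vector_size__ (16)));
#endif

/* Illustrative only: a plain 16-byte struct.  It has the same size as one
   AES round key but is not a vector type.  */
typedef struct example_u128
{
  uint32_t a, b, c, d;
} example_u128_t;

/* Return the address of round key number 'round' in a key schedule stored
   as consecutive 16-byte blocks.  The struct is used purely for address
   arithmetic; no 128-bit value is loaded or stored here.  */
static const void *
example_round_key_addr (const void *keyschedule, unsigned int round)
{
  const example_u128_t *ksch = keyschedule;

  assert (keyschedule != NULL);
  return &ksch[round];
}
```

With such a helper, hand-written inline assembly can be given the address of
each round key while the choice of XMM registers stays with the assembly code.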
New:
$ tests/benchmark --cipher-with-keysetup --cipher-repetitions 1000 cipher aes aes192 aes256
Running each test 1000 times.
          ECB/Stream         CBC             CFB             OFB             CTR             CCM
        --------------- --------------- --------------- --------------- --------------- ---------------
AES       520ms   590ms  1760ms   310ms  1640ms   300ms  1620ms  1610ms   350ms   360ms  2160ms  2140ms
AES192    640ms   680ms  2030ms   370ms  1920ms   350ms  1890ms  1880ms   400ms   410ms  2490ms  2490ms
AES256    730ms   780ms  2330ms   430ms  2210ms   420ms  2170ms  2180ms   470ms   480ms  2830ms  2840ms
Old:
$ tests/benchmark --cipher-with-keysetup --cipher-repetitions 1000 cipher aes aes192 aes256
Running each test 1000 times.
          ECB/Stream         CBC             CFB             OFB             CTR             CCM
        --------------- --------------- --------------- --------------- --------------- ---------------
AES       670ms   740ms  1910ms   470ms  1790ms   470ms  1770ms  1760ms   520ms   510ms  2310ms  2310ms
AES192    820ms   860ms  2220ms   550ms  2110ms   540ms  2070ms  2070ms   600ms   590ms  2670ms  2680ms
AES256    920ms   970ms  2510ms   620ms  2390ms   600ms  2360ms  2370ms   650ms   660ms  3020ms  3020ms
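
To compare the Old and New figures, the relative time saved can be computed
per column. The helper below is a small sketch for that arithmetic only; the
name 'example_time_saved_percent' is hypothetical and is not part of
tests/benchmark.

```c
#include <assert.h>

/* Percentage of run time saved when a measurement drops from old_ms to
   new_ms.  A positive result means the new code is faster.  */
static double
example_time_saved_percent (double old_ms, double new_ms)
{
  assert (old_ms > 0.0);
  return (old_ms - new_ms) * 100.0 / old_ms;
}
```

For example, the first ECB/Stream figure for AES goes from 670ms to 520ms,
which is roughly 22% less time, and the second CBC figure for AES goes from
470ms to 310ms, roughly 34% less.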
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>