Avoid unneeded stack burning with AES-NI and reduce number of 'decryption_prepared' checks

* cipher/rijndael.c (RIJNDAEL_context): Make 'decryption_prepared',
'use_padlock' and 'use_aesni' 1-bit members in bitfield.
(do_setkey): Move 'hwfeatures' inside [USE_AESNI || USE_PADLOCK].
(do_aesni_enc_aligned): Rename to...
(do_aesni_enc): ...this, as function does not require aligned input.
(do_aesni_dec_aligned): Rename to...
(do_aesni_dec): ...this, as function does not require aligned input.
(do_aesni): Remove.
(rijndael_encrypt): Call 'do_aesni_enc' instead of 'do_aesni'.
(rijndael_decrypt): Call 'do_aesni_dec' instead of 'do_aesni'.
(check_decryption_preparation): New.
(do_decrypt): Remove 'decryption_prepared' check.
(rijndael_decrypt): Ditto and call 'check_decryption_preparation'.
(_gcry_aes_cbc_dec): Ditto.
(_gcry_aes_cfb_enc): Add 'burn_depth' and burn stack only when needed.
(_gcry_aes_cbc_enc): Ditto.
(_gcry_aes_ctr_enc): Ditto.
(_gcry_aes_cfb_dec): Ditto.
(_gcry_aes_cbc_dec): Ditto and correct clearing of 'savebuf'.

This patch is mostly about reducing overhead for short buffers.
Results on Intel i5-4570:

After:

$ tests/benchmark --cipher-repetitions 1000 --cipher-with-keysetup cipher aes
Running each test 1000 times.
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
AES            480ms   540ms  1750ms   300ms  1630ms   300ms  1640ms  1640ms   350ms   350ms  2130ms  2140ms

Before:

$ tests/benchmark --cipher-repetitions 1000 --cipher-with-keysetup cipher aes
Running each test 1000 times.
               ECB/Stream          CBC             CFB             OFB             CTR             CCM
             --------------- --------------- --------------- --------------- --------------- ---------------
AES            520ms   590ms  1760ms   310ms  1640ms   310ms  1610ms  1600ms   360ms   360ms  2150ms  2160ms

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>