rijndael: further optimizations for AES-NI accelerated CBC and CFB bulk modes
* cipher/rijndael-aesni.c (do_aesni_enc, do_aesni_dec): Pass
input/output through SSE register XMM0.
(do_aesni_cfb): Remove.
(_gcry_aes_aesni_encrypt, _gcry_aes_aesni_decrypt): Add loading/storing
input/output to/from XMM0.
(_gcry_aes_aesni_cfb_enc, _gcry_aes_aesni_cbc_enc)
(_gcry_aes_aesni_cfb_dec): Update to use renewed 'do_aesni_enc' and
move IV loading/storing outside loop.
(_gcry_aes_aesni_cbc_dec): Update to use renewed 'do_aesni_dec'.
On Intel Haswell, CBC encryption speed improves by ~16% and CFB
encryption speed by ~8%.
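For illustration only, the loop restructuring for CBC encryption can be
sketched in portable C. This is not the code from this commit: the names
toy_encrypt_block, cbc_enc_per_block and cbc_enc_hoisted are hypothetical,
and the toy transform is an assumption standing in for the AES-NI rounds
(it is not a cipher). The local 'state' array plays the role that XMM0
plays in the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Hypothetical stand-in for the AES-NI block encryption (NOT a real
   cipher).  It transforms the 16-byte state in place, mirroring a
   'do_aesni_enc' that takes its input and returns its output in XMM0. */
static void toy_encrypt_block(uint8_t state[BLOCK], const uint8_t key[BLOCK])
{
  for (size_t i = 0; i < BLOCK; i++)
    state[i] = (uint8_t)((state[i] ^ key[i]) * 5u + 1u);
}

/* Before: the IV is loaded from and stored to memory for every block. */
void cbc_enc_per_block(const uint8_t key[BLOCK], uint8_t iv[BLOCK],
                       uint8_t *out, const uint8_t *in, size_t nblocks)
{
  for (; nblocks; nblocks--, in += BLOCK, out += BLOCK)
    {
      uint8_t state[BLOCK];

      memcpy(state, iv, BLOCK);          /* load IV (every iteration) */
      for (size_t i = 0; i < BLOCK; i++)
        state[i] ^= in[i];               /* chain: IV xor plaintext */
      toy_encrypt_block(state, key);
      memcpy(iv, state, BLOCK);          /* store IV (every iteration) */
      memcpy(out, state, BLOCK);
    }
}

/* After: the IV is loaded once before the loop and stored once after it.
   Inside the loop the chaining value simply stays in 'state'. */
void cbc_enc_hoisted(const uint8_t key[BLOCK], uint8_t iv[BLOCK],
                     uint8_t *out, const uint8_t *in, size_t nblocks)
{
  uint8_t state[BLOCK];

  memcpy(state, iv, BLOCK);              /* load IV once */
  for (; nblocks; nblocks--, in += BLOCK, out += BLOCK)
    {
      for (size_t i = 0; i < BLOCK; i++)
        state[i] ^= in[i];               /* chain: previous block xor plaintext */
      toy_encrypt_block(state, key);
      memcpy(out, state, BLOCK);         /* ciphertext is also the next IV */
    }
  memcpy(iv, state, BLOCK);              /* store IV once */
}
```

Both variants produce the same ciphertext and leave the same final IV; the
hoisted form only drops the per-block IV load and store from the loop.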
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>