Add POWER9 little-endian variant of PPC AES implementation
* configure.ac: Add 'rijndael-ppc9le.lo'.
* cipher/Makefile.am: Add 'rijndael-ppc9le.c', 'rijndael-ppc-common.h'
and 'rijndael-ppc-functions.h'.
* cipher/rijndael-internal.h (USE_PPC_CRYPTO_WITH_PPC9LE): New.
(RIJNDAEL_context_s): Add 'use_ppc9le_crypto'.
* cipher/rijndael.c (_gcry_aes_ppc9le_encrypt)
(_gcry_aes_ppc9le_decrypt, _gcry_aes_ppc9le_cfb_enc)
(_gcry_aes_ppc9le_cfb_dec, _gcry_aes_ppc9le_ctr_enc)
(_gcry_aes_ppc9le_cbc_enc, _gcry_aes_ppc9le_cbc_dec)
(_gcry_aes_ppc9le_ocb_crypt, _gcry_aes_ppc9le_ocb_auth)
(_gcry_aes_ppc9le_xts_crypt): New.
(do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth, _gcry_aes_xts_crypt)
[USE_PPC_CRYPTO_WITH_PPC9LE]: New.
* cipher/rijndael-ppc.c: Split common code to headers
'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
* cipher/rijndael-ppc-common.h: Split from 'rijndael-ppc.c'.
(asm_add_uint64, asm_sra_int64, asm_swap_uint64_halfs): New.
* cipher/rijndael-ppc-functions.h: Split from 'rijndael-ppc.c'.
(CFB_ENC_FUNC, CBC_ENC_FUNC): Unroll loop by 2.
(XTS_CRYPT_FUNC, GEN_TWEAK): Tweak generation without vperm
instruction.
* cipher/rijndael-ppc9le.c: New.
Provide POWER9 little-endian optimized variant of PPC vcrypto AES
implementation. This implementation uses 'lxvb16x' and 'stxvb16x'
instructions to load/store vectors directly in big-endian order.
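As a portable illustration of what "big-endian order" means for a
16-byte load, the sketch below models a vector register as two 64-bit
halves and places the first byte in memory into the most significant
position, regardless of host endianness. The names 'vec128',
'load_be128', 'store_be128' and 'roundtrip_ok' are hypothetical and
are not part of libgcrypt; the actual implementation uses the vector
instructions named above rather than byte loops.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register model: hi = bits 127..64, lo = bits 63..0. */
typedef struct { uint64_t hi, lo; } vec128;

/* Assemble 8 bytes as a big-endian integer on any host. */
static uint64_t load_be64(const unsigned char *p)
{
  uint64_t v = 0;
  int i;

  for (i = 0; i < 8; i++)
    v = (v << 8) | p[i];
  return v;
}

/* Write a 64-bit value out most significant byte first. */
static void store_be64(unsigned char *p, uint64_t v)
{
  int i;

  for (i = 7; i >= 0; i--)
    {
      p[i] = (unsigned char)(v & 0xff);
      v >>= 8;
    }
}

/* Models a big-endian-order vector load: buf[0] becomes the most
   significant byte of the 128-bit value. */
static vec128 load_be128(const unsigned char *buf)
{
  vec128 v;

  v.hi = load_be64(buf);
  v.lo = load_be64(buf + 8);
  return v;
}

/* Models the matching store: most significant byte goes to buf[0]. */
static void store_be128(unsigned char *buf, vec128 v)
{
  store_be64(buf, v.hi);
  store_be64(buf + 8, v.lo);
}

/* Self-check: bytes 0x00..0x0f load into the expected halves and are
   stored back unchanged.  Returns 1 on success, 0 otherwise. */
static int roundtrip_ok(void)
{
  unsigned char in[16], out[16];
  vec128 v;
  int i;

  for (i = 0; i < 16; i++)
    in[i] = (unsigned char)i;
  v = load_be128(in);
  store_be128(out, v);
  return v.hi == 0x0001020304050607ULL
         && v.lo == 0x08090a0b0c0d0e0fULL
         && memcmp(in, out, 16) == 0;
}
```

The point of the model is only the byte placement: with this ordering
the in-register layout matches the byte order of the AES block in
memory, so no separate byte permutation is needed after the load or
before the store.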
Benchmark on POWER9 (~3.8GHz):
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC enc |      1.04 ns/B     918.7 MiB/s      3.94 c/B
        CBC dec |     0.222 ns/B      4292 MiB/s     0.844 c/B
        CFB enc |      1.04 ns/B     916.9 MiB/s      3.95 c/B
        CFB dec |     0.224 ns/B      4252 MiB/s     0.852 c/B
        CTR enc |     0.226 ns/B      4218 MiB/s     0.859 c/B
        CTR dec |     0.225 ns/B      4233 MiB/s     0.856 c/B
        XTS enc |     0.500 ns/B      1907 MiB/s      1.90 c/B
        XTS dec |     0.494 ns/B      1932 MiB/s      1.88 c/B
        OCB enc |     0.288 ns/B      3312 MiB/s      1.09 c/B
        OCB dec |     0.292 ns/B      3266 MiB/s      1.11 c/B
       OCB auth |     0.267 ns/B      3567 MiB/s      1.02 c/B
After (ctr & ocb & cbc-dec & cfb-dec ~15% and xts ~8% faster):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC enc |      1.04 ns/B     914.2 MiB/s      3.96 c/B
        CBC dec |     0.191 ns/B      4984 MiB/s     0.727 c/B
        CFB enc |      1.03 ns/B     930.0 MiB/s      3.90 c/B
        CFB dec |     0.194 ns/B      4906 MiB/s     0.739 c/B
        CTR enc |     0.196 ns/B      4868 MiB/s     0.744 c/B
        CTR dec |     0.197 ns/B      4834 MiB/s     0.750 c/B
        XTS enc |     0.460 ns/B      2075 MiB/s      1.75 c/B
        XTS dec |     0.455 ns/B      2097 MiB/s      1.73 c/B
        OCB enc |     0.250 ns/B      3812 MiB/s     0.951 c/B
        OCB dec |     0.253 ns/B      3764 MiB/s     0.963 c/B
       OCB auth |     0.232 ns/B      4106 MiB/s     0.883 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>