Add POWER9 little-endian variant of PPC AES implementation

Description

* configure.ac: Add 'rijndael-ppc9le.lo'.
* cipher/Makefile.am: Add 'rijndael-ppc9le.c', 'rijndael-ppc-common.h'
and 'rijndael-ppc-functions.h'.
* cipher/rijndael-internal.h (USE_PPC_CRYPTO_WITH_PPC9LE): New.
(RIJNDAEL_context_s): Add 'use_ppc9le_crypto'.
* cipher/rijndael.c (_gcry_aes_ppc9le_encrypt)
(_gcry_aes_ppc9le_decrypt, _gcry_aes_ppc9le_cfb_enc)
(_gcry_aes_ppc9le_cfb_dec, _gcry_aes_ppc9le_ctr_enc)
(_gcry_aes_ppc9le_cbc_enc, _gcry_aes_ppc9le_cbc_dec)
(_gcry_aes_ppc9le_ocb_crypt, _gcry_aes_ppc9le_ocb_auth)
(_gcry_aes_ppc9le_xts_crypt): New.
(do_setkey, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth, _gcry_aes_xts_crypt)
[USE_PPC_CRYPTO_WITH_PPC9LE]: New.
* cipher/rijndael-ppc.c: Split common code to headers
'rijndael-ppc-common.h' and 'rijndael-ppc-functions.h'.
* cipher/rijndael-ppc-common.h: Split from 'rijndael-ppc.c'.
(asm_add_uint64, asm_sra_int64, asm_swap_uint64_halfs): New.
* cipher/rijndael-ppc-functions.h: Split from 'rijndael-ppc.c'.
(CFB_ENC_FUNC, CBC_ENC_FUNC): Unroll loop by 2.
(XTS_CRYPT_FUNC, GEN_TWEAK): Tweak generation without vperm
instruction.
* cipher/rijndael-ppc9le.c: New.
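
The XTS_CRYPT_FUNC/GEN_TWEAK entry above says the tweak is now generated without vperm. A minimal portable C sketch of how an XTS tweak can be doubled in GF(2^128) using only a per-lane arithmetic shift, a swap of the 64-bit halves and a per-lane add follows. It is an assumption that the new helpers (asm_sra_int64, asm_swap_uint64_halfs, asm_add_uint64) are combined this way; the type 'tweak128' and the function 'xts_double_tweak' are hypothetical names and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 128-bit XTS tweak as two 64-bit lanes
   (little-endian tweak: 'lo' holds bits 0..63, 'hi' holds bits 64..127). */
typedef struct { uint64_t lo, hi; } tweak128;

/* Multiply the tweak by x in GF(2^128), polynomial x^128 + x^7 + x^2 + x + 1. */
static tweak128 xts_double_tweak(tweak128 t)
{
  /* Model of an arithmetic shift right by 63 in each lane:
     all-ones if the lane's top bit is set, zero otherwise. */
  uint64_t mask_lo = (uint64_t)0 - (t.lo >> 63);
  uint64_t mask_hi = (uint64_t)0 - (t.hi >> 63);

  /* Model of swapping the halves and masking with constants: the bit
     shifted out of 'hi' folds into 'lo' as 0x87, the bit shifted out
     of 'lo' carries into bit 0 of 'hi'. */
  uint64_t fold_into_lo = mask_hi & 0x87;
  uint64_t carry_into_hi = mask_lo & 0x01;

  /* Model of a per-lane add: t + t is a left shift by one in each lane. */
  tweak128 r;
  r.lo = (t.lo + t.lo) ^ fold_into_lo;
  r.hi = (t.hi + t.hi) ^ carry_into_hi;
  return r;
}

/* Small self-check of the carry and reduction paths. */
static void xts_double_tweak_check(void)
{
  tweak128 a = { UINT64_C(0x8000000000000000), 0 };
  tweak128 b = { 0, UINT64_C(0x8000000000000000) };
  tweak128 ra = xts_double_tweak(a);
  tweak128 rb = xts_double_tweak(b);
  assert(ra.lo == 0 && ra.hi == 1);    /* carry from low lane to high lane */
  assert(rb.lo == 0x87 && rb.hi == 0); /* reduction folds back into low lane */
}
```

The point of the sketch is that every step is a lane-wise operation or a fixed half swap, so no general byte permute is needed.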

Provide a POWER9 little-endian optimized variant of the PPC vcrypto AES
implementation. This implementation uses the 'lxvb16x' and 'stxvb16x'
instructions to load and store vectors directly in big-endian order.
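
To illustrate what loading "directly in big-endian order" saves, here is a simplified portable C model. It is not the patch's code and uses no real vector intrinsics: 'vec128_model' and the three functions are hypothetical names, and modelling the older little-endian path as a plain load followed by a byte-reversing permute is an assumption that glosses over the exact element layout of the pre-POWER9 load instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register: elem[0] is the most
   significant byte, elem[15] the least significant. */
typedef struct { uint8_t elem[16]; } vec128_model;

/* Model of a little-endian element-order load: the byte at the lowest
   address ends up in the least significant element. */
static vec128_model load_le_order(const uint8_t *mem)
{
  vec128_model v;
  for (int i = 0; i < 16; i++)
    v.elem[15 - i] = mem[i];
  return v;
}

/* Model of the extra byte-reversing permute needed after such a load. */
static vec128_model byte_reverse(vec128_model v)
{
  vec128_model r;
  for (int i = 0; i < 16; i++)
    r.elem[i] = v.elem[15 - i];
  return r;
}

/* Model of an lxvb16x-style load: the byte at the lowest address goes
   straight into element 0, so no permute is needed afterwards. */
static vec128_model load_be_order(const uint8_t *mem)
{
  vec128_model v;
  memcpy(v.elem, mem, 16);
  return v;
}

/* Both paths must produce the same register contents. */
static void check_loads_agree(const uint8_t *mem)
{
  vec128_model a = byte_reverse(load_le_order(mem));
  vec128_model b = load_be_order(mem);
  assert(memcmp(a.elem, b.elem, 16) == 0);
}
```

In this model the POWER9 path does one operation where the older path does two, which is consistent with the speedups reported below for the modes that process several blocks in parallel.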

Benchmark on POWER9 (~3.8 GHz):

Before:
 AES     |  nanosecs/byte   mebibytes/sec   cycles/byte
 CBC enc |      1.04 ns/B     918.7 MiB/s      3.94 c/B
 CBC dec |     0.222 ns/B      4292 MiB/s     0.844 c/B
 CFB enc |      1.04 ns/B     916.9 MiB/s      3.95 c/B
 CFB dec |     0.224 ns/B      4252 MiB/s     0.852 c/B
 CTR enc |     0.226 ns/B      4218 MiB/s     0.859 c/B
 CTR dec |     0.225 ns/B      4233 MiB/s     0.856 c/B
 XTS enc |     0.500 ns/B      1907 MiB/s      1.90 c/B
 XTS dec |     0.494 ns/B      1932 MiB/s      1.88 c/B
 OCB enc |     0.288 ns/B      3312 MiB/s      1.09 c/B
 OCB dec |     0.292 ns/B      3266 MiB/s      1.11 c/B
OCB auth |     0.267 ns/B      3567 MiB/s      1.02 c/B

After (ctr & ocb & cbc-dec & cfb-dec ~15% and xts ~8% faster):
 AES     |  nanosecs/byte   mebibytes/sec   cycles/byte
 CBC enc |      1.04 ns/B     914.2 MiB/s      3.96 c/B
 CBC dec |     0.191 ns/B      4984 MiB/s     0.727 c/B
 CFB enc |      1.03 ns/B     930.0 MiB/s      3.90 c/B
 CFB dec |     0.194 ns/B      4906 MiB/s     0.739 c/B
 CTR enc |     0.196 ns/B      4868 MiB/s     0.744 c/B
 CTR dec |     0.197 ns/B      4834 MiB/s     0.750 c/B
 XTS enc |     0.460 ns/B      2075 MiB/s      1.75 c/B
 XTS dec |     0.455 ns/B      2097 MiB/s      1.73 c/B
 OCB enc |     0.250 ns/B      3812 MiB/s     0.951 c/B
 OCB dec |     0.253 ns/B      3764 MiB/s     0.963 c/B
OCB auth |     0.232 ns/B      4106 MiB/s     0.883 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
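
The "~15%" and "~8%" figures in the "After" heading can be reproduced from the MiB/s columns of the two tables. The sketch below assumes the percentages refer to throughput gain, which the commit message does not state explicitly; 'speedup_percent' and 'check_commit_claims' are hypothetical helper names.

```c
#include <assert.h>

/* Throughput gain in percent, from MiB/s before and after. */
static double speedup_percent(double before_mibs, double after_mibs)
{
  return (after_mibs / before_mibs - 1.0) * 100.0;
}

/* Check the table values against the summary in the "After" heading. */
static void check_commit_claims(void)
{
  double ctr_enc = speedup_percent(4218.0, 4868.0); /* CTR enc rows */
  double xts_enc = speedup_percent(1907.0, 2075.0); /* XTS enc rows */
  assert(ctr_enc > 15.0 && ctr_enc < 16.0);
  assert(xts_enc > 8.0 && xts_enc < 9.5);
}
```

CBC enc and CFB enc stay essentially unchanged in the tables, as expected for modes where each block depends on the previous ciphertext block.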

Details

Provenance
jukivili authored on Feb 2 2020, 6:52 PM
Parents
rC5beadf201312: Add gcry_cipher_ctl command to allow weak keys in testing use-cases