camellia: add POWER8/POWER9 vcrypto implementation
* cipher/Makefile.am: Add 'camellia-simd128.h', 'camellia-ppc8le.c'
and 'camellia-ppc9le.c'.
* cipher/camellia-glue.c (USE_PPC_CRYPTO): New.
(CAMELLIA_context) [USE_PPC_CRYPTO]: Add 'use_ppc', 'use_ppc8'
and 'use_ppc9'.
[USE_PPC_CRYPTO] (_gcry_camellia_ppc8_encrypt_blk16)
(_gcry_camellia_ppc8_decrypt_blk16, _gcry_camellia_ppc8_keygen)
(_gcry_camellia_ppc9_encrypt_blk16)
(_gcry_camellia_ppc9_decrypt_blk16, _gcry_camellia_ppc9_keygen)
(camellia_ppc_enc_blk16, camellia_ppc_dec_blk16)
(ppc_burn_stack_depth): New.
(camellia_setkey) [USE_PPC_CRYPTO]: Setup 'use_ppc', 'use_ppc8'
and 'use_ppc9' and use PPC key-generation if HWF is available.
(camellia_encrypt_blk1_32)
(camellia_decrypt_blk1_32) [USE_PPC_CRYPTO]: Add 'use_ppc' paths.
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Enable
generic bulk path when USE_PPC_CRYPTO is defined.
* cipher/camellia-ppc8le.c: New.
* cipher/camellia-ppc9le.c: New.
* cipher/camellia-simd128.h: New.
* configure.ac: Add 'camellia-ppc8le.lo' and 'camellia-ppc9le.lo'.
This patch adds a 128-bit vector intrinsics implementation of the
Camellia cipher and enables it for POWER8 and POWER9.
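The selection of 'use_ppc', 'use_ppc8' and 'use_ppc9' in camellia_setkey
can be pictured roughly as below. This is only a sketch, not the code in
cipher/camellia-glue.c: the SKETCH_HWF_* bits, the struct and the function
name are hypothetical stand-ins, and treating the POWER8 and POWER9 paths
as mutually exclusive is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-ins for hardware-feature bits (not the real names). */
#define SKETCH_HWF_PPC_VCRYPTO   (1u << 0)  /* vector crypto available */
#define SKETCH_HWF_PPC_ARCH_3_00 (1u << 1)  /* ISA 3.0 (POWER9) available */

/* Hypothetical subset of the context: only the three flags from the patch. */
struct sketch_camellia_ctx
{
  unsigned int use_ppc:1;
  unsigned int use_ppc8:1;
  unsigned int use_ppc9:1;
};

/* Pick an implementation from the detected hardware features.
   Assumption: POWER9 code is preferred when both feature bits are set,
   and POWER8 code is used when only vector crypto is reported. */
static struct sketch_camellia_ctx
sketch_select_ppc (uint32_t hwf)
{
  struct sketch_camellia_ctx ctx = { 0, 0, 0 };
  int have_vcrypto = (hwf & SKETCH_HWF_PPC_VCRYPTO) != 0;
  int have_isa3 = (hwf & SKETCH_HWF_PPC_ARCH_3_00) != 0;

  ctx.use_ppc9 = have_vcrypto && have_isa3;
  ctx.use_ppc8 = have_vcrypto && !have_isa3;
  ctx.use_ppc = ctx.use_ppc8 || ctx.use_ppc9;
  return ctx;
}
```

With no feature bits set, all three flags stay zero and the generic
implementation would remain in use.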
Benchmark on POWER9:
Before:
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     13.45 ns/B     70.90 MiB/s     30.94 c/B
        ECB dec |     13.45 ns/B     70.92 MiB/s     30.93 c/B
        CBC enc |     15.22 ns/B     62.66 MiB/s     35.00 c/B
        CBC dec |     13.54 ns/B     70.41 MiB/s     31.15 c/B
        CFB enc |     15.24 ns/B     62.59 MiB/s     35.04 c/B
        CFB dec |     13.53 ns/B     70.48 MiB/s     31.12 c/B
        CTR enc |     13.60 ns/B     70.15 MiB/s     31.27 c/B
        CTR dec |     13.62 ns/B     70.02 MiB/s     31.33 c/B
        XTS enc |     13.67 ns/B     69.74 MiB/s     31.45 c/B
        XTS dec |     13.74 ns/B     69.41 MiB/s     31.60 c/B
        GCM enc |     18.18 ns/B     52.45 MiB/s     41.82 c/B
        GCM dec |     17.76 ns/B     53.69 MiB/s     40.86 c/B
       GCM auth |      4.12 ns/B     231.7 MiB/s      9.47 c/B
        OCB enc |     14.40 ns/B     66.22 MiB/s     33.12 c/B
        OCB dec |     14.40 ns/B     66.23 MiB/s     33.12 c/B
       OCB auth |     14.37 ns/B     66.37 MiB/s     33.05 c/B
After (ECB ~4.1x faster):
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      3.25 ns/B     293.7 MiB/s      7.47 c/B
        ECB dec |      3.25 ns/B     293.4 MiB/s      7.48 c/B
        CBC enc |     15.22 ns/B     62.68 MiB/s     35.00 c/B
        CBC dec |      3.36 ns/B     284.1 MiB/s      7.72 c/B
        CFB enc |     15.25 ns/B     62.55 MiB/s     35.07 c/B
        CFB dec |      3.36 ns/B     284.0 MiB/s      7.72 c/B
        CTR enc |      3.47 ns/B     275.1 MiB/s      7.97 c/B
        CTR dec |      3.47 ns/B     275.1 MiB/s      7.97 c/B
        XTS enc |      3.54 ns/B     269.0 MiB/s      8.15 c/B
        XTS dec |      3.54 ns/B     269.6 MiB/s      8.14 c/B
        GCM enc |      3.69 ns/B     258.2 MiB/s      8.49 c/B
        GCM dec |      3.69 ns/B     258.2 MiB/s      8.50 c/B
       GCM auth |     0.226 ns/B      4220 MiB/s     0.520 c/B
        OCB enc |      3.81 ns/B     250.2 MiB/s      8.77 c/B
        OCB dec |      4.08 ns/B     233.8 MiB/s      9.38 c/B
       OCB auth |      3.53 ns/B     270.0 MiB/s      8.12 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>