camellia: add AArch64 crypto-extension implementation
* cipher/Makefile.am: Add 'camellia-aarch64-ce.(c|o|lo)'. (aarch64_neon_cflags): New. * cipher/camellia-aarch64-ce.c: New. * cipher/camellia-glue.c (USE_AARCH64_CE): New. (CAMELLIA_context): Add 'use_aarch64ce'. (_gcry_camellia_aarch64ce_encrypt_blk16) (_gcry_camellia_aarch64ce_decrypt_blk16) (_gcry_camellia_aarch64ce_keygen, camellia_aarch64ce_enc_blk16) (camellia_aarch64ce_dec_blk16, aarch64ce_burn_stack_depth): New. (camellia_setkey) [USE_AARCH64_CE]: Set use_aarch64ce if HW has HWF_ARM_AES; Use AArch64/CE key generation if supported by HW. (camellia_encrypt_blk1_32, camellia_decrypt_blk1_32) [USE_AARCH64_CE]: Add AArch64/CE code path.
Patch enables 128-bit vector instrinsics implementation of Camellia
cipher for AArch64.
Benchmark on AWS Graviton2:
Before:
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 5.99 ns/B 159.2 MiB/s 14.97 c/B 2500 ECB dec | 5.99 ns/B 159.1 MiB/s 14.98 c/B 2500 CBC enc | 6.16 ns/B 154.7 MiB/s 15.41 c/B 2500 CBC dec | 6.12 ns/B 155.8 MiB/s 15.29 c/B 2499 CFB enc | 6.49 ns/B 147.0 MiB/s 16.21 c/B 2500 CFB dec | 6.05 ns/B 157.6 MiB/s 15.13 c/B 2500 CTR enc | 6.09 ns/B 156.7 MiB/s 15.22 c/B 2500 CTR dec | 6.09 ns/B 156.6 MiB/s 15.22 c/B 2500 XTS enc | 6.16 ns/B 154.9 MiB/s 15.39 c/B 2500 XTS dec | 6.16 ns/B 154.8 MiB/s 15.40 c/B 2499 GCM enc | 6.31 ns/B 151.1 MiB/s 15.78 c/B 2500 GCM dec | 6.31 ns/B 151.1 MiB/s 15.78 c/B 2500 GCM auth | 0.206 ns/B 4635 MiB/s 0.514 c/B 2500 OCB enc | 6.63 ns/B 143.9 MiB/s 16.57 c/B 2499 OCB dec | 6.63 ns/B 143.9 MiB/s 16.56 c/B 2499 OCB auth | 6.55 ns/B 145.7 MiB/s 16.37 c/B 2499
After (ecb ~2.1x faster):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 2.77 ns/B 344.2 MiB/s 6.93 c/B 2499 ECB dec | 2.76 ns/B 345.3 MiB/s 6.90 c/B 2499 CBC enc | 6.17 ns/B 154.7 MiB/s 15.41 c/B 2499 CBC dec | 2.89 ns/B 330.3 MiB/s 7.22 c/B 2500 CFB enc | 6.48 ns/B 147.1 MiB/s 16.21 c/B 2499 CFB dec | 2.84 ns/B 336.1 MiB/s 7.09 c/B 2499 CTR enc | 2.90 ns/B 328.8 MiB/s 7.25 c/B 2499 CTR dec | 2.90 ns/B 328.9 MiB/s 7.25 c/B 2500 XTS enc | 2.93 ns/B 325.3 MiB/s 7.33 c/B 2500 XTS dec | 2.92 ns/B 326.2 MiB/s 7.31 c/B 2500 GCM enc | 3.10 ns/B 307.2 MiB/s 7.76 c/B 2500 GCM dec | 3.10 ns/B 307.2 MiB/s 7.76 c/B 2499 GCM auth | 0.206 ns/B 4635 MiB/s 0.514 c/B 2500
- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>