Add ARMv8/AArch32 Crypto Extension implementation of AES

* cipher/Makefile.am: Add 'rijndael-armv8-ce.c' and
'rijndael-armv8-aarch32-ce.S'.
* cipher/rijndael-armv8-aarch32-ce.S: New.
* cipher/rijndael-armv8-ce.c: New.
* cipher/rijndael-internal.h (USE_ARM_CE): New.
(RIJNDAEL_context_s): Add 'use_arm_ce'.
* cipher/rijndael.c [USE_ARM_CE] (_gcry_aes_armv8_ce_setkey)
(_gcry_aes_armv8_ce_prepare_decryption)
(_gcry_aes_armv8_ce_encrypt, _gcry_aes_armv8_ce_decrypt)
(_gcry_aes_armv8_ce_cfb_enc, _gcry_aes_armv8_ce_cbc_enc)
(_gcry_aes_armv8_ce_ctr_enc, _gcry_aes_armv8_ce_cfb_dec)
(_gcry_aes_armv8_ce_cbc_dec, _gcry_aes_armv8_ce_ocb_crypt)
(_gcry_aes_armv8_ce_ocb_auth): New.
(do_setkey) [USE_ARM_CE]: Add ARM CE/AES HW feature check and key
setup for ARM CE.
(prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec)
(_gcry_aes_ocb_crypt, _gcry_aes_ocb_auth) [USE_ARM_CE]: Add ARM CE
support.
* configure.ac: Add 'rijndael-armv8-ce.lo' and
'rijndael-armv8-aarch32-ce.lo'.
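
As a rough illustration of the do_setkey change listed above, the
following sketch shows one way a runtime feature bit can select an
accelerated back end through a function pointer. It is not code from
this patch: every identifier prefixed with sketch_ or SKETCH_ is
hypothetical, and the two back ends are placeholders that copy the
block instead of encrypting it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the build-time switch and the HW feature bit. */
#define SKETCH_USE_ARM_CE 1
#define SKETCH_HWF_ARM_AES (1u << 0)

/* A back end processes one 16-byte block and returns an id for the demo. */
typedef unsigned int (*sketch_crypt_fn)(const unsigned char *in,
                                        unsigned char *out);

struct sketch_aes_ctx
{
  sketch_crypt_fn encrypt_fn;
  unsigned int use_arm_ce;   /* mirrors the 'use_arm_ce' flag named above */
};

/* Placeholder for the generic implementation (no real AES here). */
unsigned int sketch_encrypt_generic(const unsigned char *in, unsigned char *out)
{
  memcpy(out, in, 16);
  return 0;
}

/* Placeholder for the Crypto Extension implementation (no real AES here). */
unsigned int sketch_encrypt_arm_ce(const unsigned char *in, unsigned char *out)
{
  memcpy(out, in, 16);
  return 1;
}

/* Pick the back end once, at key setup, based on the detected features. */
void sketch_do_setkey(struct sketch_aes_ctx *ctx, unsigned int hwfeatures)
{
  assert(ctx != NULL);

  /* Default to the generic code path. */
  ctx->encrypt_fn = sketch_encrypt_generic;
  ctx->use_arm_ce = 0;

#ifdef SKETCH_USE_ARM_CE
  /* Only switch over when the CPU reports the AES instructions. */
  if (hwfeatures & SKETCH_HWF_ARM_AES)
    {
      ctx->encrypt_fn = sketch_encrypt_arm_ce;
      ctx->use_arm_ce = 1;
    }
#else
  (void)hwfeatures;
#endif
}
```

Calling sketch_do_setkey with SKETCH_HWF_ARM_AES set selects the
accelerated placeholder; calling it with 0 keeps the generic one, so
the per-block call sites do not need their own feature checks.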

Improvement vs ARM assembly on Cortex-A53:

            AES-128  AES-192  AES-256
CBC enc:    14.8x    12.8x    11.4x
CBC dec:    21.4x    20.5x    19.4x
CFB enc:    16.2x    13.6x    11.6x
CFB dec:    21.6x    20.5x    19.4x
CTR:        19.1x    18.6x    17.8x
OCB enc:    16.0x    16.2x    16.1x
OCB dec:    15.6x    15.9x    15.8x
OCB auth:   18.3x    18.4x    18.0x

Benchmark on Cortex-A53 (1152 MHz):

Before:
AES      |  nanosecs/byte   mebibytes/sec  cycles/byte
ECB enc  |     24.42 ns/B     39.06 MiB/s    28.13 c/B
ECB dec  |     25.07 ns/B     38.05 MiB/s    28.88 c/B
CBC enc  |     21.05 ns/B     45.30 MiB/s    24.25 c/B
CBC dec  |     21.16 ns/B     45.07 MiB/s    24.38 c/B
CFB enc  |     21.05 ns/B     45.31 MiB/s    24.25 c/B
CFB dec  |     21.38 ns/B     44.61 MiB/s    24.62 c/B
OFB enc  |     26.15 ns/B     36.47 MiB/s    30.13 c/B
OFB dec  |     26.15 ns/B     36.47 MiB/s    30.13 c/B
CTR enc  |     21.17 ns/B     45.06 MiB/s    24.38 c/B
CTR dec  |     21.16 ns/B     45.06 MiB/s    24.38 c/B
CCM enc  |     42.32 ns/B     22.53 MiB/s    48.75 c/B
CCM dec  |     42.32 ns/B     22.53 MiB/s    48.75 c/B
CCM auth |     21.17 ns/B     45.06 MiB/s    24.38 c/B
GCM enc  |     22.08 ns/B     43.19 MiB/s    25.44 c/B
GCM dec  |     22.08 ns/B     43.18 MiB/s    25.44 c/B
GCM auth |     0.923 ns/B    1032.8 MiB/s     1.06 c/B
OCB enc  |     26.20 ns/B     36.40 MiB/s    30.18 c/B
OCB dec  |     25.97 ns/B     36.73 MiB/s    29.91 c/B
OCB auth |     24.52 ns/B     38.90 MiB/s    28.24 c/B
=
AES192   |  nanosecs/byte   mebibytes/sec  cycles/byte
ECB enc  |     27.83 ns/B     34.26 MiB/s    32.06 c/B
ECB dec  |     28.54 ns/B     33.42 MiB/s    32.88 c/B
CBC enc  |     24.47 ns/B     38.97 MiB/s    28.19 c/B
CBC dec  |     25.27 ns/B     37.74 MiB/s    29.11 c/B
CFB enc  |     25.08 ns/B     38.02 MiB/s    28.89 c/B
CFB dec  |     25.31 ns/B     37.68 MiB/s    29.16 c/B
OFB enc  |     29.57 ns/B     32.25 MiB/s    34.06 c/B
OFB dec  |     29.57 ns/B     32.25 MiB/s    34.06 c/B
CTR enc  |     25.24 ns/B     37.78 MiB/s    29.08 c/B
CTR dec  |     25.24 ns/B     37.79 MiB/s    29.08 c/B
CCM enc  |     49.81 ns/B     19.15 MiB/s    57.38 c/B
CCM dec  |     49.80 ns/B     19.15 MiB/s    57.37 c/B
CCM auth |     24.58 ns/B     38.80 MiB/s    28.32 c/B
GCM enc  |     26.15 ns/B     36.47 MiB/s    30.13 c/B
GCM dec  |     26.11 ns/B     36.52 MiB/s    30.08 c/B
GCM auth |     0.923 ns/B    1033.0 MiB/s     1.06 c/B
OCB enc  |     29.59 ns/B     32.23 MiB/s    34.09 c/B
OCB dec  |     29.42 ns/B     32.42 MiB/s    33.89 c/B
OCB auth |     27.92 ns/B     34.16 MiB/s    32.16 c/B
=
AES256   |  nanosecs/byte   mebibytes/sec  cycles/byte
ECB enc  |     31.20 ns/B     30.57 MiB/s    35.94 c/B
ECB dec  |     31.80 ns/B     29.99 MiB/s    36.63 c/B
CBC enc  |     27.83 ns/B     34.27 MiB/s    32.06 c/B
CBC dec  |     27.87 ns/B     34.21 MiB/s    32.11 c/B
CFB enc  |     27.88 ns/B     34.20 MiB/s    32.12 c/B
CFB dec  |     28.16 ns/B     33.87 MiB/s    32.44 c/B
OFB enc  |     32.93 ns/B     28.96 MiB/s    37.94 c/B
OFB dec  |     32.93 ns/B     28.96 MiB/s    37.94 c/B
CTR enc  |     27.95 ns/B     34.13 MiB/s    32.19 c/B
CTR dec  |     27.95 ns/B     34.12 MiB/s    32.20 c/B
CCM enc  |     55.88 ns/B     17.07 MiB/s    64.38 c/B
CCM dec  |     55.88 ns/B     17.07 MiB/s    64.38 c/B
CCM auth |     27.95 ns/B     34.12 MiB/s    32.20 c/B
GCM enc  |     28.86 ns/B     33.05 MiB/s    33.25 c/B
GCM dec  |     28.87 ns/B     33.04 MiB/s    33.25 c/B
GCM auth |     0.923 ns/B    1033.0 MiB/s     1.06 c/B
OCB enc  |     32.96 ns/B     28.94 MiB/s    37.97 c/B
OCB dec  |     32.73 ns/B     29.14 MiB/s    37.70 c/B
OCB auth |     31.29 ns/B     30.48 MiB/s    36.04 c/B

After:
AES      |  nanosecs/byte   mebibytes/sec  cycles/byte
ECB enc  |      5.10 ns/B     187.0 MiB/s     5.88 c/B
ECB dec  |      5.27 ns/B     181.0 MiB/s     6.07 c/B
CBC enc  |      1.41 ns/B     675.8 MiB/s     1.63 c/B
CBC dec  |     0.992 ns/B     961.7 MiB/s     1.14 c/B
CFB enc  |      1.30 ns/B     732.4 MiB/s     1.50 c/B
CFB dec  |     0.991 ns/B     962.7 MiB/s     1.14 c/B
OFB enc  |      7.05 ns/B     135.2 MiB/s     8.13 c/B
OFB dec  |      7.05 ns/B     135.2 MiB/s     8.13 c/B
CTR enc  |      1.11 ns/B     856.9 MiB/s     1.28 c/B
CTR dec  |      1.11 ns/B     857.0 MiB/s     1.28 c/B
CCM enc  |      2.58 ns/B     369.8 MiB/s     2.97 c/B
CCM dec  |      2.58 ns/B     369.5 MiB/s     2.97 c/B
CCM auth |      1.58 ns/B     605.2 MiB/s     1.82 c/B
GCM enc  |      2.04 ns/B     467.9 MiB/s     2.35 c/B
GCM dec  |      2.04 ns/B     466.6 MiB/s     2.35 c/B
GCM auth |     0.923 ns/B    1033.0 MiB/s     1.06 c/B
OCB enc  |      1.64 ns/B     579.8 MiB/s     1.89 c/B
OCB dec  |      1.66 ns/B     574.5 MiB/s     1.91 c/B
OCB auth |      1.33 ns/B     715.5 MiB/s     1.54 c/B
=
AES192   |  nanosecs/byte   mebibytes/sec  cycles/byte
ECB enc  |      5.64 ns/B     169.0 MiB/s     6.50 c/B
ECB dec  |      5.81 ns/B     164.3 MiB/s     6.69 c/B
CBC enc  |      1.90 ns/B     502.1 MiB/s     2.19 c/B
CBC dec  |      1.24 ns/B     771.7 MiB/s     1.42 c/B
CFB enc  |      1.84 ns/B     517.1 MiB/s     2.12 c/B
CFB dec  |      1.23 ns/B     772.5 MiB/s     1.42 c/B
OFB enc  |      7.60 ns/B     125.5 MiB/s     8.75 c/B
OFB dec  |      7.60 ns/B     125.6 MiB/s     8.75 c/B
CTR enc  |      1.36 ns/B     702.7 MiB/s     1.56 c/B
CTR dec  |      1.36 ns/B     702.5 MiB/s     1.56 c/B
CCM enc  |      3.31 ns/B     287.8 MiB/s     3.82 c/B
CCM dec  |      3.31 ns/B     288.0 MiB/s     3.81 c/B
CCM auth |      2.06 ns/B     462.1 MiB/s     2.38 c/B
GCM enc  |      2.28 ns/B     418.4 MiB/s     2.63 c/B
GCM dec  |      2.28 ns/B     418.0 MiB/s     2.63 c/B
GCM auth |     0.923 ns/B    1032.8 MiB/s     1.06 c/B
OCB enc  |      1.83 ns/B     520.1 MiB/s     2.11 c/B
OCB dec  |      1.84 ns/B     517.8 MiB/s     2.12 c/B
OCB auth |      1.52 ns/B     626.1 MiB/s     1.75 c/B
=
AES256   |  nanosecs/byte   mebibytes/sec  cycles/byte
ECB enc  |      5.86 ns/B     162.7 MiB/s     6.75 c/B
ECB dec  |      6.02 ns/B     158.3 MiB/s     6.94 c/B
CBC enc  |      2.44 ns/B     390.5 MiB/s     2.81 c/B
CBC dec  |      1.45 ns/B     656.4 MiB/s     1.67 c/B
CFB enc  |      2.39 ns/B     399.5 MiB/s     2.75 c/B
CFB dec  |      1.45 ns/B     656.8 MiB/s     1.67 c/B
OFB enc  |      7.81 ns/B     122.1 MiB/s     9.00 c/B
OFB dec  |      7.81 ns/B     122.1 MiB/s     9.00 c/B
CTR enc  |      1.57 ns/B     605.8 MiB/s     1.81 c/B
CTR dec  |      1.57 ns/B     605.9 MiB/s     1.81 c/B
CCM enc  |      4.07 ns/B     234.3 MiB/s     4.69 c/B
CCM dec  |      4.07 ns/B     234.1 MiB/s     4.69 c/B
CCM auth |      2.61 ns/B     365.7 MiB/s     3.00 c/B
GCM enc  |      2.50 ns/B     381.9 MiB/s     2.88 c/B
GCM dec  |      2.49 ns/B     382.3 MiB/s     2.87 c/B
GCM auth |     0.926 ns/B    1029.7 MiB/s     1.07 c/B
OCB enc  |      2.05 ns/B     465.6 MiB/s     2.36 c/B
OCB dec  |      2.06 ns/B     462.0 MiB/s     2.38 c/B
OCB auth |      1.74 ns/B     548.4 MiB/s     2.00 c/B
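
For readers comparing the columns, the three units in the tables are
related by simple arithmetic at the stated 1152 MHz clock. The helper
below is an illustrative sketch, not part of the patch or of the
benchmark tool, and its function names are made up for this note. It
converts ns/B into c/B and MiB/s and forms a before/after ratio.
Ratios computed from the rounded table values may differ in the last
digit from the factors listed in the improvement summary.

```c
#include <assert.h>

/* cycles/byte = (ns/byte) * (cycles/ns), where cycles/ns = MHz / 1000. */
double ns_to_cycles_per_byte(double ns_per_byte, double cpu_mhz)
{
  assert(ns_per_byte > 0.0 && cpu_mhz > 0.0);
  return ns_per_byte * cpu_mhz / 1000.0;
}

/* MiB/s = (1e9 ns per second / (ns/byte)) / 2^20 bytes per MiB. */
double ns_to_mib_per_sec(double ns_per_byte)
{
  assert(ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Improvement factor: cost before divided by cost after (same unit). */
double speedup(double before, double after)
{
  assert(before > 0.0 && after > 0.0);
  return before / after;
}
```

For example, ns_to_cycles_per_byte(24.42, 1152.0) gives about 28.13,
matching the "ECB enc" row of the first "Before" table, and
speedup(24.25, 1.63) gives about 14.9 for AES-128 CBC encryption when
taken from the c/B columns.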

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>