Add AArch64 assembly implementation of AES
* cipher/Makefile.am: Add 'rijndael-aarch64.S'.
* cipher/rijndael-aarch64.S: New.
* cipher/rijndael-internal.h: Enable USE_ARM_ASM if __AARCH64EL__ and
HAVE_COMPATIBLE_GCC_AARCH64_PLATFORM_AS defined.
* configure.ac (gcry_cv_gcc_aarch64_platform_as_ok): New check.
[host=aarch64]: Add 'rijndael-aarch64.lo'.
This patch adds an ARMv8/AArch64 assembly implementation of AES.
Benchmark on Cortex-A53 (1536 MHz):
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     19.37 ns/B     49.22 MiB/s     29.76 c/B
        ECB dec |     19.85 ns/B     48.03 MiB/s     30.50 c/B
        CBC enc |     16.84 ns/B     56.62 MiB/s     25.87 c/B
        CBC dec |     16.81 ns/B     56.74 MiB/s     25.82 c/B
        CFB enc |     16.80 ns/B     56.75 MiB/s     25.81 c/B
        CFB dec |     16.81 ns/B     56.75 MiB/s     25.81 c/B
        OFB enc |     20.02 ns/B     47.64 MiB/s     30.75 c/B
        OFB dec |     20.02 ns/B     47.64 MiB/s     30.75 c/B
        CTR enc |     17.06 ns/B     55.91 MiB/s     26.20 c/B
        CTR dec |     17.06 ns/B     55.92 MiB/s     26.20 c/B
        CCM enc |     33.94 ns/B     28.10 MiB/s     52.13 c/B
        CCM dec |     33.94 ns/B     28.10 MiB/s     52.14 c/B
       CCM auth |     16.97 ns/B     56.18 MiB/s     26.07 c/B
        GCM enc |     28.70 ns/B     33.23 MiB/s     44.09 c/B
        GCM dec |     28.70 ns/B     33.23 MiB/s     44.09 c/B
       GCM auth |     11.66 ns/B     81.81 MiB/s     17.90 c/B
        OCB enc |     17.66 ns/B     53.99 MiB/s     27.13 c/B
        OCB dec |     17.61 ns/B     54.16 MiB/s     27.05 c/B
       OCB auth |     17.44 ns/B     54.69 MiB/s     26.78 c/B
                =
 AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     21.82 ns/B     43.71 MiB/s     33.51 c/B
        ECB dec |     22.55 ns/B     42.30 MiB/s     34.63 c/B
        CBC enc |     19.33 ns/B     49.33 MiB/s     29.70 c/B
        CBC dec |     19.50 ns/B     48.91 MiB/s     29.95 c/B
        CFB enc |     19.29 ns/B     49.44 MiB/s     29.63 c/B
        CFB dec |     19.28 ns/B     49.46 MiB/s     29.61 c/B
        OFB enc |     22.49 ns/B     42.40 MiB/s     34.55 c/B
        OFB dec |     22.50 ns/B     42.38 MiB/s     34.56 c/B
        CTR enc |     19.53 ns/B     48.83 MiB/s     30.00 c/B
        CTR dec |     19.54 ns/B     48.80 MiB/s     30.02 c/B
        CCM enc |     38.91 ns/B     24.51 MiB/s     59.77 c/B
        CCM dec |     38.90 ns/B     24.51 MiB/s     59.76 c/B
       CCM auth |     19.45 ns/B     49.02 MiB/s     29.88 c/B
        GCM enc |     31.13 ns/B     30.63 MiB/s     47.82 c/B
        GCM dec |     31.14 ns/B     30.63 MiB/s     47.82 c/B
       GCM auth |     11.66 ns/B     81.80 MiB/s     17.91 c/B
        OCB enc |     20.15 ns/B     47.33 MiB/s     30.95 c/B
        OCB dec |     20.30 ns/B     46.98 MiB/s     31.18 c/B
       OCB auth |     19.92 ns/B     47.88 MiB/s     30.59 c/B
                =
 AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     24.33 ns/B     39.19 MiB/s     37.38 c/B
        ECB dec |     25.23 ns/B     37.80 MiB/s     38.76 c/B
        CBC enc |     21.82 ns/B     43.71 MiB/s     33.51 c/B
        CBC dec |     22.18 ns/B     42.99 MiB/s     34.07 c/B
        CFB enc |     21.77 ns/B     43.80 MiB/s     33.44 c/B
        CFB dec |     21.77 ns/B     43.81 MiB/s     33.44 c/B
        OFB enc |     24.99 ns/B     38.16 MiB/s     38.39 c/B
        OFB dec |     24.99 ns/B     38.17 MiB/s     38.38 c/B
        CTR enc |     22.02 ns/B     43.32 MiB/s     33.82 c/B
        CTR dec |     22.02 ns/B     43.31 MiB/s     33.82 c/B
        CCM enc |     43.86 ns/B     21.74 MiB/s     67.38 c/B
        CCM dec |     43.87 ns/B     21.74 MiB/s     67.39 c/B
       CCM auth |     21.94 ns/B     43.48 MiB/s     33.69 c/B
        GCM enc |     33.66 ns/B     28.33 MiB/s     51.71 c/B
        GCM dec |     33.66 ns/B     28.33 MiB/s     51.70 c/B
       GCM auth |     11.69 ns/B     81.59 MiB/s     17.95 c/B
        OCB enc |     22.90 ns/B     41.65 MiB/s     35.17 c/B
        OCB dec |     23.25 ns/B     41.02 MiB/s     35.71 c/B
       OCB auth |     22.69 ns/B     42.03 MiB/s     34.85 c/B
                =
After (~1.2x faster):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     16.40 ns/B     58.16 MiB/s     25.19 c/B
        ECB dec |     17.01 ns/B     56.07 MiB/s     26.13 c/B
        CBC enc |     13.99 ns/B     68.15 MiB/s     21.49 c/B
        CBC dec |     14.04 ns/B     67.94 MiB/s     21.56 c/B
        CFB enc |     13.96 ns/B     68.32 MiB/s     21.44 c/B
        CFB dec |     13.95 ns/B     68.34 MiB/s     21.43 c/B
        OFB enc |     17.14 ns/B     55.65 MiB/s     26.32 c/B
        OFB dec |     17.13 ns/B     55.67 MiB/s     26.31 c/B
        CTR enc |     14.17 ns/B     67.31 MiB/s     21.76 c/B
        CTR dec |     14.17 ns/B     67.29 MiB/s     21.77 c/B
        CCM enc |     28.16 ns/B     33.86 MiB/s     43.26 c/B
        CCM dec |     28.16 ns/B     33.87 MiB/s     43.26 c/B
       CCM auth |     14.08 ns/B     67.71 MiB/s     21.63 c/B
        GCM enc |     25.82 ns/B     36.94 MiB/s     39.66 c/B
        GCM dec |     25.82 ns/B     36.94 MiB/s     39.65 c/B
       GCM auth |     11.67 ns/B     81.74 MiB/s     17.92 c/B
        OCB enc |     14.78 ns/B     64.55 MiB/s     22.69 c/B
        OCB dec |     14.80 ns/B     64.43 MiB/s     22.74 c/B
       OCB auth |     14.59 ns/B     65.36 MiB/s     22.41 c/B
                =
 AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     19.05 ns/B     50.07 MiB/s     29.25 c/B
        ECB dec |     19.62 ns/B     48.62 MiB/s     30.13 c/B
        CBC enc |     16.56 ns/B     57.59 MiB/s     25.44 c/B
        CBC dec |     16.69 ns/B     57.14 MiB/s     25.64 c/B
        CFB enc |     16.52 ns/B     57.71 MiB/s     25.38 c/B
        CFB dec |     16.52 ns/B     57.73 MiB/s     25.37 c/B
        OFB enc |     19.70 ns/B     48.41 MiB/s     30.26 c/B
        OFB dec |     19.69 ns/B     48.43 MiB/s     30.24 c/B
        CTR enc |     16.73 ns/B     57.00 MiB/s     25.70 c/B
        CTR dec |     16.73 ns/B     57.01 MiB/s     25.70 c/B
        CCM enc |     33.29 ns/B     28.65 MiB/s     51.13 c/B
        CCM dec |     33.29 ns/B     28.65 MiB/s     51.13 c/B
       CCM auth |     16.65 ns/B     57.29 MiB/s     25.57 c/B
        GCM enc |     28.39 ns/B     33.60 MiB/s     43.60 c/B
        GCM dec |     28.39 ns/B     33.59 MiB/s     43.60 c/B
       GCM auth |     11.64 ns/B     81.92 MiB/s     17.88 c/B
        OCB enc |     17.33 ns/B     55.03 MiB/s     26.62 c/B
        OCB dec |     17.40 ns/B     54.82 MiB/s     26.72 c/B
       OCB auth |     17.16 ns/B     55.59 MiB/s     26.35 c/B
                =
 AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     21.56 ns/B     44.23 MiB/s     33.12 c/B
        ECB dec |     22.09 ns/B     43.17 MiB/s     33.93 c/B
        CBC enc |     19.09 ns/B     49.97 MiB/s     29.31 c/B
        CBC dec |     19.13 ns/B     49.86 MiB/s     29.38 c/B
        CFB enc |     19.04 ns/B     50.09 MiB/s     29.24 c/B
        CFB dec |     19.04 ns/B     50.08 MiB/s     29.25 c/B
        OFB enc |     22.22 ns/B     42.93 MiB/s     34.13 c/B
        OFB dec |     22.22 ns/B     42.92 MiB/s     34.13 c/B
        CTR enc |     19.25 ns/B     49.53 MiB/s     29.57 c/B
        CTR dec |     19.25 ns/B     49.55 MiB/s     29.57 c/B
        CCM enc |     38.33 ns/B     24.88 MiB/s     58.88 c/B
        CCM dec |     38.34 ns/B     24.88 MiB/s     58.88 c/B
       CCM auth |     19.17 ns/B     49.76 MiB/s     29.44 c/B
        GCM enc |     30.91 ns/B     30.86 MiB/s     47.47 c/B
        GCM dec |     30.91 ns/B     30.85 MiB/s     47.48 c/B
       GCM auth |     11.71 ns/B     81.47 MiB/s     17.98 c/B
        OCB enc |     19.85 ns/B     48.04 MiB/s     30.49 c/B
        OCB dec |     19.89 ns/B     47.95 MiB/s     30.55 c/B
       OCB auth |     19.67 ns/B     48.48 MiB/s     30.22 c/B
                =
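
The "~1.2x" figure can be checked from the tables above. The sketch below is illustrative only and is not part of the patch: it copies a few AES-128 rows by hand, divides the "before" time by the "after" time, and converts ns/B to cycles/byte using the 1536 MHz clock. The names SAMPLES, speedup and cycles_per_byte are made up for this example. Cycles/byte recomputed from the rounded ns/B values may differ from the table in the last digit.

```python
# Hand-copied AES-128 rows from the benchmark tables: (before, after) in ns/B.
SAMPLES = {
    "ECB enc": (19.37, 16.40),
    "CBC enc": (16.84, 13.99),
    "CTR enc": (17.06, 14.17),
    "GCM auth": (11.66, 11.67),
}

CLOCK_MHZ = 1536  # Cortex-A53 clock stated in the benchmark header


def speedup(before_ns, after_ns):
    """Return how many times faster the 'after' measurement is."""
    return before_ns / after_ns


def cycles_per_byte(ns_per_byte, clock_mhz=CLOCK_MHZ):
    """Convert ns/B to cycles/byte: ns * (cycles per ns) = ns * MHz / 1000."""
    return ns_per_byte * clock_mhz / 1000.0


if __name__ == "__main__":
    for mode, (before, after) in SAMPLES.items():
        print(f"{mode:8s} {speedup(before, after):.2f}x "
              f"{cycles_per_byte(after):.2f} c/B")
```

GCM auth stays at a ratio of about 1.0 because the authentication pass does not go through the AES block function.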
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>