Add AES Vector Permute intrinsics implementation for AArch64

* cipher/Makefile: Add 'rijndael-vp-aarch64.c',
'rijndael-vp-simd128.h' and 'simd-common-aarch64.h'.
* cipher/rijndael-internal.h (USE_VP_AARCH64): New.
* cipher/rijndael-vp-aarch64.c: New.
* cipher/rijndael-vp-simd128.h: New.
* cipher/rijndael.c [USE_VP_AARCH64]: Add function prototypes
for AArch64 vector permutation implementation.
(do_setkey) [USE_VP_AARCH64]: Setup function pointers for
AArch64 vector permutation implementation.
* cipher/simd-common-aarch64.h: New.
* configure.ac: Add 'rijndael-vp-aarch64.lo'.

This patch adds an AES Vector Permute implementation for AArch64,
written with compiler intrinsics. It is intended for CPUs that lack
support for the crypto extensions instruction set.
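
For readers unfamiliar with the approach, the following is a minimal
sketch of the nibble-table lookup idea that vector permute
implementations build on. It is not taken from
'rijndael-vp-aarch64.c'. The names tbl16, xtime_ref, xtime_vp and
xtime_vp_mismatches are hypothetical, and tbl16 is a scalar stand-in
for a single 16-byte AArch64 TBL lookup (vqtbl1q_u8 in intrinsics
form). The example uses multiplication by x in GF(2^8), which is
linear over GF(2), so the result for a byte is the XOR of one lookup
on its low nibble and one on its high nibble.

```c
#include <stdint.h>
#include <string.h>

/* Scalar stand-in (assumption, for illustration) for one TBL lookup:
   out[i] = table[idx[i]] when idx[i] < 16, otherwise 0.  On real
   hardware the table lives in a vector register; this emulation
   indexes memory and is therefore NOT constant-time. */
void tbl16(uint8_t out[16], const uint8_t table[16], const uint8_t idx[16])
{
  uint8_t tmp[16];
  int i;

  for (i = 0; i < 16; i++)
    tmp[i] = idx[i] < 16 ? table[idx[i]] : 0;
  memcpy(out, tmp, 16);
}

/* Reference: multiply by x in GF(2^8) modulo the AES polynomial 0x11b. */
uint8_t xtime_ref(uint8_t x)
{
  return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0x00));
}

/* Same operation on 16 bytes at once using two 16-entry nibble tables.
   The tables are rebuilt on every call to keep the sketch short; real
   code would use precomputed constants. */
void xtime_vp(uint8_t out[16], const uint8_t in[16])
{
  uint8_t lo_tab[16], hi_tab[16];
  uint8_t lo_idx[16], hi_idx[16];
  uint8_t lo[16], hi[16];
  int i;

  for (i = 0; i < 16; i++)
    {
      lo_tab[i] = xtime_ref((uint8_t)i);        /* results for 0x00..0x0f */
      hi_tab[i] = xtime_ref((uint8_t)(i << 4)); /* results for 0x00..0xf0 */
    }

  for (i = 0; i < 16; i++)
    {
      lo_idx[i] = in[i] & 0x0f;
      hi_idx[i] = in[i] >> 4;
    }

  tbl16(lo, lo_tab, lo_idx);
  tbl16(hi, hi_tab, hi_idx);

  for (i = 0; i < 16; i++)
    out[i] = lo[i] ^ hi[i];
}

/* Compare the table version against the reference for all 256 byte
   values; returns the number of mismatches. */
int xtime_vp_mismatches(void)
{
  uint8_t in[16], out[16];
  int base, i, bad = 0;

  for (base = 0; base < 256; base += 16)
    {
      for (i = 0; i < 16; i++)
        in[i] = (uint8_t)(base + i);
      xtime_vp(out, in);
      for (i = 0; i < 16; i++)
        bad += (out[i] != xtime_ref(in[i]));
    }
  return bad;
}
```

With NEON intrinsics, each tbl16 call would presumably map to one
vqtbl1q_u8 and the final loop to one veorq_u8, so no memory access
depends on the data being processed.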

Benchmark on Cortex-A53 (1152 MHz):

Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     22.31 ns/B     42.75 MiB/s     25.70 c/B
        ECB dec |     22.79 ns/B     41.84 MiB/s     26.26 c/B
        CBC enc |     18.61 ns/B     51.24 MiB/s     21.44 c/B
        CBC dec |     18.56 ns/B     51.37 MiB/s     21.39 c/B
        CFB enc |     18.56 ns/B     51.37 MiB/s     21.39 c/B
        CFB dec |     18.56 ns/B     51.38 MiB/s     21.38 c/B
        OFB enc |     22.63 ns/B     42.13 MiB/s     26.07 c/B
        OFB dec |     22.63 ns/B     42.13 MiB/s     26.07 c/B
        CTR enc |     19.05 ns/B     50.05 MiB/s     21.95 c/B
        CTR dec |     19.05 ns/B     50.05 MiB/s     21.95 c/B
        XTS enc |     19.27 ns/B     49.50 MiB/s     22.19 c/B
        XTS dec |     19.38 ns/B     49.22 MiB/s     22.32 c/B
        CCM enc |     37.71 ns/B     25.29 MiB/s     43.45 c/B

After:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     16.10 ns/B     59.23 MiB/s     18.55 c/B
        ECB dec |     18.35 ns/B     51.98 MiB/s     21.14 c/B
        CBC enc |     18.47 ns/B     51.62 MiB/s     21.28 c/B
        CBC dec |     18.49 ns/B     51.58 MiB/s     21.30 c/B
        CFB enc |     18.35 ns/B     51.98 MiB/s     21.13 c/B
        CFB dec |     16.24 ns/B     58.72 MiB/s     18.71 c/B
        OFB enc |     22.58 ns/B     42.24 MiB/s     26.01 c/B
        OFB dec |     22.58 ns/B     42.24 MiB/s     26.01 c/B
        CTR enc |     16.27 ns/B     58.61 MiB/s     18.75 c/B
        CTR dec |     16.27 ns/B     58.61 MiB/s     18.75 c/B
        XTS enc |     16.56 ns/B     57.60 MiB/s     19.07 c/B
        XTS dec |     18.92 ns/B     50.41 MiB/s     21.79 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>