Add ARMv8/CE acceleration for AES-XTS
* cipher/rijndael-armv8-aarch32-ce.S (_gcry_aes_xts_enc_armv8_ce)
(_gcry_aes_xts_dec_armv8_ce): New.
* cipher/rijndael-armv8-aarch64-ce.S (_gcry_aes_xts_enc_armv8_ce)
(_gcry_aes_xts_dec_armv8_ce): New.
* cipher/rijndael-armv8-ce.c (_gcry_aes_xts_enc_armv8_ce)
(_gcry_aes_xts_dec_armv8_ce, xts_crypt_fn_t)
(_gcry_aes_armv8_ce_xts_crypt): New.
* cipher/rijndael.c (_gcry_aes_armv8_ce_xts_crypt): New.
(_gcry_aes_xts_crypt) [USE_ARM_CE]: New.
Benchmark on Cortex-A53 (AArch64, 1152 MHz):
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      4.88 ns/B     195.5 MiB/s      5.62 c/B
        XTS dec |      4.94 ns/B     192.9 MiB/s      5.70 c/B
                =
 AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      5.55 ns/B     171.8 MiB/s      6.39 c/B
        XTS dec |      5.61 ns/B     169.9 MiB/s      6.47 c/B
                =
 AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      6.22 ns/B     153.3 MiB/s      7.17 c/B
        XTS dec |      6.29 ns/B     151.7 MiB/s      7.24 c/B
                =
After (~2.6x faster):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      1.83 ns/B     520.9 MiB/s      2.11 c/B
        XTS dec |      1.82 ns/B     524.9 MiB/s      2.09 c/B
                =
 AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      1.97 ns/B     483.3 MiB/s      2.27 c/B
        XTS dec |      1.96 ns/B     486.9 MiB/s      2.26 c/B
                =
 AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      2.11 ns/B     450.9 MiB/s      2.44 c/B
        XTS dec |      2.10 ns/B     453.8 MiB/s      2.42 c/B
                =
Benchmark on Cortex-A53 (AArch32, 1152 MHz):
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      6.52 ns/B     146.2 MiB/s      7.51 c/B
        XTS dec |      6.57 ns/B     145.2 MiB/s      7.57 c/B
                =
 AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      7.10 ns/B     134.3 MiB/s      8.18 c/B
        XTS dec |      7.11 ns/B     134.2 MiB/s      8.19 c/B
                =
 AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      7.30 ns/B     130.7 MiB/s      8.41 c/B
        XTS dec |      7.38 ns/B     129.3 MiB/s      8.50 c/B
                =
After (~2.7x faster):
Cipher:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      2.33 ns/B     409.6 MiB/s      2.68 c/B
        XTS dec |      2.35 ns/B     405.3 MiB/s      2.71 c/B
                =
 AES192         |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      2.53 ns/B     377.6 MiB/s      2.91 c/B
        XTS dec |      2.54 ns/B     375.5 MiB/s      2.93 c/B
                =
 AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte
        XTS enc |      2.75 ns/B     346.8 MiB/s      3.17 c/B
        XTS dec |      2.76 ns/B     345.2 MiB/s      3.18 c/B
                =
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>