Add Aarch64 assembly implementation of Camellia
* cipher/Makefile.am: Add 'camellia-aarch64.S'.
* cipher/camellia-aarch64.S: New.
* cipher/camellia-glue.c [USE_ARM_ASM][__aarch64__]: Set stack burn
size to zero.
* cipher/camellia.h: Enable USE_ARM_ASM if __AARCH64EL__ and
HAVE_COMPATIBLE_GCC_AARCH64_PLATFORM_AS defined.
* configure.ac [host=aarch64]: Add 'camellia-aarch64.lo'.
Patch adds ARMv8/Aarch64 implementation of Camellia.
Benchmark on Cortex-A53 (1152 MHz):
Before:
 CAMELLIA128    |  nanosecs/byte  mebibytes/sec    cycles/byte
        ECB enc |     39.71 ns/B    24.01 MiB/s      45.75 c/B
        ECB dec |     39.72 ns/B    24.01 MiB/s      45.75 c/B
        CBC enc |     40.80 ns/B    23.38 MiB/s      47.00 c/B
        CBC dec |     39.66 ns/B    24.05 MiB/s      45.69 c/B
        CFB enc |     40.69 ns/B    23.44 MiB/s      46.88 c/B
        CFB dec |     39.66 ns/B    24.05 MiB/s      45.69 c/B
        OFB enc |     40.69 ns/B    23.44 MiB/s      46.88 c/B
        OFB dec |     40.69 ns/B    23.44 MiB/s      46.88 c/B
        CTR enc |     39.88 ns/B    23.91 MiB/s      45.94 c/B
        CTR dec |     39.88 ns/B    23.91 MiB/s      45.94 c/B
        CCM enc |     79.97 ns/B    11.92 MiB/s      92.13 c/B
        CCM dec |     79.97 ns/B    11.93 MiB/s      92.13 c/B
       CCM auth |     40.20 ns/B    23.72 MiB/s      46.31 c/B
        GCM enc |     41.18 ns/B    23.16 MiB/s      47.44 c/B
        GCM dec |     41.18 ns/B    23.16 MiB/s      47.44 c/B
       GCM auth |      1.30 ns/B    732.7 MiB/s       1.50 c/B
        OCB enc |     42.04 ns/B    22.69 MiB/s      48.43 c/B
        OCB dec |     42.03 ns/B    22.69 MiB/s      48.42 c/B
       OCB auth |     41.38 ns/B    23.05 MiB/s      47.67 c/B
                =
 CAMELLIA256    |  nanosecs/byte  mebibytes/sec    cycles/byte
        ECB enc |     52.36 ns/B    18.22 MiB/s      60.31 c/B
        ECB dec |     52.36 ns/B    18.22 MiB/s      60.31 c/B
        CBC enc |     53.39 ns/B    17.86 MiB/s      61.50 c/B
        CBC dec |     52.14 ns/B    18.29 MiB/s      60.06 c/B
        CFB enc |     53.28 ns/B    17.90 MiB/s      61.38 c/B
        CFB dec |     52.14 ns/B    18.29 MiB/s      60.06 c/B
        OFB enc |     53.17 ns/B    17.94 MiB/s      61.25 c/B
        OFB dec |     53.17 ns/B    17.94 MiB/s      61.25 c/B
        CTR enc |     52.36 ns/B    18.21 MiB/s      60.32 c/B
        CTR dec |     52.36 ns/B    18.21 MiB/s      60.32 c/B
        CCM enc |     105.0 ns/B     9.08 MiB/s      120.9 c/B
        CCM dec |     105.0 ns/B     9.08 MiB/s      120.9 c/B
       CCM auth |     52.74 ns/B    18.08 MiB/s      60.75 c/B
        GCM enc |     53.66 ns/B    17.77 MiB/s      61.81 c/B
        GCM dec |     53.66 ns/B    17.77 MiB/s      61.82 c/B
       GCM auth |      1.30 ns/B    732.3 MiB/s       1.50 c/B
        OCB enc |     54.54 ns/B    17.49 MiB/s      62.83 c/B
        OCB dec |     54.48 ns/B    17.50 MiB/s      62.77 c/B
       OCB auth |     53.89 ns/B    17.70 MiB/s      62.09 c/B
                =
After (~1.7x faster):
 CAMELLIA128    |  nanosecs/byte  mebibytes/sec    cycles/byte
        ECB enc |     22.25 ns/B    42.87 MiB/s      25.63 c/B
        ECB dec |     22.25 ns/B    42.87 MiB/s      25.63 c/B
        CBC enc |     23.27 ns/B    40.97 MiB/s      26.81 c/B
        CBC dec |     22.14 ns/B    43.08 MiB/s      25.50 c/B
        CFB enc |     23.17 ns/B    41.17 MiB/s      26.69 c/B
        CFB dec |     22.14 ns/B    43.08 MiB/s      25.50 c/B
        OFB enc |     23.11 ns/B    41.26 MiB/s      26.63 c/B
        OFB dec |     23.11 ns/B    41.26 MiB/s      26.63 c/B
        CTR enc |     22.36 ns/B    42.65 MiB/s      25.76 c/B
        CTR dec |     22.36 ns/B    42.65 MiB/s      25.76 c/B
        CCM enc |     44.87 ns/B    21.26 MiB/s      51.69 c/B
        CCM dec |     44.87 ns/B    21.25 MiB/s      51.69 c/B
       CCM auth |     22.62 ns/B    42.15 MiB/s      26.06 c/B
        GCM enc |     23.66 ns/B    40.31 MiB/s      27.25 c/B
        GCM dec |     23.66 ns/B    40.31 MiB/s      27.25 c/B
       GCM auth |      1.30 ns/B    732.0 MiB/s       1.50 c/B
        OCB enc |     24.32 ns/B    39.21 MiB/s      28.02 c/B
        OCB dec |     24.32 ns/B    39.21 MiB/s      28.02 c/B
       OCB auth |     23.75 ns/B    40.15 MiB/s      27.36 c/B
                =
 CAMELLIA256    |  nanosecs/byte  mebibytes/sec    cycles/byte
        ECB enc |     29.08 ns/B    32.79 MiB/s      33.50 c/B
        ECB dec |     29.19 ns/B    32.67 MiB/s      33.63 c/B
        CBC enc |     30.11 ns/B    31.67 MiB/s      34.69 c/B
        CBC dec |     29.05 ns/B    32.83 MiB/s      33.47 c/B
        CFB enc |     30.00 ns/B    31.79 MiB/s      34.56 c/B
        CFB dec |     28.97 ns/B    32.91 MiB/s      33.38 c/B
        OFB enc |     29.95 ns/B    31.84 MiB/s      34.50 c/B
        OFB dec |     29.95 ns/B    31.84 MiB/s      34.50 c/B
        CTR enc |     29.19 ns/B    32.67 MiB/s      33.63 c/B
        CTR dec |     29.19 ns/B    32.67 MiB/s      33.63 c/B
        CCM enc |     58.54 ns/B    16.29 MiB/s      67.43 c/B
        CCM dec |     58.54 ns/B    16.29 MiB/s      67.44 c/B
       CCM auth |     29.46 ns/B    32.37 MiB/s      33.94 c/B
        GCM enc |     30.49 ns/B    31.28 MiB/s      35.12 c/B
        GCM dec |     30.49 ns/B    31.27 MiB/s      35.13 c/B
       GCM auth |      1.30 ns/B    731.6 MiB/s       1.50 c/B
        OCB enc |     31.16 ns/B    30.61 MiB/s      35.90 c/B
        OCB dec |     31.22 ns/B    30.55 MiB/s      35.96 c/B
       OCB auth |     30.59 ns/B    31.18 MiB/s      35.24 c/B
                =
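
As a rough cross-check of the speedup quoted for the "After" results, the ratio of cycles/byte before and after the patch can be computed directly from the ECB enc rows. This is only an illustrative sketch: the `speedup` helper and the dictionary layout are assumptions made for the example, not part of the patch or of the benchmark tool.

```python
# Cycles/byte for "ECB enc", copied from the benchmark results in this message.
before = {"CAMELLIA128": 45.75, "CAMELLIA256": 60.31}
after = {"CAMELLIA128": 25.63, "CAMELLIA256": 33.50}


def speedup(cipher):
    """Return how many times fewer cycles/byte the new code needs (hypothetical helper)."""
    return before[cipher] / after[cipher]


for cipher in before:
    print(f"{cipher} ECB enc speedup: {speedup(cipher):.2f}x")
```

Other modes can be checked the same way by substituting their cycles/byte values; GCM auth is unchanged because it does not go through the Camellia block function.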
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>