Add 64-bit ARMv8/CE PMULL implementation of CRC

* cipher/Makefile.am: Add 'crc-armv8-ce.c' and 'crc-armv8-aarch64-ce.S'.
* cipher/asm-common-aarch64.h [HAVE_GCC_ASM_CFI_DIRECTIVES]: Add CFI
helper macros.
* cipher/crc-armv8-aarch64-ce.S: New.
* cipher/crc-armv8-ce.c: New.
* cipher/crc.c (USE_ARM_PMULL): New.
(CRC_CONTEXT) [USE_ARM_PMULL]: Add 'use_pmull'.
[USE_ARM_PMULL] (_gcry_crc32_armv8_ce_pmull)
(_gcry_crc24rfc2440_armv8_ce_pmull): New prototypes.
(crc32_init, crc32rfc1510_init, crc24rfc2440_init): Enable ARM PMULL
implementations if supported by HW features.
(crc32_write, crc24rfc2440_write) [USE_ARM_PMULL]: Use ARM PMULL
implementations if enabled.
* configure.ac: Add 'crc-armv8-ce.lo' and 'crc-armv8-aarch64-ce.lo'.
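
The init/write pattern described in the crc.c entries (decide once at init
whether PMULL is usable, then branch on 'use_pmull' in every write) can be
sketched in portable C roughly as below. This is a hypothetical
illustration, not libgcrypt's code: hw_has_pmull(), crc_ctx and all crc_*
function names are invented here, the "pmull" routines are placeholders
that just call the bitwise code, and CRC32RFC1510 is left out. The generic
paths use the standard reflected CRC-32 (polynomial 0xEDB88320) and the
OpenPGP CRC-24 from RFC 2440 (init 0xB704CE, polynomial 0x1864CFB).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a HW feature check; a real build would ask
 * the CPU whether the ARMv8 PMULL instruction is available. */
static int hw_has_pmull(void) { return 0; }

typedef struct {
  uint32_t crc;
  int use_pmull; /* decided once at init, consulted on every write */
} crc_ctx;

/* Bitwise reflected CRC-32 update (polynomial 0xEDB88320). */
static uint32_t crc32_update_generic(uint32_t crc, const uint8_t *p, size_t n)
{
  while (n--) {
    crc ^= *p++;
    for (int k = 0; k < 8; k++)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return crc;
}

/* Bitwise CRC-24 update as given in RFC 2440 (polynomial 0x1864CFB). */
static uint32_t crc24_update_generic(uint32_t crc, const uint8_t *p, size_t n)
{
  while (n--) {
    crc ^= (uint32_t)*p++ << 16;
    for (int k = 0; k < 8; k++) {
      crc <<= 1;
      if (crc & 0x1000000u)
        crc ^= 0x1864CFBu;
    }
  }
  return crc & 0xFFFFFFu;
}

/* Placeholders for accelerated routines: a real version would fold the
 * input with carry-less multiplies; these reuse the generic code so the
 * sketch builds and runs on any CPU. */
static uint32_t crc32_update_pmull(uint32_t crc, const uint8_t *p, size_t n)
{ return crc32_update_generic(crc, p, n); }
static uint32_t crc24_update_pmull(uint32_t crc, const uint8_t *p, size_t n)
{ return crc24_update_generic(crc, p, n); }

static void crc32_init(crc_ctx *ctx)
{
  ctx->crc = 0xFFFFFFFFu; /* CRC-32 initial value */
  ctx->use_pmull = hw_has_pmull();
}

static void crc24_init(crc_ctx *ctx)
{
  ctx->crc = 0xB704CEu; /* CRC24_INIT from RFC 2440 */
  ctx->use_pmull = hw_has_pmull();
}

static void crc32_write(crc_ctx *ctx, const void *buf, size_t n)
{
  assert(buf != NULL || n == 0);
  ctx->crc = ctx->use_pmull ? crc32_update_pmull(ctx->crc, buf, n)
                            : crc32_update_generic(ctx->crc, buf, n);
}

static void crc24_write(crc_ctx *ctx, const void *buf, size_t n)
{
  assert(buf != NULL || n == 0);
  ctx->crc = ctx->use_pmull ? crc24_update_pmull(ctx->crc, buf, n)
                            : crc24_update_generic(ctx->crc, buf, n);
}

/* CRC-32 applies a final inversion; CRC-24 (RFC 2440) does not. */
static uint32_t crc32_final(const crc_ctx *ctx) { return ~ctx->crc; }
static uint32_t crc24_final(const crc_ctx *ctx) { return ctx->crc & 0xFFFFFFu; }
```

Feeding "123456789" gives the standard check values 0xCBF43926 for CRC-32
and 0x21CF02 for CRC-24/OpenPGP, whether the input is written in one call
or split across several.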

Benchmark on Cortex-A53 (at 1104 MHz):

Before:
               |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32         |      2.89 ns/B     330.2 MiB/s      3.19 c/B
 CRC32RFC1510  |      2.89 ns/B     330.2 MiB/s      3.19 c/B
 CRC24RFC2440  |      2.72 ns/B     350.8 MiB/s      3.00 c/B

After (crc32 ~8.4x faster, crc24 ~6.8x faster):
               |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32         |     0.341 ns/B      2796 MiB/s     0.377 c/B
 CRC32RFC1510  |     0.342 ns/B      2792 MiB/s     0.377 c/B
 CRC24RFC2440  |     0.398 ns/B      2396 MiB/s     0.439 c/B
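
The cycles/byte column follows from nanosecs/byte and the 1104 MHz clock,
and the speedups quoted above can be read off that column. Worked through
with the table's own figures:

$$
\begin{aligned}
\text{c/B} &= \text{ns/B} \times f, \qquad f = 1.104\ \text{GHz} \\
\text{CRC32 before: } 2.89 \times 1.104 &\approx 3.19, \qquad
\text{after: } 0.341 \times 1.104 \approx 0.377 \\
\text{CRC32 speedup} &= 3.19 / 0.377 \approx 8.46 \\
\text{CRC24RFC2440 speedup} &= 3.00 / 0.439 \approx 6.83
\end{aligned}
$$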

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>