GCM: Add bulk processing for ARMv8/AArch32 implementation
* cipher/cipher-gcm-armv8-aarch32-ce.S: Add 4-block bulk processing.
* tests/basic.c (check_digests): Print correct data length for "?"
tests.
(check_one_mac): Add large 1000000-byte tests, when input is "!" or
"?".
(check_mac): Add "?" test vectors for HMAC, CMAC, GMAC and POLY1305.
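For readers unfamiliar with bulk GHASH, the idea behind processing 4 blocks
at a time can be sketched in Python as below. This is an illustration only,
not a transcription of the assembly: it assumes the usual aggregated form
that uses precomputed powers H^1..H^4, and the names gf_mul, ghash_serial
and ghash_bulk4 are hypothetical helpers that do not exist in libgcrypt.

```python
# Reduction constant for GF(2^128) in GCM bit order (NIST SP 800-38D).
R = 0xE1 << 120

def gf_mul(x, y):
    """Multiply two 128-bit field elements, bit by bit (slow, for clarity)."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:      # bit i of x, most significant first
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def ghash_serial(h, blocks):
    """One multiplication per block; each step depends on the previous one."""
    x = 0
    for b in blocks:
        x = gf_mul(x ^ b, h)
    return x

def ghash_bulk4(h, blocks):
    """Assumed 4-block aggregation: the four products are independent."""
    h2 = gf_mul(h, h)
    h3 = gf_mul(h2, h)
    h4 = gf_mul(h3, h)
    x = 0
    i = 0
    while len(blocks) - i >= 4:
        b0, b1, b2, b3 = blocks[i:i + 4]
        x = (gf_mul(x ^ b0, h4) ^ gf_mul(b1, h3)
             ^ gf_mul(b2, h2) ^ gf_mul(b3, h))
        i += 4
    for b in blocks[i:]:              # leftover blocks, one at a time
        x = gf_mul(x ^ b, h)
    return x
```

Both functions return the same value because field multiplication
distributes over XOR. In the bulk form the four multiplications no longer
form a dependency chain, which is what lets an implementation overlap them
and share reduction work.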
Benchmark on Cortex-A53 (1152 MHz):
Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |     0.924 ns/B    1032.2 MiB/s      1.06 c/B
After (1.21x faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |     0.764 ns/B    1248.2 MiB/s     0.880 c/B
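The three benchmark columns are related through the clock rate, and the
speedup follows from the ns/B figures. A small Python check, using only the
numbers quoted above (the function names are hypothetical, chosen for this
sketch):

```python
CLOCK_MHZ = 1152  # Cortex-A53 clock from the benchmark header

def cycles_per_byte(ns_per_byte, clock_mhz=CLOCK_MHZ):
    # cycles/byte = (ns/byte) * (cycles/ns); 1 MHz is 0.001 cycles per ns
    return ns_per_byte * clock_mhz / 1000.0

def mib_per_sec(ns_per_byte):
    # bytes per second, divided by the size of one MiB
    return 1e9 / ns_per_byte / (1024 * 1024)

before_ns = 0.924
after_ns = 0.764
speedup = before_ns / after_ns  # ratio of time per byte, before vs. after
```

The MiB/s values computed this way differ slightly from the table, most
likely because the ns/B figures are themselves rounded.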
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>