GCM: Add bulk processing for ARMv8/AArch64 implementation
* cipher/cipher-gcm-armv8-aarch64-ce.S: Add 6 blocks bulk processing.
Benchmark on Cortex-A53 (1152 MHz):
Before:
           |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES  |      1.30 ns/B     731.6 MiB/s      1.50 c/B
After (1.49x faster):
           |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES  |     0.873 ns/B    1092.1 MiB/s      1.01 c/B
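
For context on the 6-block bulk processing named in the ChangeLog entry: a common way to handle several GHASH blocks per iteration is to precompute H^1..H^6 and fold six blocks into the accumulator at once, which removes the serial dependency between consecutive multiplications. The sketch below is a toy Python model of that identity only; it is an assumption about the general technique, not a transcription of the assembly. It uses plain bit order rather than GCM's reflected order, reduces after every multiply (real code typically defers the reduction), and the names gf_mul, ghash_serial and ghash_bulk6 are hypothetical.

```python
# Toy model: GF(2^128) with the polynomial x^128 + x^7 + x^2 + x + 1,
# in plain (non-reflected) bit order. Not compatible with real GCM output.
POLY = (1 << 128) | 0x87


def gf_mul(a, b):
    """Carry-less multiply of a and b (both < 2^128), reduced modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> 128:
            a ^= POLY
    return r


def ghash_serial(h, blocks, y=0):
    """One block per step: Y = (Y ^ C) * H."""
    for c in blocks:
        y = gf_mul(y ^ c, h)
    return y


def ghash_bulk6(h, blocks, y=0):
    """Six blocks per step:
    Y = (Y ^ C1)*H^6 ^ C2*H^5 ^ C3*H^4 ^ C4*H^3 ^ C5*H^2 ^ C6*H.
    """
    # powers[k] holds H^(k+1)
    powers = [h]
    for _ in range(5):
        powers.append(gf_mul(powers[-1], h))

    full = len(blocks) - len(blocks) % 6
    for i in range(0, full, 6):
        acc = gf_mul(y ^ blocks[i], powers[5])
        for j in range(1, 6):
            # the six products are independent of each other
            acc ^= gf_mul(blocks[i + j], powers[5 - j])
        y = acc

    # leftover blocks (fewer than six) go through the one-block path
    return ghash_serial(h, blocks[full:], y)
```

Both functions return the same value for any input; the bulk form only changes the order in which the work can be scheduled.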
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>