Optimizations for generic table-based GCM implementations

* cipher/cipher-gcm.c [GCM_TABLES_USE_U64] (do_fillM): Precalculate
M[32..63] values.
[GCM_TABLES_USE_U64] (do_ghash): Split processing of the two 64-bit
halves of the input into two separate loops; use precalculated M[]
values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_fillM): Precalculate
M[64..127] values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_ghash): Use precalculated
M[] values.
[GCM_USE_TABLES] (bshift): Avoid conditional execution for mask
calculation.
* cipher/cipher-internal.h (gcry_cipher_handle): Double gcm_table
size.

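For readers unfamiliar with the bshift change, the following is a minimal sketch of what "avoid conditional execution for mask calculation" can look like. It is not the libgcrypt source: the function names bshift_branch and bshift_nobranch are hypothetical. It assumes the usual GHASH convention, where a 128-bit value is shifted right by one bit and the reduction constant 0xe1 is folded into the top byte when the shifted-out bit is set.

```c
#include <stdint.h>

/* Hypothetical illustration only; names and layout are assumptions,
   not the actual cipher-gcm.c code.  The 128-bit value is held as
   two 64-bit words, hi (most significant) and lo.  */

/* Variant with a conditional: select the reduction mask with ?:.  */
static void
bshift_branch (uint64_t *hi, uint64_t *lo)
{
  uint64_t mask = (*lo & 1) ? ((uint64_t)0xe1 << 56) : 0;

  *lo = (*lo >> 1) ^ (*hi << 63);   /* carry low bit of hi into lo */
  *hi = (*hi >> 1) ^ mask;          /* fold in reduction constant */
}

/* Variant without a conditional: 0 - (lo & 1) is all-ones when the
   low bit is set and zero otherwise, so ANDing it with the constant
   yields the same mask as the ?: expression above.  */
static void
bshift_nobranch (uint64_t *hi, uint64_t *lo)
{
  uint64_t mask = ((uint64_t)0 - (*lo & 1)) & ((uint64_t)0xe1 << 56);

  *lo = (*lo >> 1) ^ (*hi << 63);
  *hi = (*hi >> 1) ^ mask;
}
```

Both variants compute the same result. Whether the first one actually produces a branch depends on the compiler and target, so the arithmetic form is best read as a way to make the branch-free intent explicit.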
Benchmark on Intel Haswell (amd64, --disable-hwf all):
Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |      2.79 ns/B     341.3 MiB/s     11.17 c/B      3998
After (~36% faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |      2.05 ns/B     464.7 MiB/s      8.20 c/B      3998

Benchmark on Intel Haswell (win32, --disable-hwf all):
Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |      4.90 ns/B     194.8 MiB/s     19.57 c/B      3997
After (~36% faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |      3.58 ns/B     266.4 MiB/s     14.31 c/B      3999

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>