GnuPG

Enable four block aggregated GCM Intel PCLMUL implementation on i386

Description

* cipher/cipher-gcm-intel-pclmul.c (reduction): Change "%%xmm7" to
"%%xmm5".
(gfmul_pclmul_aggr4): Move outside [__x86_64__] block; Remove usage of
XMM8-XMM15 registers; Do not preload H-values and be_mask to reduce
register usage for i386.
(_gcry_ghash_setup_intel_pclmul): Enable calculation of H2, H3 and H4
on i386.
(_gcry_ghash_intel_pclmul): Adjust to above gfmul_pclmul_aggr4
changes; Move 'aggr4' code path outside [__x86_64__] block.
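The 'aggr4' path depends on precomputed powers of the hash key (presumably H^2, H^3 and H^4, the "H2, H3 and H4" above). Unrolling four steps of the serial GHASH recurrence Y = (Y ^ X_i) * H gives Y' = (Y ^ X1)*H^4 ^ X2*H^3 ^ X3*H^2 ^ X4*H, so four blocks can be folded in per step. The following is a portable, bitwise C sketch of that identity, not the PCLMUL code from cipher-gcm-intel-pclmul.c. The names gblock, gxor, gmul, ghash_serial and ghash_aggr4 are hypothetical, and gmul follows the reference bit-serial multiply of NIST SP 800-38D rather than carry-less multiplication.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit GCM block: hi = bytes 0..7, lo = bytes 8..15, both
 * big-endian, so bit 0 of the GCM bit string is the MSB of hi. */
typedef struct { uint64_t hi, lo; } gblock;

static gblock gxor(gblock a, gblock b) {
    gblock r = { a.hi ^ b.hi, a.lo ^ b.lo };
    return r;
}

/* Bit-serial GF(2^128) multiply in GCM's reflected bit order
 * (R = 0xE1 || 0^120). Slow, but easy to check against the spec. */
static gblock gmul(gblock x, gblock y) {
    gblock z = { 0, 0 };
    gblock v = y;
    for (int i = 0; i < 128; i++) {
        uint64_t bit = (i < 64) ? (x.hi >> (63 - i)) & 1
                                : (x.lo >> (127 - i)) & 1;
        if (bit)
            z = gxor(z, v);
        uint64_t carry = v.lo & 1;          /* bit 127 shifts out */
        v.lo = (v.lo >> 1) | (v.hi << 63);
        v.hi >>= 1;
        if (carry)
            v.hi ^= 0xE100000000000000ULL;  /* reduce modulo the GCM polynomial */
    }
    return z;
}

/* Reference GHASH, one block per step: Y = (Y ^ X_i) * H. */
static gblock ghash_serial(gblock h, const gblock *x, size_t n) {
    gblock y = { 0, 0 };
    for (size_t i = 0; i < n; i++)
        y = gmul(gxor(y, x[i]), h);
    return y;
}

/* Four blocks per step using H^2, H^3, H^4:
 *   Y' = (Y ^ X1)*H^4 ^ X2*H^3 ^ X3*H^2 ^ X4*H
 * A carry-less implementation can XOR the four unreduced products and
 * reduce once; gmul reduces every product, so only the algebra is modelled. */
static gblock ghash_aggr4(gblock h, const gblock *x, size_t n) {
    gblock h2 = gmul(h, h), h3 = gmul(h2, h), h4 = gmul(h3, h);
    gblock y = { 0, 0 };
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y = gxor(gxor(gmul(gxor(y, x[i]), h4), gmul(x[i + 1], h3)),
                 gxor(gmul(x[i + 2], h2), gmul(x[i + 3], h)));
    }
    for (; i < n; i++)                      /* 0..3 leftover blocks */
        y = gmul(gxor(y, x[i]), h);
    return y;
}
```

For any key h and any block count n, ghash_serial(h, x, n) and ghash_aggr4(h, x, n) should return the same block. The aggregated form trades three extra precomputed key powers for fewer dependent multiply/reduce steps, which is also why the register pressure mentioned above matters on i386.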

Benchmark on Intel Haswell (win32):

Before:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES           |     0.446 ns/B      2140 MiB/s      1.78 c/B      3998

After (~2.38x faster):
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES           |     0.187 ns/B      5107 MiB/s     0.747 c/B      3998
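As a rough cross-check of the figures: both runs report the same 3998 MHz clock, so cycles/byte should be approximately nanosecs/byte multiplied by the clock in GHz, and the quoted ~2.38x speedup matches the cycles/byte ratio. Small last-digit differences come from rounding in the printed values.

```latex
\text{c/B} \approx \text{ns/B} \times 3.998\ \text{GHz}:
\quad 0.446 \times 3.998 \approx 1.78,
\qquad 0.187 \times 3.998 \approx 0.75
\\[4pt]
\text{speedup} = \frac{1.78\ \text{c/B}}{0.747\ \text{c/B}} \approx 2.38
```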

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
Authored by jukivili on Apr 27 2019, 8:38 PM
Parents
rC1374254c2904: Prefetch GCM look-up tables
Branches
Unknown
Tags
Unknown