Enable four block aggregated GCM Intel PCLMUL implementation on i386
a6e7c411e5f6

Description

Enable four block aggregated GCM Intel PCLMUL implementation on i386

* cipher/cipher-gcm-intel-pclmul.c (reduction): Change "%%xmm7" to
"%%xmm5".
(gfmul_pclmul_aggr4): Move outside [__x86_64__] block; Remove usage of
XMM8-XMM15 registers; Do not preload H-values and be_mask to reduce
register usage for i386.
(_gcry_ghash_setup_intel_pclmul): Enable calculation of H2, H3 and H4
on i386.
(_gcry_ghash_intel_pclmul): Adjust to above gfmul_pclmul_aggr4
changes; Move 'aggr4' code path outside [__x86_64__] block.
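The aggregation that the entries above refer to can be sketched in portable C. This is only an illustration of the algebra, not the code in cipher/cipher-gcm-intel-pclmul.c: the names `block128`, `gf_mul`, `ghash_setup`, `ghash_seq` and `ghash_aggr4` are hypothetical, and the multiplication is a slow bit-by-bit version of the GF(2^128) product defined in NIST SP 800-38D rather than PCLMULQDQ assembly. It shows why H2, H3 and H4 are needed: four sequential steps X = (X ^ C) * H collapse into (X ^ C1)*H^4 ^ C2*H^3 ^ C3*H^2 ^ C4*H.

```c
#include <stddef.h>
#include <stdint.h>

/* One 128-bit GHASH block: hi holds bytes 0..7 (big-endian), lo bytes 8..15. */
typedef struct { uint64_t hi, lo; } block128;

static block128 gf_xor(block128 a, block128 b)
{
  return (block128){ a.hi ^ b.hi, a.lo ^ b.lo };
}

/* Bitwise GF(2^128) multiply in GCM's bit-reflected representation
   (hypothetical helper; far slower than a carry-less multiply). */
static block128 gf_mul(block128 x, block128 y)
{
  block128 z = { 0, 0 };
  block128 v = y;
  for (int i = 0; i < 128; i++) {
    /* Bit i of x, counting from the most significant bit of hi. */
    uint64_t xbit = (i < 64) ? (x.hi >> (63 - i)) & 1
                             : (x.lo >> (127 - i)) & 1;
    if (xbit)
      z = gf_xor(z, v);
    /* v = v * x modulo the GCM polynomial (R = 0xE1 followed by 120 zero bits). */
    uint64_t lsb = v.lo & 1;
    v.lo = (v.lo >> 1) | (v.hi << 63);
    v.hi >>= 1;
    if (lsb)
      v.hi ^= 0xE100000000000000ULL;
  }
  return z;
}

/* Precompute hpow[0..3] = H, H^2, H^3, H^4. */
static void ghash_setup(block128 h, block128 hpow[4])
{
  hpow[0] = h;
  for (int i = 1; i < 4; i++)
    hpow[i] = gf_mul(hpow[i - 1], h);
}

/* Reference: one block at a time, each step depends on the previous one. */
static block128 ghash_seq(block128 x, block128 h, const block128 *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    x = gf_mul(gf_xor(x, c[i]), h);
  return x;
}

/* Four blocks per iteration; the four products are independent of each other. */
static block128 ghash_aggr4(block128 x, const block128 hpow[4],
                            const block128 *c, size_t n)
{
  while (n >= 4) {
    block128 acc = gf_mul(gf_xor(x, c[0]), hpow[3]);
    acc = gf_xor(acc, gf_mul(c[1], hpow[2]));
    acc = gf_xor(acc, gf_mul(c[2], hpow[1]));
    acc = gf_xor(acc, gf_mul(c[3], hpow[0]));
    x = acc;
    c += 4;
    n -= 4;
  }
  /* Leftover blocks fall back to the single-block step. */
  for (; n > 0; n--, c++)
    x = gf_mul(gf_xor(x, *c), hpow[0]);
  return x;
}
```

Calling `ghash_setup` once per key and then `ghash_aggr4` on the data should give the same result as `ghash_seq`. The sketch performs a full reduction inside every `gf_mul`; a PCLMUL implementation would normally XOR the four unreduced products together and reduce once per group, which this sketch does not model.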

Benchmark on Intel Haswell (win32):

Before:

          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |     0.446 ns/B      2140 MiB/s      1.78 c/B      3998

After (~2.38x faster):

          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |     0.187 ns/B      5107 MiB/s     0.747 c/B      3998
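
The columns above are related by simple arithmetic, sketched below. The helper names are hypothetical and this is not the benchmark tool's code. It assumes cycles/byte is nanoseconds/byte multiplied by the clock in GHz, and that the speedup is the ratio of the two nanoseconds/byte figures. Throughput recomputed from the rounded ns/B values lands close to, but not exactly on, the reported MiB/s figures.

```c
/* Hypothetical helpers relating the benchmark columns. */

/* cycles/byte = (ns/byte) * (cycles/ns), where cycles/ns = MHz / 1000. */
static double cycles_per_byte(double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* MiB/s = bytes per second divided by 2^20. */
static double mib_per_sec(double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Speedup is the ratio of time per byte before and after. */
static double speedup(double ns_before, double ns_after)
{
  return ns_before / ns_after;
}
```

With the figures from the tables, `speedup(0.446, 0.187)` is about 2.385, `cycles_per_byte(0.446, 3998)` is about 1.78 and `cycles_per_byte(0.187, 3998)` is about 0.748.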

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance

jukivili

Authored on Apr 27 2019, 8:38 PM

Parents

rC1374254c2904: Prefetch GCM look-up tables


Event Timeline

jukivili committed rCa6e7c411e5f6: Enable four block aggregated GCM Intel PCLMUL implementation on i386 (authored by jukivili). Apr 27 2019, 9:04 PM

Changes (1)

cipher/cipher-gcm-intel-pclmul.c