Optimizations for GCM Intel/PCLMUL implementation

Description

Optimizations for GCM Intel/PCLMUL implementation

* cipher/cipher-gcm-intel-pclmul.c (reduction): New.
(gfmul_pclmul): Fold the left-by-one shift into the pclmul operations;
Use the 'reduction' helper function.
[__x86_64__] (gfmul_pclmul_aggr4): Reorder instructions and adjust
register usage to free up registers; Use the 'reduction' helper
function; Fold the left-by-one shift into the pclmul operations; Move
loading of the H values and the input from the caller into this
function.
[__x86_64__] (gfmul_pclmul_aggr8): New.
(gcm_lsh): New.
(_gcry_ghash_setup_intel_pclmul): Shift the H values left by one bit;
Preserve the XMM6-XMM15 registers on WIN64.
(_gcry_ghash_intel_pclmul) [__x86_64__]: Use the 8-block aggregated
reduction function.
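
For context on the "(gcm_lsh): New" and "Shift the H values left" entries
above: pre-shifting the hash subkey H left by one bit is the usual way to
fold the bit-reflection fix-up of GHASH into the carry-less multiplications,
so the result of each PCLMULQDQ no longer needs an extra one-bit shift
before reduction. The plain-C sketch below shows that "<<1 twist" as used by
comparable PCLMUL GHASH implementations; the function name ghash_lsh1 and
the two-uint64 layout are illustrative, not the commit's actual SSE code.

#include <stdint.h>

/* Illustrative only: shift the 128-bit hash subkey H left by one bit,
 * folding in the GCM reduction constant when the top bit falls off.
 * 'hi' holds bits 127..64 and 'lo' holds bits 63..0 of H. */
static void
ghash_lsh1 (uint64_t *hi, uint64_t *lo)
{
  uint64_t carry = *hi >> 63;            /* bit 127 before the shift    */

  *hi = (*hi << 1) | (*lo >> 63);        /* 128-bit left shift by one   */
  *lo = *lo << 1;

  if (carry)                             /* conditional reduction fold  */
    {
      *hi ^= UINT64_C (0xc200000000000000);
      *lo ^= UINT64_C (0x0000000000000001);
    }
}

Done once at setup time for every stored power of H, a shift of this kind
lets the per-block multiplication and the 'reduction' helper skip the
one-bit shift of the 256-bit product that would otherwise follow every
multiplication.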

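The aggr4/aggr8 paths rely on the standard aggregation identity for GHASH:
instead of folding each block into the state and multiplying (and reducing)
by H one block at a time, several blocks are combined against precomputed
powers H^1..H^8 and reduced once. The toy program below demonstrates only
that identity, in GF(2^8) rather than GF(2^128), so it stays short; every
name in it is illustrative rather than taken from the commit.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy GF(2^8) multiplication (AES polynomial 0x11b), standing in for
 * the GF(2^128) GHASH multiplication purely to show the algebra. */
static uint8_t
gf_mul (uint8_t a, uint8_t b)
{
  uint8_t p = 0;

  while (b)
    {
      if (b & 1)
        p ^= a;
      b >>= 1;
      a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
    }
  return p;
}

int
main (void)
{
  uint8_t h = 0x53, y = 0xca;                 /* arbitrary subkey/state */
  uint8_t c[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };  /* eight "input blocks"   */
  uint8_t hpow[9];                            /* hpow[i] = H^i          */
  uint8_t horner, aggr;
  int i;

  hpow[0] = 1;
  for (i = 1; i <= 8; i++)
    hpow[i] = gf_mul (hpow[i - 1], h);

  /* Per-block form: Y = (Y ^ C[i]) * H; one reduction per block. */
  horner = y;
  for (i = 0; i < 8; i++)
    horner = gf_mul (horner ^ c[i], h);

  /* Aggregated form: Y = (Y^C[0])*H^8 ^ C[1]*H^7 ^ ... ^ C[7]*H.
   * In the PCLMUL code the eight products are carry-less multiplies
   * kept unreduced, and the single reduction is applied to their XOR. */
  aggr = gf_mul (y ^ c[0], hpow[8]);
  for (i = 1; i < 8; i++)
    aggr ^= gf_mul (c[i], hpow[8 - i]);

  assert (horner == aggr);
  printf ("both forms give 0x%02x\n", horner);
  return 0;
}

In the aggregated form the eight products are independent of one another,
which is what allows the reordering and deferred single reduction that the
aggr4/aggr8 functions aim for.
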
Benchmark on Intel Haswell (amd64):

Before:

          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |     0.206 ns/B      4624 MiB/s     0.825 c/B      3998

After (+50% faster):

          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |     0.137 ns/B      6953 MiB/s     0.548 c/B      3998

  • Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Apr 26 2019, 6:29 PM
Parents
rCb9be297bb8eb: Move data pointer macro for 64-bit ARM assembly to common header