Optimizations for GCM Intel/PCLMUL implementation

Description

Optimizations for GCM Intel/PCLMUL implementation

* cipher/cipher-gcm-intel-pclmul.c (reduction): New.
(gfmul_pclmul): Fold the left-by-one shift into the pclmul operations;
Use the 'reduction' helper function.
[__x86_64__] (gfmul_pclmul_aggr4): Reorder instructions and adjust
register usage to free up registers; Use the 'reduction' helper
function; Fold the left-by-one shift into the pclmul operations; Move
loading of the H values and the input from the caller into this
function.
[__x86_64__] (gfmul_pclmul_aggr8): New.
(gcm_lsh): New.
(_gcry_ghash_setup_intel_pclmul): Shift the H values left by one bit;
Preserve the XMM6-XMM15 registers on WIN64.
(_gcry_ghash_intel_pclmul) [__x86_64__]: Use the 8-block aggregated
reduction function.
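
For context on the "(gcm_lsh): New" and "Shift the H values left" entries
above: pre-shifting the hash subkey H left by one bit is the usual way to
fold the bit-reflection fix-up of GHASH into the carry-less multiplications,
so the result of each PCLMULQDQ no longer needs an extra one-bit shift
before reduction. The plain-C sketch below shows that "<<1 twist" as used by
comparable PCLMUL GHASH implementations; the function name ghash_lsh1 and
the two-uint64 layout are illustrative, not the commit's actual SSE code.

#include <stdint.h>

/* Illustrative only: shift the 128-bit hash subkey H left by one bit,
 * folding in the GCM reduction constant when the top bit falls off.
 * 'hi' holds bits 127..64 and 'lo' holds bits 63..0 of H. */
static void
ghash_lsh1 (uint64_t *hi, uint64_t *lo)
{
  uint64_t carry = *hi >> 63;            /* bit 127 before the shift    */

  *hi = (*hi << 1) | (*lo >> 63);        /* 128-bit left shift by one   */
  *lo = *lo << 1;

  if (carry)                             /* conditional reduction fold  */
    {
      *hi ^= UINT64_C (0xc200000000000000);
      *lo ^= UINT64_C (0x0000000000000001);
    }
}

Done once at setup time for every stored power of H, a shift of this kind
lets the per-block multiplication and the 'reduction' helper skip the
one-bit shift of the 256-bit product that would otherwise follow every
multiplication.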

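The aggr4/aggr8 paths rely on the standard aggregation identity for GHASH:
instead of folding each block into the state and multiplying (and reducing)
by H one block at a time, several blocks are combined against precomputed
powers H^1..H^8 and reduced once. The toy program below demonstrates only
that identity, in GF(2^8) rather than GF(2^128), so it stays short; every
name in it is illustrative rather than taken from the commit.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy GF(2^8) multiplication (AES polynomial 0x11b), standing in for
 * the GF(2^128) GHASH multiplication purely to show the algebra. */
static uint8_t
gf_mul (uint8_t a, uint8_t b)
{
  uint8_t p = 0;

  while (b)
    {
      if (b & 1)
        p ^= a;
      b >>= 1;
      a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
    }
  return p;
}

int
main (void)
{
  uint8_t h = 0x53, y = 0xca;                 /* arbitrary subkey/state */
  uint8_t c[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };  /* eight "input blocks"   */
  uint8_t hpow[9];                            /* hpow[i] = H^i          */
  uint8_t horner, aggr;
  int i;

  hpow[0] = 1;
  for (i = 1; i <= 8; i++)
    hpow[i] = gf_mul (hpow[i - 1], h);

  /* Per-block form: Y = (Y ^ C[i]) * H; one reduction per block. */
  horner = y;
  for (i = 0; i < 8; i++)
    horner = gf_mul (horner ^ c[i], h);

  /* Aggregated form: Y = (Y^C[0])*H^8 ^ C[1]*H^7 ^ ... ^ C[7]*H.
   * In the PCLMUL code the eight products are carry-less multiplies
   * kept unreduced, and the single reduction is applied to their XOR. */
  aggr = gf_mul (y ^ c[0], hpow[8]);
  for (i = 1; i < 8; i++)
    aggr ^= gf_mul (c[i], hpow[8 - i]);

  assert (horner == aggr);
  printf ("both forms give 0x%02x\n", horner);
  return 0;
}

In the aggregated form the eight products are independent of one another,
which is what allows the reordering and deferred single reduction that the
aggr4/aggr8 functions aim for.
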
Benchmark on Intel Haswell (amd64):

Before:

          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |     0.206 ns/B      4624 MiB/s     0.825 c/B      3998

After (+50% faster):

          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES |     0.137 ns/B      6953 MiB/s     0.548 c/B      3998

  • Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Apr 26 2019, 6:29 PM
Parents
rCb9be297bb8eb: Move data pointer macro for 64-bit ARM assembly to common header