ghash|polyval: add x86_64 VPCLMUL/AVX2 accelerated implementation

* cipher/cipher-gcm-intel-pclmul.c (GCM_INTEL_USE_VPCLMUL_AVX2)
(GCM_INTEL_AGGR8_TABLE_INITIALIZED)
(GCM_INTEL_AGGR16_TABLE_INITIALIZED): New.
(gfmul_pclmul): Fixes to comments.
[GCM_USE_INTEL_VPCLMUL_AVX2] (GFMUL_AGGR16_ASM_VPCMUL_AVX2)
(gfmul_vpclmul_avx2_aggr16, gfmul_vpclmul_avx2_aggr16_le)
(gfmul_pclmul_avx2, gcm_lsh_avx2, load_h1h2_to_ymm1)
(ghash_setup_aggr8_avx2, ghash_setup_aggr16_avx2): New.
(_gcry_ghash_setup_intel_pclmul): Add 'hw_features' parameter; Set up
ghash and polyval function pointers for context; Add VPCLMUL/AVX2 code
path; Defer aggr8 and aggr16 table initialization until first use in
'_gcry_ghash_intel_pclmul' or '_gcry_polyval_intel_pclmul'.
[__x86_64__] (ghash_setup_aggr8): New.
(_gcry_ghash_intel_pclmul): Add VPCLMUL/AVX2 code path; Add call for
aggr8 table initialization.
(_gcry_polyval_intel_pclmul): Add VPCLMUL/AVX2 code path; Add call for
aggr8 table initialization.
* cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL] (_gcry_ghash_intel_pclmul)
(_gcry_polyval_intel_pclmul): Remove.
[GCM_USE_INTEL_PCLMUL] (_gcry_ghash_setup_intel_pclmul): Add
'hw_features' parameter.
(setupM) [GCM_USE_INTEL_PCLMUL]: Pass HW features to
'_gcry_ghash_setup_intel_pclmul'; Let '_gcry_ghash_setup_intel_pclmul'
set up function pointers.
* cipher/cipher-internal.h (GCM_USE_INTEL_VPCLMUL_AVX2): New.
(gcry_cipher_handle): Add member 'gcm.hw_impl_flags'.
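
The deferred table initialization described in the ChangeLog can be illustrated with a small, hypothetical model. Setup only records whether VPCLMUL and AVX2 are both usable and installs a function pointer; the tables of H powers are built on first use and remembered through bit flags in a per-context word. All identifiers here (sketch_*, SKETCH_*) are invented for illustration and only mirror the roles of GCM_INTEL_USE_VPCLMUL_AVX2, GCM_INTEL_AGGR8_TABLE_INITIALIZED, GCM_INTEL_AGGR16_TABLE_INITIALIZED and 'gcm.hw_impl_flags'. The block-count thresholds (16 and 8) and the assumption that the 16-block table extends the 8-block one are guesses, not details taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU feature bits standing in for the real HW feature flags. */
#define SKETCH_HWF_VPCLMUL  (1u << 0)
#define SKETCH_HWF_AVX2     (1u << 1)

/* Hypothetical per-context implementation flags (cf. 'gcm.hw_impl_flags'). */
#define SKETCH_USE_VPCLMUL_AVX2         (1u << 0)
#define SKETCH_AGGR8_TABLE_INITIALIZED  (1u << 1)
#define SKETCH_AGGR16_TABLE_INITIALIZED (1u << 2)

struct sketch_gcm_ctx {
  unsigned int hw_impl_flags;
  int table_builds;  /* number of tables built so far (demonstration only) */
  int (*ghash_fn)(struct sketch_gcm_ctx *ctx, size_t nblocks);
};

/* Stand-in for building H^1..H^8; a real version would fill a table. */
static void sketch_setup_aggr8(struct sketch_gcm_ctx *ctx)
{
  ctx->table_builds++;
  ctx->hw_impl_flags |= SKETCH_AGGR8_TABLE_INITIALIZED;
}

/* Stand-in for building H^9..H^16, assumed here to extend the 8-entry table. */
static void sketch_setup_aggr16(struct sketch_gcm_ctx *ctx)
{
  if (!(ctx->hw_impl_flags & SKETCH_AGGR8_TABLE_INITIALIZED))
    sketch_setup_aggr8(ctx);
  assert(ctx->hw_impl_flags & SKETCH_AGGR8_TABLE_INITIALIZED);
  ctx->table_builds++;
  ctx->hw_impl_flags |= SKETCH_AGGR16_TABLE_INITIALIZED;
}

/* Returns the aggregation width the first chunk would use, building the
   needed table only the first time that width is actually required. */
static int sketch_ghash(struct sketch_gcm_ctx *ctx, size_t nblocks)
{
  if ((ctx->hw_impl_flags & SKETCH_USE_VPCLMUL_AVX2) && nblocks >= 16) {
    if (!(ctx->hw_impl_flags & SKETCH_AGGR16_TABLE_INITIALIZED))
      sketch_setup_aggr16(ctx);
    return 16;
  }
  if (nblocks >= 8) {
    if (!(ctx->hw_impl_flags & SKETCH_AGGR8_TABLE_INITIALIZED))
      sketch_setup_aggr8(ctx);
    return 8;
  }
  return 1;
}

/* Setup is cheap: pick the code path from hw_features, build no tables. */
void sketch_ghash_setup(struct sketch_gcm_ctx *ctx, unsigned int hw_features)
{
  ctx->hw_impl_flags = 0;
  ctx->table_builds = 0;
  if ((hw_features & SKETCH_HWF_VPCLMUL) && (hw_features & SKETCH_HWF_AVX2))
    ctx->hw_impl_flags |= SKETCH_USE_VPCLMUL_AVX2;
  ctx->ghash_fn = sketch_ghash;
}
```

In this model a context that only ever hashes short inputs never builds either table, and a second long call reuses the tables built by the first; that is one plausible motivation for deferring initialization out of setup.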

Patch adds VPCLMUL/AVX2 accelerated implementation for GHASH (GCM) and
POLYVAL (GCM-SIV).
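
As background on what the accelerated routines compute, here is a portable, bit-serial reference sketch of the GHASH multiplication in GF(2^128) (following NIST SP 800-38D, Algorithm 1), together with a 4-block aggregated variant that precomputes H^1..H^4 and folds four blocks per step. The assumption is that the aggr8/aggr16 tables named in the ChangeLog hold powers of H in the same spirit, extended to 8 and 16 blocks. The names gf128_mul, ghash_serial and ghash_aggr4 are hypothetical and unrelated to libgcrypt's internal API. The code is not constant-time, and it covers only GHASH's big-endian bit order, not POLYVAL's little-endian convention. It also reduces every product separately for clarity, whereas SIMD implementations typically accumulate unreduced carry-less products and reduce once per chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field element in GCM bit order: 'hi' holds bytes 0..7 and 'lo' bytes
   8..15, each read big-endian, so block bit 0 (the x^0 coefficient) is
   the most significant bit of 'hi'. */
typedef struct { uint64_t hi, lo; } gf128;

gf128 gf128_xor(gf128 a, gf128 b)
{
  gf128 r = { a.hi ^ b.hi, a.lo ^ b.lo };
  return r;
}

/* Multiply modulo x^128 + x^7 + x^2 + x + 1, one bit of x at a time. */
gf128 gf128_mul(gf128 x, gf128 y)
{
  gf128 z = { 0, 0 }, v = y;
  for (int i = 0; i < 128; i++) {
    uint64_t bit = (i < 64) ? (x.hi >> (63 - i)) & 1 : (x.lo >> (127 - i)) & 1;
    if (bit)
      z = gf128_xor(z, v);
    /* v = v * x: a right shift in GCM's reflected bit order, folding in
       R = 0xE1 || 0^120 when a coefficient falls off the end. */
    uint64_t carry = v.lo & 1;
    v.lo = (v.lo >> 1) | (v.hi << 63);
    v.hi >>= 1;
    if (carry)
      v.hi ^= 0xE100000000000000ULL;
  }
  return z;
}

/* Horner's rule: Y_i = (Y_{i-1} ^ X_i) * H, one multiply per block. */
gf128 ghash_serial(gf128 h, const gf128 *x, size_t n)
{
  gf128 y = { 0, 0 };
  assert(x != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    y = gf128_mul(gf128_xor(y, x[i]), h);
  return y;
}

#define AGGR 4

/* Aggregated form: Y = (Y ^ X1)*H^4 ^ X2*H^3 ^ X3*H^2 ^ X4*H per chunk. */
gf128 ghash_aggr4(gf128 h, const gf128 *x, size_t n)
{
  gf128 hpow[AGGR];  /* hpow[k] = H^(k+1) */
  gf128 y = { 0, 0 };
  size_t i = 0;

  assert(x != NULL || n == 0);
  hpow[0] = h;
  for (int k = 1; k < AGGR; k++)
    hpow[k] = gf128_mul(hpow[k - 1], h);

  for (; i + AGGR <= n; i += AGGR) {
    gf128 acc = gf128_mul(gf128_xor(y, x[i]), hpow[AGGR - 1]);
    for (int k = 1; k < AGGR; k++)
      acc = gf128_xor(acc, gf128_mul(x[i + k], hpow[AGGR - 1 - k]));
    y = acc;
  }
  /* Leftover blocks fall back to Horner's rule. */
  for (; i < n; i++)
    y = gf128_mul(gf128_xor(y, x[i]), h);
  return y;
}
```

Both functions produce the same digest because the aggregated form is just Horner's rule expanded over four steps; the gain in real code comes from the independent multiplies, which can run in parallel on wide carry-less multiply units.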

Benchmark on AMD Ryzen 5800X (zen3):

Before:

             |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
    GCM auth |     0.088 ns/B     10825 MiB/s     0.427 c/B      4850
GCM-SIV auth |     0.083 ns/B     11472 MiB/s     0.403 c/B      4850

After: (~1.93x faster)

             |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
    GCM auth |     0.045 ns/B     21098 MiB/s     0.219 c/B      4850
GCM-SIV auth |     0.043 ns/B     22181 MiB/s     0.209 c/B      4850

AES128-GCM / AES128-GCM-SIV encryption:

            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
    GCM enc |     0.079 ns/B     12073 MiB/s     0.383 c/B      4850
GCM-SIV enc |     0.076 ns/B     12500 MiB/s     0.370 c/B      4850
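
The columns in these tables are linked by simple arithmetic, which is handy when checking the quoted speed-ups. The helper names below are illustrative only. Because the ns/B column is rounded to three decimals, MiB/s recomputed from it can differ from the printed value by a fraction of a percent.

```c
#include <assert.h>

/* ns/B -> MiB/s: 1e9 / ns_per_byte bytes per second, divided by 2^20. */
double mib_per_sec(double ns_per_byte)
{
  assert(ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* ns/B -> cycles/B: cycles per ns is MHz / 1000. */
double cycles_per_byte(double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* Speed-up expressed as a throughput ratio (after / before). */
double speedup(double before_mib_s, double after_mib_s)
{
  assert(before_mib_s > 0.0);
  return after_mib_s / before_mib_s;
}
```

For the Ryzen 5800X rows, speedup(10825, 21098) is about 1.95 for GCM auth and speedup(11472, 22181) is about 1.93 for GCM-SIV auth.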

Benchmark on Intel Core i3-1115G4 (tigerlake):

Before:

             |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
    GCM auth |     0.080 ns/B     11919 MiB/s     0.327 c/B      4090
GCM-SIV auth |     0.075 ns/B     12643 MiB/s     0.309 c/B      4090

After: (~1.28x faster)

             |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
    GCM auth |     0.062 ns/B     15348 MiB/s     0.254 c/B      4090
GCM-SIV auth |     0.058 ns/B     16381 MiB/s     0.238 c/B      4090

AES128-GCM / AES128-GCM-SIV encryption:

            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
    GCM enc |     0.101 ns/B      9441 MiB/s     0.413 c/B      4090
GCM-SIV enc |     0.098 ns/B      9692 MiB/s     0.402 c/B      4089

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Authored by jukivili on Mar 5 2022, 3:09 PM
Parent: rC47cafffb09d8 (Add SM4 ARMv8/AArch64/CE assembly implementation)