twofish-avx2-amd64: replace VPGATHER with manual gather

Description

* cipher/twofish-avx2-amd64.S (do_gather): New.
(g16): Switch to use 'do_gather' instead of VPGATHER instruction.
(__twofish_enc_blk16, __twofish_dec_blk16): Prepare stack
for 'do_gather'.
* cipher/twofish.c (twofish) [USE_AVX2]: Remove now unneeded
HWF_INTEL_FAST_VPGATHER check.

As VPGATHER is now slow on the majority of CPUs (because of the "Downfall"
mitigation), switch the twofish-avx2 implementation to use manual memory
gathering instead.
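
To illustrate what "manual memory gathering" means, here is a minimal C sketch. It is not the assembly from cipher/twofish-avx2-amd64.S: the name manual_gather8, the 8-lane width, and the byte-sized indices are assumptions chosen for illustration. A gather instruction loads all lanes from a table in one operation, while the manual form issues one ordinary load per lane, with the indices read back from memory (the role the prepared stack area presumably plays for 'do_gather').

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the libgcrypt code): emulate an 8-lane, 32-bit
 * gather with plain scalar loads.  The indices are read from memory, much
 * as an assembly version could read them back from a stack slot. */
static void manual_gather8(uint32_t out[8], const uint32_t *table,
                           const uint8_t idx[8])
{
  for (size_t lane = 0; lane < 8; lane++)
    out[lane] = table[idx[lane]];   /* one independent load per lane */
}
```

Because the indices here are bytes, every lookup stays inside a 256-entry table, and the per-lane loads are independent of each other, so the CPU is free to overlap them.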

Benchmark on Intel Core i3-1115G4 (tigerlake, with "Downfall"-mitigated
microcode):

Before:
TWOFISH |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
ECB enc |      7.00 ns/B     136.3 MiB/s     28.62 c/B      4089
ECB dec |      7.00 ns/B     136.2 MiB/s     28.64 c/B      4090

After (~3.2x faster):
TWOFISH |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
ECB enc |      2.19 ns/B     435.5 MiB/s      8.95 c/B      4089
ECB dec |      2.19 ns/B     436.2 MiB/s      8.94 c/B      4089

Benchmark on AMD Ryzen 9 7900X (zen4, not affected by "Downfall"):

Before:
TWOFISH |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
ECB enc |      1.91 ns/B     499.0 MiB/s      8.98 c/B      4700
ECB dec |      1.90 ns/B     500.7 MiB/s      8.95 c/B      4700

After (~9% faster):
TWOFISH |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
ECB enc |      1.74 ns/B     547.9 MiB/s      8.18 c/B      4700
ECB dec |      1.74 ns/B     547.8 MiB/s      8.18 c/B      4700
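
The benchmark columns are related by simple arithmetic, which the following C sketch makes explicit for cross-checking. The helper names are hypothetical, and the formulas assume that "auto Mhz" is the measured clock in MHz and that one MiB is 1024 * 1024 bytes.

```c
/* Hypothetical helpers for cross-checking the benchmark columns. */

/* MHz is cycles per microsecond, so ns/B * MHz / 1000 gives cycles/B. */
static double cycles_per_byte(double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* 1e9 ns per second divided by ns/B gives B/s; scale to MiB/s. */
static double mib_per_sec(double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Ratio of time per byte before and after the change. */
static double speedup(double before_ns, double after_ns)
{
  return before_ns / after_ns;
}
```

For example, cycles_per_byte(7.00, 4089) is about 28.62 and speedup(7.00, 2.19) is about 3.2, consistent with the tigerlake figures above; the small differences elsewhere come from the rounding of the printed ns/B values.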

[v2]:

  • reorder memory operations in do_gather for a small performance increase.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Aug 12 2023, 8:19 PM
Parents
rCf2bf9997d465: Avoid VPGATHER usage for most of Intel CPUs