blake2b-avx512: replace VPGATHER with manual gather
59f14c1db37e

Description

blake2b-avx512: replace VPGATHER with manual gather

* cipher/blake2.c (blake2b_init_ctx): Remove HWF_INTEL_FAST_VPGATHER
check for AVX512 implementation.
* cipher/blake2b-amd64-avx512.S (R16, VPINSRQ_KMASK, .Lshuf_ror16)
(.Lk1_mask): New.
(GEN_GMASK, RESET_KMASKS, .Lgmask*): Remove.
(GATHER_MSG): Use manual gather instead of VPGATHER.
(ROR_16): Use vpshufb for small speed improvement on tigerlake.
(_gcry_blake2b_transform_amd64_avx512): New setup & clean-up for
kmask registers; Reduce excess loop alignment from 64B to 16B.
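The ROR_16 change works because rotating a 64-bit lane by a multiple of 8 bits only moves whole bytes, so a single byte shuffle (vpshufb) can do it. Below is a portable C model of that idea. The shuffle control is derived from the rotate itself rather than copied from `.Lshuf_ror16`, and the helper names (`make_ror16_shuffle`, `shuffle_bytes`, `ror16_via_shuffle`) are hypothetical. The model also ignores vpshufb's zeroing bit.

```c
#include <assert.h>
#include <stdint.h>

/* Reference rotate-right of a 64-bit word. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
  return (x >> n) | (x << ((64 - n) & 63));
}

/* Build a pshufb-style control for a 16-byte register holding two 64-bit
 * lanes: output byte i takes input byte (i + 2) % 8 within its own lane,
 * which on a little-endian lane equals a rotate right by 16 bits. */
static void make_ror16_shuffle(uint8_t ctl[16])
{
  for (unsigned i = 0; i < 16; i++)
    ctl[i] = (uint8_t)((i & ~7u) + ((i + 2) & 7u));
}

/* Scalar emulation of a 16-byte vpshufb (zeroing bit not modelled). */
static void shuffle_bytes(const uint8_t in[16], const uint8_t ctl[16],
                          uint8_t out[16])
{
  for (unsigned i = 0; i < 16; i++)
    {
      assert((ctl[i] & 0x80) == 0); /* this model ignores the zeroing bit */
      out[i] = in[ctl[i] & 15];
    }
}

/* Rotate one 64-bit word right by 16 using only the byte shuffle. */
static uint64_t ror16_via_shuffle(uint64_t x)
{
  uint8_t in[16] = {0}, out[16], ctl[16];
  uint64_t r = 0;

  for (unsigned i = 0; i < 8; i++)
    in[i] = (uint8_t)(x >> (8 * i)); /* little-endian byte order, as on x86 */
  make_ror16_shuffle(ctl);
  shuffle_bytes(in, ctl, out);
  for (unsigned i = 0; i < 8; i++)
    r |= (uint64_t)out[i] << (8 * i);
  return r;
}
```

The generated control is 2, 3, 4, 5, 6, 7, 0, 1 for the low lane and the same pattern offset by 8 for the high lane.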

As VPGATHER is now slow on the majority of CPUs (because of the "Downfall"
mitigations), switch the blake2b-avx512 implementation to use manual
memory gathering instead.
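The C sketch below roughly models what the new GATHER_MSG produces. Each message word is picked out of the 16-word block individually, following the BLAKE2b SIGMA schedule from RFC 7693, instead of being fetched by a single VPGATHER. The four-vector lane layout (column step, then diagonal step) is an assumption about this code: it is the conventional arrangement for vectorized BLAKE2b. The name `gather_msg_model` is hypothetical, and the real macro fills registers, not arrays.

```c
#include <assert.h>
#include <stdint.h>

/* BLAKE2b message schedule (RFC 7693 SIGMA); round r uses row r % 10. */
static const uint8_t blake2b_sigma[10][16] = {
  {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
  { 14, 10,  4,  8,  9, 15, 13,  6,  1, 12,  0,  2, 11,  7,  5,  3 },
  { 11,  8, 12,  0,  5,  2, 15, 13, 10, 14,  3,  6,  7,  1,  9,  4 },
  {  7,  9,  3,  1, 13, 12, 11, 14,  2,  6,  5, 10,  4,  0, 15,  8 },
  {  9,  0,  5,  7,  2,  4, 10, 15, 14,  1, 11, 12,  6,  8,  3, 13 },
  {  2, 12,  6, 10,  0, 11,  8,  3,  4, 13,  7,  5, 15, 14,  1,  9 },
  { 12,  5,  1, 15, 14, 13,  4, 10,  0,  7,  6,  3,  9,  2,  8, 11 },
  { 13, 11,  7, 14, 12,  1,  3,  9,  5,  0, 15,  4,  8,  6,  2, 10 },
  {  6, 15, 14,  9, 11,  3,  0,  8, 12,  2, 13,  7,  1,  4, 10,  5 },
  { 10,  2,  8,  4,  7,  6,  1,  5, 15, 11,  9, 14,  3, 12, 13,  0 },
};

/* Manual gather, one scalar load per lane (hypothetical model):
 *   vec[0]: column step, first G input   (sigma positions 0, 2, 4, 6)
 *   vec[1]: column step, second G input  (sigma positions 1, 3, 5, 7)
 *   vec[2]: diagonal step, first input   (sigma positions 8, 10, 12, 14)
 *   vec[3]: diagonal step, second input  (sigma positions 9, 11, 13, 15) */
static void gather_msg_model(const uint64_t m[16], unsigned round,
                             uint64_t vec[4][4])
{
  assert(round < 12); /* BLAKE2b has 12 rounds */
  const uint8_t *s = blake2b_sigma[round % 10];

  for (unsigned k = 0; k < 4; k++)
    for (unsigned lane = 0; lane < 4; lane++)
      {
        unsigned pos = 8 * (k >> 1) + 2 * lane + (k & 1);
        vec[k][lane] = m[s[pos]]; /* individual load instead of a gather */
      }
}
```

Take round 1 with m[i] = 100 + i: vec[0] receives m[14], m[4], m[9] and m[13]. Every lane is filled by an ordinary load, so this formulation does not rely on the gather instruction that the mitigation slowed down.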

Benchmark on Intel Core i3-1115G4 (tigerlake, with "Downfall"-mitigating
microcode):

Old before "Downfall" (commit 909daa700e4b45d75469df298ee564b8fc2f4b72):

                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 BLAKE2B_512    |     0.705 ns/B      1353 MiB/s      2.88 c/B      4088

Old after "Downfall" (~3.0x slower):

                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 BLAKE2B_512    |      2.11 ns/B     451.3 MiB/s      8.64 c/B      4089

New (same as before "Downfall"):

                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 BLAKE2B_512    |     0.705 ns/B      1353 MiB/s      2.88 c/B      4090

Benchmark on AMD Ryzen 9 7900X (zen4, did not suffer from "Downfall"):

Old:

                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 BLAKE2B_512    |     0.793 ns/B      1203 MiB/s      3.73 c/B      4700

New (~3% faster):

                |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 BLAKE2B_512    |     0.771 ns/B      1237 MiB/s      3.62 c/B      4700
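The MiB/s and c/B columns, and the "~3.0x slower" and "~3% faster" figures, can be re-derived from the ns/B and Mhz columns. The helper names below are hypothetical. The formulas assume MiB = 2^20 bytes and cycles/byte = ns/byte × clock in GHz, which matches the reported numbers to within rounding of the ns/B value.

```c
#include <assert.h>

/* Throughput in MiB/s (2^20 bytes) from a cost in nanoseconds per byte. */
static double mib_per_sec(double ns_per_byte)
{
  assert(ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte from ns/byte and the clock in MHz (ns * GHz = cycles). */
static double cycles_per_byte(double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* Tolerance comparison without pulling in libm. */
static int approx(double a, double b, double tol)
{
  double d = a - b;
  return d < tol && d > -tol;
}
```

For example, 0.705 ns/B gives about 1352.7 MiB/s, and at 4088 MHz about 2.88 c/B. The slowdown ratio 2.11 / 0.705 is about 2.99, and the Zen 4 speedup 0.793 / 0.771 is about 1.029.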

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance: jukivili, authored on Aug 20 2023, 4:35 PM

Parents: rCded3a1ec2ec6: twofish-avx2-amd64: replace VPGATHER with manual gather

Branches: Unknown

Tags: Unknown

Event Timeline

jukivili committed rC59f14c1db37e: blake2b-avx512: replace VPGATHER with manual gather (authored by jukivili). Aug 20 2023, 5:31 PM

Changes (2)

cipher/blake2.c
cipher/blake2b-amd64-avx512.S