aria-avx512: small optimization for aria_diff_m

Description

* cipher/aria-gfni-avx512-amd64.S (aria_diff_m): Use 'vpternlogq' for
3-way XOR operation.
---

Using vpternlogq gives a small performance improvement on AMD Zen4. On
Intel Tiger Lake, speed is the same as before.
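
As background, vpternlogq evaluates an arbitrary three-input bitwise function chosen by an 8-bit truth table, so a 3-way XOR that would otherwise take two XOR instructions fits in one. The scalar C model below is only an illustrative sketch of that idea, not the code in cipher/aria-gfni-avx512-amd64.S; the helper names (ternlog64, xor3_two_step, xor3_ternlog, xor3_selfcheck) are hypothetical. The immediate 0x96 is the truth table of a ^ b ^ c.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 64-bit lane of vpternlogq (hypothetical helper).
 * For each bit position, the three input bits form a 3-bit index
 * (a is the most significant bit, c the least significant), and the
 * result bit is the bit of imm8 at that index. */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        unsigned idx = (unsigned)((((a >> i) & 1) << 2) |
                                  (((b >> i) & 1) << 1) |
                                  ((c >> i) & 1));
        r |= (uint64_t)((imm8 >> idx) & 1u) << i;
    }
    return r;
}

/* 3-way XOR as two dependent XOR operations. */
static uint64_t xor3_two_step(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t t = a ^ b;
    return t ^ c;
}

/* 3-way XOR as a single ternary-logic operation.
 * 0x96 = 10010110b: the result is 1 exactly when an odd number of
 * inputs are 1 (indices 1, 2, 4 and 7). */
static uint64_t xor3_ternlog(uint64_t a, uint64_t b, uint64_t c)
{
    return ternlog64(a, b, c, 0x96);
}

/* Check both forms against each other. The inputs 0xF0, 0xCC, 0xAA
 * enumerate all eight input combinations in their low eight bits. */
static void xor3_selfcheck(void)
{
    assert(xor3_ternlog(0xF0, 0xCC, 0xAA) == xor3_two_step(0xF0, 0xCC, 0xAA));
}
```

The single-instruction form shortens the dependency chain from two operations to one, which is a plausible reason for the gain on Zen4; the commit itself only reports the measured numbers.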

Benchmark on AMD Ryzen 9 7900X (zen4, turbo-freq off):

Before:
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

ECB enc |     0.203 ns/B      4703 MiB/s     0.953 c/B      4700
ECB dec |     0.204 ns/B      4675 MiB/s     0.959 c/B      4700
CTR enc |     0.207 ns/B      4609 MiB/s     0.973 c/B      4700
CTR dec |     0.207 ns/B      4608 MiB/s     0.973 c/B      4700

After (~3% faster):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

ECB enc |     0.197 ns/B      4847 MiB/s     0.925 c/B      4700
ECB dec |     0.197 ns/B      4852 MiB/s     0.924 c/B      4700
CTR enc |     0.200 ns/B      4759 MiB/s     0.942 c/B      4700
CTR dec |     0.200 ns/B      4772 MiB/s     0.939 c/B      4700
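
The "~3% faster" figure and the cycles/byte column can be cross-checked from the printed ns/B values and the 4700 MHz clock. The C helpers below are a small sketch for doing so; their names are hypothetical, and the relation cycles/byte = ns/B * MHz / 1000 is assumed rather than taken from the benchmark tool's source.

```c
/* Relative speedup in percent, from time-per-byte before and after. */
static double speedup_percent(double before_ns, double after_ns)
{
    return (before_ns / after_ns - 1.0) * 100.0;
}

/* Cycles per byte from nanoseconds per byte and clock frequency in MHz
 * (1 MHz = 0.001 cycles per nanosecond). */
static double cycles_per_byte(double ns_per_byte, double mhz)
{
    return ns_per_byte * mhz / 1000.0;
}
```

For ECB enc, speedup_percent(0.203, 0.197) comes out at roughly 3, and cycles_per_byte(0.197, 4700) at roughly 0.926, which agrees with the table up to rounding of the printed ns/B values.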

Cc: Taehee Yoo <ap420073@gmail.com>

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Feb 18 2023, 10:14 AM
Parents
rC855f1551fd92: aria-avx: small optimization for aria_ark_8way