aria-avx2: add VAES accelerated implementation

Description

* cipher/aria-aesni-avx2-amd64.S (CONFIG_AS_VAES): New.
[CONFIG_AS_VAES]: Add VAES accelerated assembly macros and functions.
* cipher/aria.c (USE_VAES_AVX2): New.
(ARIA_context): Add 'use_vaes_avx2'.
(_gcry_aria_vaes_avx2_ecb_crypt_blk32)
(_gcry_aria_vaes_avx2_ctr_crypt_blk32)
(aria_avx2_ecb_crypt_blk32, aria_avx2_ctr_crypt_blk32): Add VAES/AVX2
code paths.
(aria_setkey): Enable VAES/AVX2 implementation based on HW features.
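
The selection described in the entries above could look roughly like the following sketch. All names here (the DEMO_HWF_* bits, demo_aria_context, demo_setkey_select, demo_pick_blk32) are hypothetical stand-ins, not libgcrypt's real identifiers, and the preference order GFNI, then VAES, then plain AESNI is an assumption rather than something this commit message states.

```c
#include <string.h>

/* Hypothetical feature bits; libgcrypt's real HWF_* constants differ. */
#define DEMO_HWF_AESNI (1u << 0)
#define DEMO_HWF_AVX2  (1u << 1)
#define DEMO_HWF_VAES  (1u << 2)
#define DEMO_HWF_GFNI  (1u << 3)

/* Cut-down stand-in for ARIA_context: only the selection flags. */
typedef struct
{
  unsigned int use_aesni_avx2 : 1;
  unsigned int use_vaes_avx2 : 1;
  unsigned int use_gfni_avx2 : 1;
} demo_aria_context;

/* Decide once, at key setup, which AVX2 variants the CPU can run.
 * Assumption: the VAES and GFNI variants both build on AESNI + AVX2. */
static void
demo_setkey_select (demo_aria_context *ctx, unsigned int hwf)
{
  int aesni_avx2 = (hwf & DEMO_HWF_AESNI) && (hwf & DEMO_HWF_AVX2);

  memset (ctx, 0, sizeof *ctx);
  ctx->use_aesni_avx2 = aesni_avx2;
  ctx->use_vaes_avx2 = aesni_avx2 && (hwf & DEMO_HWF_VAES);
  ctx->use_gfni_avx2 = aesni_avx2 && (hwf & DEMO_HWF_GFNI);
}

/* Pick the 32-block bulk routine; returns a label instead of calling
 * assembly so that the sketch stays portable. */
static const char *
demo_pick_blk32 (const demo_aria_context *ctx)
{
  if (ctx->use_gfni_avx2)
    return "gfni-avx2";
  if (ctx->use_vaes_avx2)
    return "vaes-avx2";
  if (ctx->use_aesni_avx2)
    return "aesni-avx2";
  return "generic";
}
```

With this sketch, a CPU reporting AESNI, AVX2 and VAES but not GFNI ends up on the "vaes-avx2" path, while one lacking VAES stays on "aesni-avx2".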

This patch adds a VAES/AVX2 accelerated ARIA block cipher implementation.

The VAES instruction set extends the AESNI instructions to work on all
128-bit lanes of 256-bit YMM and 512-bit ZMM vector registers, so AES
operations can be executed directly on YMM registers without manually
splitting each YMM register into two XMM halves for the AESNI
instructions. This improves performance on CPUs that support VAES but
not GFNI, such as AMD Zen3.
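
The difference between the two approaches can be modelled in portable C. This is only an illustrative sketch: toy_lane_op is a made-up stand-in for a 128-bit AES round instruction (a ShiftRows-style byte permutation plus a key XOR, with the S-box omitted), the byte arrays stand in for registers, and the same key is applied to both lanes for simplicity. None of these names come from the patch.

```c
#include <stdint.h>
#include <string.h>

#define LANE_BYTES 16   /* one XMM register / one 128-bit lane */
#define YMM_BYTES  32   /* one YMM register = two lanes */

/* Toy stand-in for a 128-bit AES round instruction (not real AES). */
static void
toy_lane_op (uint8_t out[LANE_BYTES], const uint8_t in[LANE_BYTES],
             const uint8_t key[LANE_BYTES])
{
  uint8_t tmp[LANE_BYTES];
  int i;

  for (i = 0; i < LANE_BYTES; i++)
    tmp[i] = (uint8_t)(in[(i * 5) % LANE_BYTES] ^ key[i]);
  memcpy (out, tmp, LANE_BYTES);   /* tmp makes in == out safe */
}

/* AESNI-style model: split the YMM value into two XMM halves, run the
 * 128-bit operation on each, then merge the halves back. */
static void
split_path (uint8_t ymm[YMM_BYTES], const uint8_t key[LANE_BYTES])
{
  uint8_t lo[LANE_BYTES], hi[LANE_BYTES];

  memcpy (lo, ymm, LANE_BYTES);               /* extract low half */
  memcpy (hi, ymm + LANE_BYTES, LANE_BYTES);  /* extract high half */
  toy_lane_op (lo, lo, key);
  toy_lane_op (hi, hi, key);
  memcpy (ymm, lo, LANE_BYTES);               /* re-insert low half */
  memcpy (ymm + LANE_BYTES, hi, LANE_BYTES);  /* re-insert high half */
}

/* VAES-style model: a single step that acts on every 128-bit lane of
 * the register in place, with no extract/insert traffic. */
static void
direct_path (uint8_t ymm[YMM_BYTES], const uint8_t key[LANE_BYTES])
{
  int lane;

  for (lane = 0; lane < YMM_BYTES / LANE_BYTES; lane++)
    toy_lane_op (ymm + lane * LANE_BYTES, ymm + lane * LANE_BYTES, key);
}

/* Returns 1 when both paths produce identical register contents. */
static int
paths_agree (void)
{
  uint8_t a[YMM_BYTES], b[YMM_BYTES], key[LANE_BYTES];
  int i;

  for (i = 0; i < YMM_BYTES; i++)
    a[i] = b[i] = (uint8_t)(i * 7 + 3);
  for (i = 0; i < LANE_BYTES; i++)
    key[i] = (uint8_t)(0xA5 ^ i);

  split_path (a, key);
  direct_path (b, key);
  return memcmp (a, b, YMM_BYTES) == 0;
}
```

Both paths compute the same result; the model only shows that the lane-wise form does not need the extract and re-insert steps around each operation.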

Benchmark on Ryzen 7 5800X (zen3, turbo-freq off):

Before (AESNI/AVX2):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

ECB enc |     0.559 ns/B      1707 MiB/s      2.12 c/B      3800
ECB dec |     0.560 ns/B      1703 MiB/s      2.13 c/B      3800
CTR enc |     0.570 ns/B      1672 MiB/s      2.17 c/B      3800
CTR dec |     0.568 ns/B      1679 MiB/s      2.16 c/B      3800

After (VAES/AVX2, ~33% faster on average):
ARIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

ECB enc |     0.435 ns/B      2193 MiB/s      1.65 c/B      3800
ECB dec |     0.434 ns/B      2197 MiB/s      1.65 c/B      3800
CTR enc |     0.413 ns/B      2306 MiB/s      1.57 c/B      3800
CTR dec |     0.411 ns/B      2318 MiB/s      1.56 c/B      3800
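
The quoted speed-up can be reproduced from the nanosecs/byte columns of the two tables. The helper names below are hypothetical, and averaging the four per-mode gains is an assumption about how the ~33% figure was obtained.

```c
#include <stddef.h>

/* Throughput gain in percent when the per-byte cost drops from
 * before_ns to after_ns (both in ns/B). */
static double
speedup_percent (double before_ns, double after_ns)
{
  return (before_ns / after_ns - 1.0) * 100.0;
}

/* ns/B values copied from the tables above:
 * ECB enc, ECB dec, CTR enc, CTR dec. */
static const double before_ns[4] = { 0.559, 0.560, 0.570, 0.568 };
static const double after_ns[4]  = { 0.435, 0.434, 0.413, 0.411 };

/* Arithmetic mean of the four per-mode gains, in percent. */
static double
mean_speedup_percent (void)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < 4; i++)
    sum += speedup_percent (before_ns[i], after_ns[i]);
  return sum / 4.0;
}
```

Evaluated on these numbers, the ECB rows gain roughly 28 to 29%, the CTR rows roughly 38%, and the mean comes out a little above 33%.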

Cc: Taehee Yoo <ap420073@gmail.com>

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Feb 18 2023, 12:44 PM
Parents
rCf359a3ec7e84: aria-avx512: small optimization for aria_diff_m