Add RISC-V vector permute AES

Description

* cipher/Makefile.am: Add 'rijndael-vp-riscv.c' and
CFLAG handling for 'rijndael-vp-riscv.o' and 'rijndael-vp-riscv.lo'.
(ENABLE_RISCV_VECTOR_INTRINSICS_EXTRA_CFLAGS): New.
* cipher/rijndael-internal.h (USE_VP_RISCV): New.
* cipher/rijndael-vp-simd128.h [__ARM_NEON]: Move ARM NEON macros to ...
* cipher/rijndael-vp-aarch64.c: ... here.
* cipher/rijndael-vp-riscv.c: New.
* cipher/rijndael-vp-simd128.h: Use '__m128i_const' type for constant
vector values and use *_amemld() macros to load these values to vector
registers.
[__x86_64__] (vpaddd128, vpaddb128): Remove.
[__x86_64__] (psrl_byte_128, movdqa128_memld, pand128_amemld)
(paddq128_amemld, paddd128_amemld, pshufb128_amemld): New.
[HAVE_SIMD256] (aes_encrypt_core_4blks_simd256)
(aes_decrypt_core_4blks_simd256): New.
(FUNC_CTR_ENC, FUNC_CTR32LE_ENC, FUNC_CFB_DEC, FUNC_CBC_DEC)
(aes_simd128_ocb_enc, aes_simd128_ocb_dec, FUNC_OCB_AUTH)
(aes_simd128_ecb_enc, aes_simd128_ecb_dec, aes_simd128_xts_enc)
(aes_simd128_xts_dec) [HAVE_SIMD256]: Add 4 block parallel code paths
for HW with 256-bit wide vectors.
* cipher/rijndael.c [USE_VP_RISCV]
(_gcry_aes_vp_riscv_setup_acceleration, _gcry_aes_vp_riscv_do_setkey)
(_gcry_aes_vp_riscv_prepare_decryption, _gcry_aes_vp_riscv_encrypt)
(_gcry_aes_vp_riscv_decrypt, _gcry_aes_vp_riscv_cfb_enc)
(_gcry_aes_vp_riscv_cbc_enc, _gcry_aes_vp_riscv_ctr_enc)
(_gcry_aes_vp_riscv_ctr32le_enc, _gcry_aes_vp_riscv_cfb_dec)
(_gcry_aes_vp_riscv_cbc_dec, _gcry_aes_vp_riscv_ocb_crypt)
(_gcry_aes_vp_riscv_ocb_auth, _gcry_aes_vp_riscv_ecb_crypt)
(_gcry_aes_vp_riscv_xts_crypt): New.
(do_setkey) [USE_VP_RISCV]: Setup vector permute AES for RISC-V with
HWF_RISCV_IMAFDC and HWF_RISCV_V.
* cipher/simd-common-riscv.h: New.
* configure.ac: Add 'rijndael-vp-riscv.lo'.
(gcry_cv_cc_riscv_vector_intrinsics)
(gcry_cv_cc_riscv_vector_intrinsics_cflags): New.

This patch adds a vector permute AES implementation for RISC-V that
supports fixed vector lengths of 128 and 256 bits.
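
As a rough illustration of the primitive behind vector permute code in
general: a byte-permute instruction can act as a 16-entry table lookup
applied to all 16 byte lanes at once, so a per-byte function is built from
lookups on the low and high 4-bit halves of each byte. The sketch below is
a scalar stand-in in portable C, not code from this patch. The names
permute_bytes16, lookup_by_nibbles and nibble_bits are hypothetical, the
pshufb-style "bit 7 set gives zero" rule is an assumption about the permute
semantics, and the table is a simple bit-count placeholder rather than any
AES table.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for a 128-bit byte permute:
   out[i] = table[idx[i] & 15], or 0 when bit 7 of idx[i] is set
   (pshufb-style semantics, assumed here for illustration). */
static void permute_bytes16(uint8_t out[16], const uint8_t table[16],
                            const uint8_t idx[16])
{
  int i;

  for (i = 0; i < 16; i++)
    out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0f];
}

/* Placeholder table (NOT an AES table): number of set bits in each
   4-bit value. */
static const uint8_t nibble_bits[16] =
  { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };

/* Per-byte function built from two 16-entry lookups: one on the low
   nibble and one on the high nibble of every input byte. */
static void lookup_by_nibbles(uint8_t out[16], const uint8_t in[16])
{
  uint8_t lo[16], hi[16], tlo[16], thi[16];
  int i;

  for (i = 0; i < 16; i++)
    {
      lo[i] = in[i] & 0x0f;   /* low 4 bits of each lane */
      hi[i] = in[i] >> 4;     /* high 4 bits of each lane */
    }

  permute_bytes16(tlo, nibble_bits, lo);
  permute_bytes16(thi, nibble_bits, hi);

  for (i = 0; i < 16; i++)
    out[i] = tlo[i] + thi[i]; /* combine the two partial results */
}
```

With this placeholder table, lookup_by_nibbles() returns the number of set
bits in each input byte. On vector hardware each loop would be a single
vector operation; no memory access depends on the data being processed.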

Benchmark on SpacemiT K1 (1600 MHz):

Before:
     AES |  nanosecs/byte   mebibytes/sec   cycles/byte
 ECB enc |     35.30 ns/B     27.02 MiB/s     56.48 c/B
 ECB dec |     35.51 ns/B     26.86 MiB/s     56.81 c/B
 CBC enc |     35.40 ns/B     26.94 MiB/s     56.63 c/B
 CBC dec |     36.30 ns/B     26.27 MiB/s     58.08 c/B
 CFB enc |     36.25 ns/B     26.31 MiB/s     58.00 c/B
 CFB dec |     36.25 ns/B     26.31 MiB/s     58.00 c/B
 OFB enc |     38.28 ns/B     24.91 MiB/s     61.25 c/B
 OFB dec |     38.28 ns/B     24.91 MiB/s     61.26 c/B
 CTR enc |     39.81 ns/B     23.96 MiB/s     63.69 c/B
 CTR dec |     39.81 ns/B     23.96 MiB/s     63.69 c/B
 XTS enc |     36.38 ns/B     26.22 MiB/s     58.20 c/B
 XTS dec |     36.26 ns/B     26.30 MiB/s     58.01 c/B
 OCB enc |     40.94 ns/B     23.29 MiB/s     65.50 c/B
 OCB dec |     40.71 ns/B     23.43 MiB/s     65.13 c/B
OCB auth |     37.34 ns/B     25.54 MiB/s     59.75 c/B

After:
     AES |  nanosecs/byte   mebibytes/sec   cycles/byte  speed vs old
 ECB enc |     16.76 ns/B     56.90 MiB/s     26.82 c/B     2.11x
 ECB dec |     19.94 ns/B     47.84 MiB/s     31.90 c/B     1.78x
 CBC enc |     31.72 ns/B     30.06 MiB/s     50.75 c/B     1.12x
 CBC dec |     20.24 ns/B     47.12 MiB/s     32.38 c/B     1.79x
 CFB enc |     31.80 ns/B     29.99 MiB/s     50.88 c/B     1.14x
 CFB dec |     16.87 ns/B     56.55 MiB/s     26.98 c/B     2.15x
 OFB enc |     38.68 ns/B     24.66 MiB/s     61.88 c/B     0.99x
 OFB dec |     38.65 ns/B     24.67 MiB/s     61.85 c/B     0.99x
 CTR enc |     16.86 ns/B     56.57 MiB/s     26.97 c/B     2.36x
 XTS enc |     17.49 ns/B     54.51 MiB/s     27.99 c/B     2.08x
 XTS dec |     20.80 ns/B     45.86 MiB/s     33.27 c/B     1.74x
 GCM enc |     31.16 ns/B     30.61 MiB/s     49.85 c/B     1.73x
 OCB enc |     17.25 ns/B     55.28 MiB/s     27.60 c/B     2.37x
 OCB dec |     20.64 ns/B     46.21 MiB/s     33.02 c/B     1.97x
OCB auth |     17.11 ns/B     55.73 MiB/s     27.38 c/B     2.18x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
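
The columns in the benchmark tables above appear to be related by simple
arithmetic, and the printed values are consistent with it. The sketch below
shows that relation under the assumption that cycles/byte is derived from
the 1600 MHz clock and that "speed vs old" is the ratio of the old to the
new nanosecs/byte figure. The helper names are hypothetical and are not
part of libgcrypt's benchmark tool.

```c
/* Hypothetical helpers relating the benchmark table columns. */

/* cycles/byte from ns/byte and the CPU clock in MHz. */
static double cycles_per_byte(double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* MiB/s from ns/byte (1 MiB = 1024 * 1024 bytes). */
static double mib_per_sec(double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* "speed vs old": how many times faster the new code is. */
static double speedup(double old_ns_per_byte, double new_ns_per_byte)
{
  return old_ns_per_byte / new_ns_per_byte;
}
```

For the ECB enc row, cycles_per_byte(35.30, 1600) gives about 56.48,
mib_per_sec(35.30) gives about 27.02, and speedup(35.30, 16.76) gives
about 2.11, matching the table after rounding.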

Details

Provenance
jukivili authored on Jan 1 2025, 12:12 PM
Parents
rC60104c2f92dc: bithelp: add count trailing zero bits variant for RISC-V