Add RISC-V vector permute AES

* cipher/Makefile.am: Add 'rijndael-vp-riscv.c' and CFLAG handling for
'rijndael-vp-riscv.o' and 'rijndael-vp-riscv.lo'.
(ENABLE_RISCV_VECTOR_INTRINSICS_EXTRA_CFLAGS): New.
* cipher/rijndael-internal.h (USE_VP_RISCV): New.
* cipher/rijndael-vp-simd128.h [__ARM_NEON]: Move ARM NEON macros to ...
* cipher/rijndael-vp-aarch64.c: ... here.
* cipher/rijndael-vp-riscv.c: New.
* cipher/rijndael-vp-simd128.h: Use '__m128i_const' type for constant
vector values and use *_amemld() macros to load these values to vector
registers.
[__x86_64__] (vpaddd128, vpaddb128): Remove.
[__x86_64__] (psrl_byte_128, movdqa128_memld, pand128_amemld)
(paddq128_amemld, paddd128_amemld, pshufb128_amemld): New.
[HAVE_SIMD256] (aes_encrypt_core_4blks_simd256)
(aes_decrypt_core_4blks_simd256): New.
(FUNC_CTR_ENC, FUNC_CTR32LE_ENC, FUNC_CFB_DEC, FUNC_CBC_DEC)
(aes_simd128_ocb_enc, aes_simd128_ocb_dec, FUNC_OCB_AUTH)
(aes_simd128_ecb_enc, aes_simd128_ecb_dec, aes_simd128_xts_enc)
(aes_simd128_xts_dec) [HAVE_SIMD256]: Add 4 block parallel code paths
for HW with 256-bit wide vectors.
* cipher/rijndael.c [USE_VP_RISCV]
(_gcry_aes_vp_riscv_setup_acceleration, _gcry_aes_vp_riscv_do_setkey)
(_gcry_aes_vp_riscv_prepare_decryption, _gcry_aes_vp_riscv_encrypt)
(_gcry_aes_vp_riscv_decrypt, _gcry_aes_vp_riscv_cfb_enc)
(_gcry_aes_vp_riscv_cbc_enc, _gcry_aes_vp_riscv_ctr_enc)
(_gcry_aes_vp_riscv_ctr32le_enc, _gcry_aes_vp_riscv_cfb_dec)
(_gcry_aes_vp_riscv_cbc_dec, _gcry_aes_vp_riscv_ocb_crypt)
(_gcry_aes_vp_riscv_ocb_auth, _gcry_aes_vp_riscv_ecb_crypt)
(_gcry_aes_vp_riscv_xts_crypt): New.
(do_setkey) [USE_VP_RISCV]: Set up vector permute AES for RISC-V with
HWF_RISCV_IMAFDC and HWF_RISCV_V.
* cipher/simd-common-riscv.h: New.
* configure.ac: Add 'rijndael-vp-riscv.lo'.
(gcry_cv_cc_riscv_vector_intrinsics)
(gcry_cv_cc_riscv_vector_intrinsics_cflags): New.

Patch adds AES vector permutation implementation for RISC-V with
fixed vector lengths of 128-bit and 256-bit.

Benchmark on SpacemiT K1 (1600 MHz):
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     35.30 ns/B     27.02 MiB/s     56.48 c/B
        ECB dec |     35.51 ns/B     26.86 MiB/s     56.81 c/B
        CBC enc |     35.40 ns/B     26.94 MiB/s     56.63 c/B
        CBC dec |     36.30 ns/B     26.27 MiB/s     58.08 c/B
        CFB enc |     36.25 ns/B     26.31 MiB/s     58.00 c/B
        CFB dec |     36.25 ns/B     26.31 MiB/s     58.00 c/B
        OFB enc |     38.28 ns/B     24.91 MiB/s     61.25 c/B
        OFB dec |     38.28 ns/B     24.91 MiB/s     61.26 c/B
        CTR enc |     39.81 ns/B     23.96 MiB/s     63.69 c/B
        CTR dec |     39.81 ns/B     23.96 MiB/s     63.69 c/B
        XTS enc |     36.38 ns/B     26.22 MiB/s     58.20 c/B
        XTS dec |     36.26 ns/B     26.30 MiB/s     58.01 c/B
        OCB enc |     40.94 ns/B     23.29 MiB/s     65.50 c/B
        OCB dec |     40.71 ns/B     23.43 MiB/s     65.13 c/B
       OCB auth |     37.34 ns/B     25.54 MiB/s     59.75 c/B
After:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  speed vs old
        ECB enc |     16.76 ns/B     56.90 MiB/s     26.82 c/B      2.11x
        ECB dec |     19.94 ns/B     47.84 MiB/s     31.90 c/B      1.78x
        CBC enc |     31.72 ns/B     30.06 MiB/s     50.75 c/B      1.12x
        CBC dec |     20.24 ns/B     47.12 MiB/s     32.38 c/B      1.79x
        CFB enc |     31.80 ns/B     29.99 MiB/s     50.88 c/B      1.14x
        CFB dec |     16.87 ns/B     56.55 MiB/s     26.98 c/B      2.15x
        OFB enc |     38.68 ns/B     24.66 MiB/s     61.88 c/B      0.99x
        OFB dec |     38.65 ns/B     24.67 MiB/s     61.85 c/B      0.99x
        CTR enc |     16.86 ns/B     56.57 MiB/s     26.97 c/B      2.36x
        XTS enc |     17.49 ns/B     54.51 MiB/s     27.99 c/B      2.08x
        XTS dec |     20.80 ns/B     45.86 MiB/s     33.27 c/B      1.74x
        GCM enc |     31.16 ns/B     30.61 MiB/s     49.85 c/B      1.73x
        OCB enc |     17.25 ns/B     55.28 MiB/s     27.60 c/B      2.37x
        OCB dec |     20.64 ns/B     46.21 MiB/s     33.02 c/B      1.97x
       OCB auth |     17.11 ns/B     55.73 MiB/s     27.38 c/B      2.18x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>