Add ARIA block cipher
* cipher/Makefile.am: Add 'aria.c'.
* cipher/aria.c: New.
* cipher/cipher.c (cipher_list, cipher_list_algo301): Add ARIA cipher
specs.
* cipher/mac-cmac.c (map_mac_algo_to_cipher): Add GCRY_MAC_CMAC_ARIA.
(_gcry_mac_type_spec_cmac_aria): New.
* cipher/mac-gmac.c (map_mac_algo_to_cipher): Add GCRY_MAC_GMAC_ARIA.
(_gcry_mac_type_spec_gmac_aria): New.
* cipher/mac-internal.h (_gcry_mac_type_spec_cmac_aria)
(_gcry_mac_type_spec_gmac_aria)
(_gcry_mac_type_spec_poly1305mac_aria): New.
* cipher/mac-poly1305.c (poly1305mac_open): Add GCRY_MAC_POLY1305_ARIA.
(_gcry_mac_type_spec_poly1305mac_aria): New.
* cipher/mac.c (mac_list, mac_list_algo201, mac_list_algo401)
(mac_list_algo501): Add ARIA MAC specs.
* configure.ac (available_ciphers): Add 'aria'.
(GCRYPT_CIPHERS): Add 'aria.lo'.
(USE_ARIA): New.
* doc/gcrypt.texi: Add GCRY_CIPHER_ARIA128, GCRY_CIPHER_ARIA192,
GCRY_CIPHER_ARIA256, GCRY_MAC_CMAC_ARIA, GCRY_MAC_GMAC_ARIA and
GCRY_MAC_POLY1305_ARIA.
* src/cipher.h (_gcry_cipher_spec_aria128, _gcry_cipher_spec_aria192)
(_gcry_cipher_spec_aria256): New.
* src/gcrypt.h.in (gcry_cipher_algos): Add GCRY_CIPHER_ARIA128,
GCRY_CIPHER_ARIA192 and GCRY_CIPHER_ARIA256.
(gcry_mac_algos): Add GCRY_MAC_CMAC_ARIA, GCRY_MAC_GMAC_ARIA and
GCRY_MAC_POLY1305_ARIA.
* tests/basic.c (check_ecb_cipher, check_ctr_cipher)
(check_cfb_cipher, check_ocb_cipher) [USE_ARIA]: Add ARIA test-vectors.
(check_ciphers) [USE_ARIA]: Add GCRY_CIPHER_ARIA128, GCRY_CIPHER_ARIA192
and GCRY_CIPHER_ARIA256.
(main): Also run 'check_bulk_cipher_modes' for 'cipher_modes_only'-mode.
* tests/bench-slope.c (bench_mac_init): Add GCRY_MAC_POLY1305_ARIA
setiv-handling.
* tests/benchmark.c (mac_bench): Likewise.
This patch adds the ARIA block cipher to libgcrypt. The implementation
is based on work by Taehee Yoo, with the following notable changes:
- Integrated into libgcrypt, using bithelp.h and bufhelp.h helper
  functions where possible.
- Added lookup table prefetching, as is done in the AES, GCM and SM4
  implementations.
- Changed get_u8 to return u32, as returning a byte caused sub-optimal
  code generation with gcc-12/x86-64 (zero extension from an 8-bit to a
  32-bit register, followed by an extraneous sign extension from a
  32-bit to a 64-bit register).
- Changed the 'aria_crypt' loop structure slightly for a small
  performance increase (~1% seen with gcc-12/x86-64/zen4).
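
The get_u8 change above can be illustrated with a minimal sketch. This
is not the code from cipher/aria.c: the byte-indexing convention
(y = 0 selects the most significant byte) is an assumption, and the
names get_u8_as_byte, sbox_lookup and check_get_u8 are hypothetical
helpers used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t byte;
typedef uint32_t u32;

/* Byte-returning variant (hypothetical name).  The narrow return type
 * means the value has to be widened again before it can be used as a
 * table index. */
static inline byte get_u8_as_byte (u32 x, u32 y)
{
  return (byte)(x >> ((3 - y) * 8));
}

/* u32-returning variant: the mask keeps the value in 0..255 while the
 * result stays in a 32-bit type.  Assumed convention: y = 0 selects
 * the most significant byte of x. */
static inline u32 get_u8 (u32 x, u32 y)
{
  return (x >> ((3 - y) * 8)) & 0xff;
}

/* Hypothetical 256-entry table lookup; the u32 result of get_u8 is
 * used directly as the index. */
static inline u32 sbox_lookup (const u32 table[256], u32 x, u32 y)
{
  return table[get_u8 (x, y)];
}

/* Both variants select the same byte; only the return type differs. */
static void check_get_u8 (void)
{
  u32 x = 0x11223344;

  assert (get_u8 (x, 0) == 0x11);
  assert (get_u8 (x, 3) == 0x44);
  assert (get_u8 (x, 1) == get_u8_as_byte (x, 1));
}
```

Whether a compiler emits the extra extension instructions for the
byte-returning variant depends on the compiler version and target; the
observation in this message applies to gcc-12 on x86-64.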
Benchmark on AMD Ryzen 9 7900X (x86-64):
 ARIA128        |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        ECB enc |      3.99 ns/B     239.1 MiB/s     22.43 c/B      5625
        ECB dec |      4.00 ns/B     238.4 MiB/s     22.50 c/B      5625
Benchmark on AMD Ryzen 9 7900X (win32):
 ARIA128        |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        ECB enc |      4.57 ns/B     208.7 MiB/s     25.31 c/B      5538
        ECB dec |      4.66 ns/B     204.8 MiB/s     25.39 c/B      5453
Benchmark on ARM Cortex-A53 (aarch64):
 ARIA128        |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        ECB enc |     74.69 ns/B     12.77 MiB/s     48.40 c/B     647.9
        ECB dec |     74.99 ns/B     12.72 MiB/s     48.58 c/B     647.9
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>