Add SM4 ARMv8/AArch64/CE assembly implementation

* cipher/Makefile.am: Add 'sm4-armv8-aarch64-ce.S'.
* cipher/sm4-armv8-aarch64-ce.S: New.
* cipher/sm4.c (USE_ARM_CE): New.
(SM4_context) [USE_ARM_CE]: Add 'use_arm_ce'.
[USE_ARM_CE] (_gcry_sm4_armv8_ce_expand_key)
(_gcry_sm4_armv8_ce_crypt, _gcry_sm4_armv8_ce_ctr_enc)
(_gcry_sm4_armv8_ce_cbc_dec, _gcry_sm4_armv8_ce_cfb_dec)
(_gcry_sm4_armv8_ce_crypt_blk1_8, sm4_armv8_ce_crypt_blk1_8): New.
(sm4_expand_key) [USE_ARM_CE]: Use ARMv8/AArch64/CE key setup.
(sm4_setkey): Enable ARMv8/AArch64/CE if supported by HW.
(sm4_encrypt) [USE_ARM_CE]: Use SM4 CE encryption.
(sm4_decrypt) [USE_ARM_CE]: Use SM4 CE decryption.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_ARM_CE]: Add
ARMv8/AArch64/CE bulk functions.
* configure.ac: Add 'sm4-armv8-aarch64-ce.lo'.

This patch adds ARMv8/AArch64/CE bulk encryption/decryption. Bulk
functions process eight blocks in parallel.
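
As an illustration of what "eight blocks in parallel" means for a caller, the
sketch below splits a buffer into chunks of at most eight 16-byte blocks and
hands each chunk to a block primitive. It is not the code from this patch:
bulk_process, crypt_blk1_8_fn and demo_copy_blk1_8 are hypothetical names, and
the demo primitive only copies data so that the chunking logic can be checked
without SM4 hardware.

```c
#include <stddef.h>
#include <string.h>

#define SM4_BLOCK_SIZE 16   /* SM4 operates on 128-bit blocks. */
#define BULK_BLOCKS     8   /* Blocks handled per call, as stated above. */

/* Hypothetical signature for a primitive that handles 1..8 blocks. */
typedef void (*crypt_blk1_8_fn) (unsigned char *out,
                                 const unsigned char *in,
                                 size_t num_blks);

/* Demo stand-in (assumption): copies the blocks instead of encrypting. */
static void
demo_copy_blk1_8 (unsigned char *out, const unsigned char *in,
                  size_t num_blks)
{
  memcpy (out, in, num_blks * SM4_BLOCK_SIZE);
}

/* Feed NBLOCKS blocks to FN in chunks of at most BULK_BLOCKS.
   Returns the number of calls made to FN. */
static size_t
bulk_process (crypt_blk1_8_fn fn, unsigned char *out,
              const unsigned char *in, size_t nblocks)
{
  size_t calls = 0;

  while (nblocks > 0)
    {
      size_t curr = nblocks > BULK_BLOCKS ? BULK_BLOCKS : nblocks;

      fn (out, in, curr);
      out += curr * SM4_BLOCK_SIZE;
      in += curr * SM4_BLOCK_SIZE;
      nblocks -= curr;
      calls++;
    }

  return calls;
}
```

With this sketch, 20 blocks are processed as chunks of 8, 8 and 4 blocks, so
only the final call handles a partial chunk.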
Benchmark on T-Head Yitian-710 2.75 GHz:
Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC enc |     12.10 ns/B     78.79 MiB/s     33.28 c/B      2750
        CBC dec |      4.63 ns/B     205.9 MiB/s     12.74 c/B      2749
        CFB enc |     12.14 ns/B     78.58 MiB/s     33.37 c/B      2750
        CFB dec |      4.64 ns/B     205.5 MiB/s     12.76 c/B      2750
        CTR enc |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
        CTR dec |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
        GCM enc |      4.88 ns/B     195.4 MiB/s     13.42 c/B      2750
        GCM dec |      4.88 ns/B     195.5 MiB/s     13.42 c/B      2750
       GCM auth |     0.189 ns/B      5048 MiB/s     0.520 c/B      2750
        OCB enc |      4.86 ns/B     196.0 MiB/s     13.38 c/B      2750
        OCB dec |      4.90 ns/B     194.7 MiB/s     13.47 c/B      2750
       OCB auth |      4.79 ns/B     199.0 MiB/s     13.18 c/B      2750

After (roughly 10x - 19x faster than the ARMv8/AArch64 implementation;
GCM auth is unchanged):
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CBC enc |      1.25 ns/B     762.7 MiB/s      3.44 c/B      2749
        CBC dec |     0.243 ns/B      3927 MiB/s     0.668 c/B      2750
        CFB enc |      1.25 ns/B     763.1 MiB/s      3.44 c/B      2750
        CFB dec |     0.245 ns/B      3899 MiB/s     0.673 c/B      2750
        CTR enc |     0.298 ns/B      3199 MiB/s     0.820 c/B      2750
        CTR dec |     0.298 ns/B      3198 MiB/s     0.820 c/B      2750
        GCM enc |     0.487 ns/B      1957 MiB/s      1.34 c/B      2749
        GCM dec |     0.487 ns/B      1959 MiB/s      1.34 c/B      2750
       GCM auth |     0.189 ns/B      5048 MiB/s     0.519 c/B      2750
        OCB enc |     0.443 ns/B      2150 MiB/s      1.22 c/B      2749
        OCB dec |     0.486 ns/B      1964 MiB/s      1.34 c/B      2750
       OCB auth |     0.369 ns/B      2585 MiB/s      1.01 c/B      2749

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>