sm4: add ARMv8 CE accelerated implementation for XTS mode
* cipher/sm4-armv8-aarch64-ce.S (_gcry_sm4_armv8_ce_xts_crypt): New.
* cipher/sm4.c (_gcry_sm4_armv8_ce_xts_crypt): New.
(_gcry_sm4_xts_crypt) [USE_ARM_CE]: Add ARMv8 CE implementation for XTS.
Benchmark on T-Head Yitian-710 2.75 GHz:
Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     0.373 ns/B      2560 MiB/s      1.02 c/B      2749
        XTS dec |     0.372 ns/B      2562 MiB/s      1.02 c/B      2750
After (1.18x faster):
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     0.314 ns/B      3038 MiB/s     0.863 c/B      2749
        XTS dec |     0.314 ns/B      3037 MiB/s     0.863 c/B      2749
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>