sm4: add XTS bulk processing
* cipher/sm4.c (_gcry_sm4_xts_crypt): New.
(sm4_setkey): Set XTS bulk function.
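For readers unfamiliar with XTS, the following is a minimal sketch of what a generic XTS bulk routine does for whole blocks. It is not the code added to cipher/sm4.c: the names xts_next_tweak, xts_crypt_blocks and block_fn are hypothetical, the block cipher is passed in as a callback, and ciphertext stealing for partial final blocks is left out. A bulk implementation is typically faster because it can hand several blocks at once to a parallel cipher routine, whereas this sketch handles one block per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: encrypts or decrypts one 16-byte block. */
typedef void (*block_fn)(const void *ctx, uint8_t out[16],
                         const uint8_t in[16]);

/* Multiply the 128-bit tweak by x in GF(2^128), using the XTS
 * little-endian byte order and the reduction constant 0x87. */
static void xts_next_tweak(uint8_t t[16])
{
    uint8_t carry = 0;
    for (size_t i = 0; i < 16; i++) {
        uint8_t next = (uint8_t)(t[i] >> 7);
        t[i] = (uint8_t)((t[i] << 1) | carry);
        carry = next;
    }
    if (carry)
        t[0] ^= 0x87;
}

/* Process nblocks whole blocks: out = crypt(in ^ T) ^ T, then advance T.
 * Works in place (out == in) because each block goes through tmp. */
static void xts_crypt_blocks(block_fn crypt, const void *ctx,
                             uint8_t tweak[16], uint8_t *out,
                             const uint8_t *in, size_t nblocks)
{
    uint8_t tmp[16];

    while (nblocks--) {
        for (size_t i = 0; i < 16; i++)
            tmp[i] = in[i] ^ tweak[i];
        crypt(ctx, tmp, tmp);
        for (size_t i = 0; i < 16; i++)
            out[i] = tmp[i] ^ tweak[i];
        xts_next_tweak(tweak);
        in += 16;
        out += 16;
    }
}
```

The initial tweak is assumed to have been produced by the caller, by encrypting the sector number with the second XTS key.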
Benchmark on Ryzen 5800X:
Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      7.28 ns/B     131.0 MiB/s     35.31 c/B      4850
        XTS dec |      7.29 ns/B     130.9 MiB/s     35.34 c/B      4850
After (4.8x faster):
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      1.49 ns/B     638.6 MiB/s      7.24 c/B      4850
        XTS dec |      1.49 ns/B     639.3 MiB/s      7.24 c/B      4850
Benchmark on Intel i5-6200U 2.30GHz:
Before:
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |     13.41 ns/B     71.10 MiB/s     37.45 c/B      2792
        XTS dec |     13.43 ns/B     71.03 MiB/s     37.49 c/B      2792
After (4.54x faster):
 SM4            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        XTS enc |      2.96 ns/B     322.7 MiB/s      8.25 c/B      2792
        XTS dec |      2.96 ns/B     322.5 MiB/s      8.26 c/B      2792
Reviewed-and-tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>