camellia-simd128: faster sbox filtering with uint8 right shift
* cipher/camellia-simd128.h (if_vpsrlb128)
(if_not_vpsrlb128): New.
(filter_8bit): Use 'vpsrlb128' when available on target
architecture (PowerPC and AArch64).
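The idea behind the 'filter_8bit' change can be sketched in scalar C. This is
an illustration only, not the macro from camellia-simd128.h: the function
names are hypothetical, and a 32-bit word stands in for a 128-bit vector of
bytes. Without a per-byte shift, the high nibbles must be isolated with a mask
before shifting the wider lane, otherwise bits leak across byte boundaries.
With a per-byte right shift, each byte is shifted on its own and no mask step
is needed, which is presumably where the saved instruction comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract the high nibble of every byte using only a
   32-bit wide shift.  The low nibbles are cleared first so that they cannot
   shift into the neighbouring byte. */
static uint32_t hi_nibbles_shift32(uint32_t x)
{
  return (x & 0xf0f0f0f0u) >> 4;
}

/* Hypothetical helper: the same result with a per-byte right shift, emulated
   here one byte at a time.  No masking is needed because each byte is
   shifted independently. */
static uint32_t hi_nibbles_shift8(uint32_t x)
{
  uint32_t r = 0;
  int i;

  for (i = 0; i < 4; i++)
    {
      uint8_t b = (uint8_t)(x >> (8 * i));
      r |= (uint32_t)(uint8_t)(b >> 4) << (8 * i);
    }
  return r;
}

/* Check that both variants agree for a range of inputs; returns 1. */
static int self_check(void)
{
  uint32_t v;

  for (v = 0; v < 256; v++)
    {
      uint32_t x = v | ((255u - v) << 8) | (v << 16) | ((v ^ 0x5au) << 24);
      assert(hi_nibbles_shift32(x) == hi_nibbles_shift8(x));
    }
  return 1;
}
```

Both helpers produce the per-byte table index that a byte shuffle would then
consume; only the number of operations needed to get there differs.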
Benchmark on POWER9:
Before:
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      3.26 ns/B     292.8 MiB/s      7.49 c/B
        ECB dec |      3.29 ns/B     290.0 MiB/s      7.56 c/B
After (~3% faster):
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      3.16 ns/B     301.4 MiB/s      7.28 c/B
        ECB dec |      3.19 ns/B     298.7 MiB/s      7.34 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>