camellia-gfni-avx512: speed up round key broadcasting
* cipher/camellia-gfni-avx512-amd64.S (roundsm64, fls64): Use 'vpbroadcastb' for loading round key.
Benchmark on AMD Ryzen 9 7900X (turbo-freq off):
Before:
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        ECB enc |     0.173 ns/B      5514 MiB/s     0.813 c/B      4700
        ECB dec |     0.176 ns/B      5432 MiB/s     0.825 c/B      4700
After (~13% faster):
 CAMELLIA128    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        ECB enc |     0.152 ns/B      6267 MiB/s     0.715 c/B      4700
        ECB dec |     0.155 ns/B      6170 MiB/s     0.726 c/B      4700
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>