cipher-gcm-ppc: tweak for better performance
* cipher/cipher-gcm-ppc.c (asm_xor, asm_mergelo, asm_mergehi)
(vec_be_swap, vec_load_he, vec_store_he): New.
(vec_load_be, vec_perm2, vec_aligned_st, vec_aligned_ld): Remove.
(asm_vpmsumd, asm_swap_u64, vec_perm2, asm_rot_block_left)
(asm_rot_block_right, asm_ashl_128, vec_aligned_ld)
(_gcry_ghash_setup_ppc_vpmsum): Update 'bswap_const'.
(_gcry_ghash_ppc_vpmsum): Update 'bswap_const'; Use 'asm_mergehi' and
'asm_mergelo' instead of vec_perm2; Use 'asm_xor' for fast path to
enforce instruction ordering; Use 'vec_load_he' and 'vec_be_swap' for
big-endian loads.
Benchmark on POWER8 (3700 MHz):
Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |     0.169 ns/B      5647 MiB/s     0.625 c/B
After (~13% faster):
          |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES |     0.149 ns/B      6385 MiB/s     0.553 c/B
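Note on the 'vec_load_he' + 'vec_be_swap' split: the following is only a portable scalar sketch of the general idea (load in host byte order, then byte-swap only where the host is little-endian), not the vector code in cipher-gcm-ppc.c. All names in it ('block128', 'load_he', 'be_swap', 'check_load_be') are hypothetical and chosen for illustration; the real helpers operate on VSX registers and use 'bswap_const' as a permute mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit block as two 64-bit halves (illustration only). */
typedef struct { uint64_t hi, lo; } block128;

/* Runtime endianness probe: non-zero on little-endian hosts. */
static int host_is_little_endian(void)
{
  const uint16_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Portable 64-bit byte swap. */
static uint64_t bswap64(uint64_t x)
{
  x = ((x & 0x00ff00ff00ff00ffULL) << 8) | ((x >> 8) & 0x00ff00ff00ff00ffULL);
  x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
  return (x << 32) | (x >> 32);
}

/* Host-endian load: copy the 16 bytes as-is, no byte reordering. */
static block128 load_he(const unsigned char *p)
{
  block128 b;
  memcpy(&b.hi, p, 8);
  memcpy(&b.lo, p + 8, 8);
  return b;
}

/* Swap to big-endian interpretation; a no-op on big-endian hosts. */
static block128 be_swap(block128 b)
{
  if (host_is_little_endian())
    {
      b.hi = bswap64(b.hi);
      b.lo = bswap64(b.lo);
    }
  return b;
}

/* Usage: the two steps together behave like a big-endian load. */
static void check_load_be(void)
{
  unsigned char buf[16];
  block128 b;
  int i;

  for (i = 0; i < 16; i++)
    buf[i] = (unsigned char)i;
  b = be_swap(load_he(buf));
  assert(b.hi == 0x0001020304050607ULL);
  assert(b.lo == 0x08090a0b0c0d0e0fULL);
}
```

Keeping the load and the swap as separate steps means the swap costs nothing on big-endian hosts, and on little-endian hosts it can be placed wherever it is most convenient relative to the load.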
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>