camellia-simd128: optimize round key loading and key setup

* cipher/camellia-simd128.h (if_vprolb128, vprolb128)
(vmovd128_amemld, vmovq128_amemld, vmovq128_memld)
(memory_barrier_with_vec, filter_8bit_3op): New.
(LE64_LO32, LE64_HI32): Remove.
(roundsm16, fls16, inpack16_pre, outunpack16): Use 'vmovd128_amemld'
and 'vmovq128_amemld' for loading round keys.
(camellia_f): Optimize/Rewrite and split core to ...
(camellia_f_core): ... this.
(camellia_f_xor_x): New.
(sp0044440444044404mask, sp1110111010011110mask)
(sp0222022222000222mask, sp3033303303303033mask): Adjust constants for
optimized/rewritten 'camellia_f'.
(camellia_setup128, camellia_setup256): Adjust for optimized
'camellia_f'; use 'vmovq128_amemld' for loading round keys.
(FUNC_KEY_SETUP): Use 'vmovq128_amemld' instead of 'vmovq128'.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>