sha2-ppc: better optimization for POWER9
* cipher/sha256-ppc.c: Change to use vector registers; generate POWER8
and POWER9 variants from the same code with the help of the 'target'
and 'optimize' attributes.
* cipher/sha512-ppc.c: Likewise.
* configure.ac (gcry_cv_gcc_attribute_optimize)
(gcry_cv_gcc_attribute_ppc_target): New.
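For illustration, here is a minimal sketch of the pattern described above: one always-inline body is instantiated twice, once per CPU target, and a small dispatcher picks a variant. The macro names, the function names and the `have_power9` flag are hypothetical and are not taken from the patch. The attributes expand to nothing outside GCC on 64-bit PowerPC so that the sketch builds elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro names; the actual sources may differ. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__powerpc64__)
# define FUNC_ATTR_TARGET_P8 __attribute__((target("cpu=power8")))
# define FUNC_ATTR_TARGET_P9 __attribute__((target("cpu=power9")))
# define FUNC_ATTR_OPT_O2    __attribute__((optimize("O2")))
#else
/* Elsewhere the attributes expand to nothing and both variants are equal. */
# define FUNC_ATTR_TARGET_P8
# define FUNC_ATTR_TARGET_P9
# define FUNC_ATTR_OPT_O2
#endif

#define ALWAYS_INLINE inline __attribute__((always_inline))

/* Rotate right; n must be in 1..31. */
static ALWAYS_INLINE uint32_t ror32(uint32_t x, unsigned int n)
{
  return (x >> n) | (x << (32 - n));
}

/* Shared body: sums the SHA-256 Sum0 function over n words.  It carries no
 * target attribute itself, so it is compiled with the options of whichever
 * wrapper it gets inlined into. */
static ALWAYS_INLINE uint32_t sum0_fold_common(const uint32_t *w, size_t n)
{
  uint32_t acc = 0;
  size_t i;

  for (i = 0; i < n; i++)
    acc += ror32(w[i], 2) ^ ror32(w[i], 13) ^ ror32(w[i], 22);
  return acc;
}

/* Two instantiations of the same source, one per CPU target. */
static FUNC_ATTR_TARGET_P8 FUNC_ATTR_OPT_O2 uint32_t
sum0_fold_ppc8(const uint32_t *w, size_t n)
{
  return sum0_fold_common(w, n);
}

static FUNC_ATTR_TARGET_P9 FUNC_ATTR_OPT_O2 uint32_t
sum0_fold_ppc9(const uint32_t *w, size_t n)
{
  return sum0_fold_common(w, n);
}

/* Dispatcher; have_power9 is assumed to come from runtime HW detection. */
uint32_t sum0_fold(const uint32_t *w, size_t n, int have_power9)
{
  return have_power9 ? sum0_fold_ppc9(w, n) : sum0_fold_ppc8(w, n);
}
```

Both variants must return identical results; only the generated instructions differ. On real hardware the caller should select the POWER9 variant only when the CPU is known to support it.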
Benchmark on POWER9:
Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      5.22 ns/B     182.8 MiB/s     12.00 c/B
 SHA512         |      3.53 ns/B     269.9 MiB/s      8.13 c/B
After (sha256 ~12% faster, sha512 ~19% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      4.65 ns/B     204.9 MiB/s     10.71 c/B
 SHA512         |      2.97 ns/B     321.1 MiB/s      6.83 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>