avx512: tweak zmm16-zmm31 register clearing
* cipher/asm-common-amd64.h (spec_stop_avx512): Clear ymm16 before and
after vpopcntb.
* cipher/camellia-gfni-avx512-amd64.S (clear_zmm16_zmm31): Clear
YMM16-YMM31 registers instead of XMM16-XMM31.
* cipher/chacha20-amd64-avx512.S (clear_zmm16_zmm31): Likewise.
* cipher/keccak-amd64-avx512.S (clear_regs): Likewise.
(clear_avx512_4regs): Clear all 4 registers with XOR.
* cipher/cipher-gcm-intel-pclmul.c (_gcry_ghash_intel_pclmul)
(_gcry_polyval_intel_pclmul): Clear YMM16-YMM19 registers instead of
ZMM16-ZMM19.
* cipher/poly1305-amd64-avx512.S (POLY1305_BLOCKS): Clear YMM16-YMM31
registers after vector processing instead of XMM16-XMM31.
* cipher/sha512-avx512-amd64.S (_gcry_sha512_transform_amd64_avx512):
Likewise.
Clear the zmm16-zmm31 registers with 256-bit XOR instead of 128-bit,
as this is better for AMD Zen4. Also clear the ymm16 register after
vpopcntb in the AVX512 spec-stop so that no zmm register state is
left behind, as leftover state might unnecessarily use CPU
resources.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>