c1d9fff3b2eb

chacha20: avoid AVX512/AVX2/SSSE3 for single block processing with Zen5

* cipher/chacha20.c (CHACHA20_context_s): Add
'skip_one_block_hw_impl'.
(chacha20_blocks, do_chacha20_encrypt_stream_tail): Avoid single
block / non-parallel processing with AVX512/AVX2/SSSE3.
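
A minimal sketch of the dispatch described above, assuming a simplified
context and block functions. The struct, the helper names
(process_blocks, vector_blocks, generic_blocks) and the 4-block batch
size are hypothetical stand-ins. Only 'skip_one_block_hw_impl' comes
from this commit, and the real chacha20_blocks and
do_chacha20_encrypt_stream_tail in cipher/chacha20.c may be structured
differently. Full parallel batches still go to the vector code; the
leftover blocks fall back to generic C when the flag is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the cipher context; only the
   skip_one_block_hw_impl field name comes from the commit. */
typedef struct
{
  int use_vector;             /* an AVX512/AVX2/SSSE3 path is available */
  int skip_one_block_hw_impl; /* set where generic C wins for single blocks (AMD Zen5) */
  size_t blocks_vector;       /* demo bookkeeping: blocks handled by vector code */
  size_t blocks_generic;      /* demo bookkeeping: blocks handled by generic C */
} chacha_ctx_t;

/* Hypothetical batch size of the parallel vector routine. */
#define VEC_PARALLEL_BLOCKS 4

/* Placeholders that only count blocks instead of generating keystream. */
static void
vector_blocks (chacha_ctx_t *ctx, size_t nblks)
{
  ctx->blocks_vector += nblks;
}

static void
generic_blocks (chacha_ctx_t *ctx, size_t nblks)
{
  ctx->blocks_generic += nblks;
}

static void
process_blocks (chacha_ctx_t *ctx, size_t nblks)
{
  assert (ctx != NULL);

  if (ctx->use_vector)
    {
      /* Full parallel batches always use the vector code. */
      size_t nparallel = nblks - nblks % VEC_PARALLEL_BLOCKS;
      if (nparallel > 0)
        {
          vector_blocks (ctx, nparallel);
          nblks -= nparallel;
        }

      /* Leftover blocks would take the non-parallel vector path, unless
         the CPU is flagged as faster with generic C for that case. */
      if (nblks > 0 && !ctx->skip_one_block_hw_impl)
        {
          vector_blocks (ctx, nblks);
          nblks = 0;
        }
    }

  /* Whatever remains is handled by the generic C implementation. */
  if (nblks > 0)
    generic_blocks (ctx, nblks);
}
```

With the flag set, a 6-block request in this sketch becomes one 4-block
vector batch plus 2 blocks of generic C; without the flag, all 6 blocks
go through vector code.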

On AMD Zen5, the integer vector implementations of ChaCha20 are slower
than the general-purpose register implementation for single-block
processing: generic C is approximately 50% faster for single-block
computation. This commit adjusts the calls to the AVX512/AVX2/SSSE3
code so that trailing single-block computations are handled by generic
C on AMD Zen5.
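
The commit message does not say how Zen5 is detected. One hypothetical
way to recognize it is to check the CPUID vendor string and the display
family from CPUID leaf 1, since Zen5 parts report family 1Ah. The
helper names below are illustrative only. Matching on family alone is a
simplification, because other AMD cores could share family 1Ah, and
libgcrypt's own hardware-feature detection may use a different test.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode the display family from CPUID leaf 1 EAX.
   The extended family (bits 27:20) is added only when the base family
   (bits 11:8) is 0xF. */
static unsigned int
cpuid_display_family (uint32_t eax)
{
  unsigned int family = (eax >> 8) & 0xf;
  if (family == 0xf)
    family += (eax >> 20) & 0xff;
  return family;
}

/* Hypothetical helper: nonzero for an AMD CPU of family 1Ah (Zen5).
   'vendor' is the 12-byte CPUID vendor string (not NUL-terminated). */
static int
is_amd_zen5 (const char *vendor, uint32_t leaf1_eax)
{
  assert (vendor != NULL);
  return memcmp (vendor, "AuthenticAMD", 12) == 0
         && cpuid_display_family (leaf1_eax) == 0x1a;
}
```

A result like this could then be used to initialize
'skip_one_block_hw_impl' when the context is set up.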