camellia-avx2: add partial parallel block processing

* cipher/camellia-aesni-avx2-amd64.h: Remove unnecessary vzeroupper
from function entry.
(enc_blk1_32, dec_blk1_32): New.
* cipher/camellia-glue.c (avx_burn_stack_depth)
(avx2_burn_stack_depth): Move outside of bulk functions to deduplicate.
(camellia_setkey): Disable AESNI & VAES implementation when GFNI
implementation is enabled.
(_gcry_camellia_aesni_avx2_enc_blk1_32)
(_gcry_camellia_aesni_avx2_dec_blk1_32)
(_gcry_camellia_vaes_avx2_enc_blk1_32)
(_gcry_camellia_vaes_avx2_dec_blk1_32)
(_gcry_camellia_gfni_avx2_enc_blk1_32)
(_gcry_camellia_gfni_avx2_dec_blk1_32, camellia_encrypt_blk1_32)
(camellia_decrypt_blk1_32): New.
(_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec, _gcry_camellia_cfb_dec)
(_gcry_camellia_ocb_crypt, _gcry_camellia_ocb_auth): Use new bulk
processing helpers from 'bulkhelp.h' and 'camellia_encrypt_blk1_32'
and 'camellia_decrypt_blk1_32' for partial parallel processing.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>