tests/hashtest: add hugeblock & disable-hwf options and 6 gig test vectors
keccak: Use size_t to avoid integer overflow
I've tested the different hw implementations (amd64, arm64, s390x) and they are all ok.
The fix looks good to me. This could be tested with a new long-running test (tests/hashtest) that allocates a 4 GiB+ pattern block as input to gcry_md_write.
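To illustrate why only a 4 GiB+ input exercises this kind of bug, here is a minimal sketch of the general failure mode (a byte count squeezed through a 32-bit variable). It is not the actual keccak code: the helper names `count_blocks_u32` and `count_blocks_wide` are hypothetical, `uint64_t` stands in for a 64-bit `size_t`, and the block size of 136 bytes is only an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts whole blocks, but first stores the
 * byte count in a 32-bit variable, so lengths of 4 GiB and above
 * are silently reduced modulo 2^32. */
static uint64_t count_blocks_u32(uint64_t nbytes, unsigned int blocksize)
{
  uint32_t n = (uint32_t)nbytes;  /* truncation happens here */
  return n / blocksize;
}

/* Hypothetical helper: same computation with a 64-bit length, as a
 * size_t would provide on 64-bit targets. */
static uint64_t count_blocks_wide(uint64_t nbytes, unsigned int blocksize)
{
  return nbytes / blocksize;
}
```

With an input of 4 GiB + 136 bytes, the 32-bit variant sees only 136 bytes and reports a single block, while the wide variant counts the full input. Any test buffer shorter than 4 GiB produces identical results from both helpers, which is why a huge pattern block is needed to catch the difference.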
kdf: Restructure KDF test vectors
kdf: Allow empty password for Argon2
tests/basic: Add ifdefs for SM4 and CAMELLIA tests
basic: gcm-siv: add fips checks for SM4 and CAMELLIA128
sm4: add ARMv8 CE accelerated implementation for XTS mode
sm4: fix unused parameter compiler warning
Simplify AES key schedule implementation
rijndael-ppc: small speed-up for CBC and CFB encryption
@werner Could these two patches be backported to 2.2? These changes give the same level of performance increase in 2.2 as seen in 2.3.
blake2: add AVX512 accelerated implementations
sha512: add AArch64 crypto/SHA512 extension implementation
sm4-arm-sve-ce: use 32 parallel blocks for XTS and CTR32LE
sm4 & camellia: add generic bulk acceleration for CTR32LE mode (GCM-SIV)
sha3: Add x86-64 AVX512 accelerated implementation
sm4: add amd64 GFNI/AVX512 implementation
Add ARMv9 SVE2 and optional Crypto Extension HW features
jukivili committed
rC8921b5221e33: Add detection for HW feature "ARMv8 SVE" (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
jukivili committed
rC2dc265400674: Add SM4 ARMv9 SVE CE assembly implementation (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
visibility: add missing fips_is_operational check for gcry_md_extract
hwf-x86: fix UBSAN warning
hwf-arm: add ARM HW feature detection support for MacOS
sm4: fix wrong macro used for GFNI/AVX2 code-path
tests/basic: enable IV checks for CBC/CFB/CTR bulk tests
sm4: fix use of GFNI/AVX2 accelerated key expansion
camellia-gfni-avx512: remove copy-paste / leftover extra instructions
camellia-gfni-avx512: add missing register clearing on function exits
Patch applied to master with small changes.
Chacha20/poly1305 - Optimized chacha20/poly1305 for P10 operation
ppc: enable P10 assembly with ENABLE_FORCE_SOFT_HWFEATURES on arch-3.00
Thanks for the updated patch. I'm travelling next week and will have time to check it closely only after I'm back. At a quick glance, it looks good. What is also needed is a changelog entry for the git commit log.
I meant interleaving integer-register-based 1xPoly1305 with 8xChacha20, as is done for 4xChacha20 in (interleaved so that for each 4xChaCha20 processed, 4 blocks of 1xPoly1305 are executed). Quite often microarchitectures have separate execution units for integer registers and vector registers, and then it makes sense to interleave integer-poly1305 with vector-chacha20, as the algorithms do not end up competing for the same execution resources. Interleaving vector-poly1305 and vector-chacha20 is not likely to give a performance increase (and is likely to run into problems with running out of vector registers).
The problem is that the new assembly uses VSX registers vs14-vs31, which overlap with floating-point registers f14-f31. f14-f31 are callee-saved under the ABI, so they need to be stored and restored.
I tested the patch with a small change so that HWF_PPC_ARCH_3_00 is used instead of HWF_PPC_ARCH_3_10. Building bench-slope with "-O3 -flto" makes a bug in the new implementation visible. Without the new implementations bench-slope is ok (testing with QEMU):
The -O2 problem with bench-slope seems strange. Does the problem appear after this patch is applied?
aarch64-asm: use ADR for getting pointers for local labels
camellia: add amd64 GFNI/AVX512 implementation
cipher: move CBC/CFB/CTR self-tests to tests/basic
tests/basic: add testing for partial bulk processing code paths
sm4: add XTS bulk processing
sm4-aesni-avx2: add generic 1 to 16 block bulk processing function
camellia-avx2: add bulk processing for XTS mode
Add SM4 x86-64/GFNI/AVX2 implementation
sm4: deduplicate bulk processing function selection
Move bulk OCB L pointer array setup code to common header
cipher/bulkhelp: add functions for CTR/CBC/CFB/OCB bulk processing
camellia-avx2: add partial parallel block processing
Add detection for HW feature "intel-gfni"
Add GFNI/AVX2 implementation of Camellia
jukivili committed
rCa7c3e0b9b0ff: doc: Fix missing ARM hardware features (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
chacha20: add AVX512 implementation
jukivili committed
rC972aae9fc337: build: Fix for arm crypto support (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
poly1305: add AVX512 implementation
jukivili committed
rCfe891ff4a3cd: Add SM3 ARMv8/AArch64/CE assembly implementation (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Fixed in master. I rechecked that the bulk implementation passes tests with qemu-ppc64le.
hwf-ppc: fix missing HWF_PPC_ARCH_3_10 in HW feature
It looks like that line went missing in the third/final version of the AES-GCM patch at https://dev.gnupg.org/T5700
configure: fix avx512 check for i386
jukivili committed
rC4dc707e336a9: Fix configure.ac error of intel-avx512 (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Fix building sha512-avx512 with clang
SHA512: Add AVX512 implementation
Fix pushed to master. Updated graph:
rijndael-vaes-avx2: perform checksumming inline
iobuf: add zerocopy optimization for iobuf_read
gpg: fix --enarmor with zero length source file
iobuf: add zerocopy optimization for iobuf_write
g10/cipher-aead: add fast path for avoid memcpy when AEAD encrypting
g10/plaintext: disable estream buffering in binary mode
Use iobuf buffer size for temporary buffer size
g10/decrypt-data: disable output estream buffering to reduce overhead
ghash|polyval: add x86_64 VPCLMUL/AVX2 accelerated implementation
ghash|polyval: add x86_64 VPCLMUL/AVX512 accelerated implementation
Add detection for HW feature "intel-avx512"