jukivili renamed
T5198: libgcrypt: s390x/zSeries SHA256/SHA512 acceleration from
libgcrypt: s380x/zSeries SHA1/SHA256/SHA512 acceleration to
libgcrypt: s390x/zSeries SHA1/SHA256/SHA512 acceleration.
Thanks for reporting this. You are correct, those HWCAP2_SHA1 and HWCAP2_SHA2 defines are wrong.
Add s390x/zSeries acceleration for SHA3
Add s390x/zSeries acceleration for SHA512
Add s390x/zSeries acceleration for SHA256
Add bulk AES-GCM acceleration for s390x/zSeries
Add s390x/zSeries acceleration for SHA1
Add bulk function interface for GCM mode
Add s390x/zSeries acceleration for AES
Add bulk function interface for OFB mode
hwf: add detection of s390x/zSeries hardware features
tests/bench-slope: use same benchmarking for XTS as for other modes
aarch64: mpi/longlong.h: fix operand size mismatch
aarch64: use configure check for assembly ELF directives support
tests/basic: check 32-bit and 64-bit overflow for CTR and ChaCha20
Prevent link-time optimization from inlining __gcry_burn_stack
chacha20-ppc: fix 32-bit counter overflow handling
AArch64 clang support was added to 'master' on 2018-03-28. One would need to backport commits 8ee38806245ca8452051b1a245f44082323f37f6...9b58e4a03ba3aeff7bae3f40da706977870c9649 to the 1.8 branch.
Another issue that comes to mind is that the current ARM/ARM64 HW feature detection most likely won't work on macOS. Thus the HW-accelerated AES, SHA and GHASH implementations won't be used.
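To illustrate the concern: on Linux/AArch64, feature bits are typically read from the ELF auxiliary vector via `getauxval(AT_HWCAP)`, a mechanism macOS does not provide. Below is a minimal sketch of the decoding step only, assuming the Linux AArch64 HWCAP bit layout. The `sketch_` names are hypothetical and are not libgcrypt's actual identifiers.

```c
/* Linux AArch64 HWCAP bit positions, redefined under hypothetical names
   so that this sketch builds on any platform. */
#define SKETCH_HWCAP_AES   (1UL << 3)
#define SKETCH_HWCAP_PMULL (1UL << 4)
#define SKETCH_HWCAP_SHA1  (1UL << 5)
#define SKETCH_HWCAP_SHA2  (1UL << 6)

/* Hypothetical feature summary; each field is 0 or 1. */
struct sketch_arm_features
{
  int aes;
  int pmull;
  int sha1;
  int sha2;
};

/* Decode a raw HWCAP word into individual feature flags. */
static struct sketch_arm_features
sketch_decode_hwcap (unsigned long hwcap)
{
  struct sketch_arm_features f;

  f.aes   = !!(hwcap & SKETCH_HWCAP_AES);
  f.pmull = !!(hwcap & SKETCH_HWCAP_PMULL);
  f.sha1  = !!(hwcap & SKETCH_HWCAP_SHA1);
  f.sha2  = !!(hwcap & SKETCH_HWCAP_SHA2);
  return f;
}
```

On Linux the argument would come from `getauxval(AT_HWCAP)` in `<sys/auxv.h>`. On macOS there is no auxiliary vector, so the bits would have to come from another source (for example a `sysctlbyname` query, which is an assumption here and untested); without that, every flag stays zero and the accelerated code paths are never selected.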
is never defined on ARM64 as it depends on . Instead, I think a new check for GCC assembly ELF directives would be needed in configure.ac, similar to check. The following check should work, but I have not yet tested it:
jukivili committed
rC4a50c6b88d6d: tests: Fix typo in comment (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
tests: Fix typo in comment
rijndael: clean-up prepare_decryption function
rijndael: clean-up generic bulk functions
cipher: setup bulk functions at each algorithms key setup
rijndael: tidy do_setkey little bit
rijndael-aesni: tweak x86_64 AES-NI for better performance on AMD Zen2
So, the things I see that need to be done for inclusion of this patch are:
chacha20-aarch64: improve performance through higher SIMD interleaving
Enable jitter entropy also on non-x86 architectures
tests/bench-slope: improve CPU frequency auto-detection
Camellia AES-NI/AVX/AVX2 size optimization
random/jitterentropy: fix USE_JENT == JENT_USES_GETTIME code path
When I compared the cryptogams version side by side with this patch, I found that they are strikingly similar. The operation/instruction ordering closely matches parts of ghashp8-ppc.pl. In many places the variable/register names are also the same.
Ok. This was just something that I noticed while going through configure.ac. Should I make a patch for this, or do you want to?
Just one question at the moment.
Add SM4 x86-64/AES-NI/AVX2 implementation
Add SM4 x86-64/AES-NI/AVX implementation
Optimizations for SM4 cipher
Thanks for the new version. Unfortunately, Minicloud seems to be down, so I cannot test the patch at the moment. I'll take a look when I regain power64 access.
jukivili committed
rCc1535d0b8797: tests: Add basic test-vectors for SM4 (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
tests: Add basic test-vectors for SM4
doc: add GCRY_MD_SM3, GCRY_MAC_HMAC_SM3 and GCRY_MAC_GOST28147_IMIT
jukivili committed
rCddcce166ab8b: Add SM4 symmetric cipher algorithm (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Add SM4 symmetric cipher algorithm
Disable all assembly modules with --disable-asm
rijndael: fix UBSAN warning on left shift by 24 places with type 'int'
cipher-ocb: fix out-of-array stack memory access
gost28147: implement special MAC mode called imitovstavka (IMIT)
mac: add support for gcry_mac_ctl(GCRYCTL_SET_SBOX)
Generally a nice-looking patch and a great performance improvement.
ppc: avoid using vec_vsx_ld/vec_vsx_st for 2x64-bit vectors
The attached patch should solve the issue for gcc 7.5 and clang 8.
asm-poly1305-aarch64: fix building with clang
Fix wrong code execution in Poly1305 ARM/NEON implementation
Set vZZ.16b register to zero before use in armv8 gcm implementation
Add POWER9 little-endian variant of PPC AES implementation
crc-ppc: fix bad register used for vector load/store assembly
rinjdael-aes: use zero offset vector load/store when possible
Add gcry_cipher_ctl command to allow weak keys in testing use-cases
I prepared a slightly different patch, with the 'and r2,r2,r2' instruction removed as it is no longer needed.
Thanks for reporting this. Your patch is correct.
The patch has been applied to master,
sexp: fix cast from 'int' pointer to 'size_t' pointer
mpi/i386: fix DWARF CFI for _gcry_mpih_sub_n and _gcry_mpih_add_n
mpi: Add .note.gnu.property section for Intel CET
amd64: Always include <config.h> in cipher assembly codes
i386: Add _CET_ENDBR to indirect jump targets
x86: Add .note.gnu.property section for Intel CET
tests/basic: add vector cluttering to detect implementation bugs
Set vZZ.16b register to zero before use in armv8 gcm implementation
jukivili committed
rC7e3aac7ba49b: mpi: Fix error that point not uninitialized (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
mpi: Fix error that point not uninitialized
gcrypt.texi: fix GCRYCTL_GET_ALGO_NENCR typo
jukivili committed
rC176a5f162acd: Update .gitignore (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Update .gitignore
jukivili committed
rC43cfc1632dd3: ecc: Wrong flag and elements_enc fix. (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
ecc: Wrong flag and elements_enc fix.
Thanks for the bug fix. I've prepared a patch and sent it to the mailing list. Let me know if Reported-by is ok/enough. I would have liked to put you as the author of the commit, but this Differential interface is quite horrible and does not give all the needed information (mainly the "name <email>" format for git).
rijndael-ppc: performance improvements
rijndael-ppc: fix bad register used for vector load/store assembly
cipher: fix typo in error log
I've been wondering about this also. I can start working on this.
gost28147: inline gost_val function to speed up code
gost28147: do not use GOST28147_CONTEXT outside of GOST 28147 calculation
gostr3411-94: small speedup
gost28147: simplify internal code
Please note that a C-based intrinsics implementation is the way to go now, as that is the path chosen for the PowerPC implementations in libgcrypt.
ec: fix left shift overflows on WIN64 build
mpi/amd64: use SSE2 for shifting instead of MMX
Add i386/SSSE3 implementation of SHA512
Fix building t-lock for WIN32