Add SM4 x86-64/AES-NI/AVX implementation
Optimizations for SM4 cipher
Thanks for the new version. Unfortunately, Minicloud seems to be down, so I cannot test the patch at the moment. I'll take a look when I regain power64 access.
jukivili committed
rCc1535d0b8797: tests: Add basic test-vectors for SM4 (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
tests: Add basic test-vectors for SM4
doc: add GCRY_MD_SM3, GCRY_MAC_HMAC_SM3 and GCRY_MAC_GOST28147_IMIT
jukivili committed
rCddcce166ab8b: Add SM4 symmetric cipher algorithm (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Add SM4 symmetric cipher algorithm
Disable all assembly modules with --disable-asm
rijndael: fix UBSAN warning on left shift by 24 places with type 'int'
cipher-ocb: fix out-of-array stack memory access
gost28147: implement special MAC mode called imitovstavka (IMIT)
mac: add support for gcry_mac_ctl(GCRYCTL_SET_SBOX)
Generally a nice-looking patch and a great performance improvement.
ppc: avoid using vec_vsx_ld/vec_vsx_st for 2x64-bit vectors
The attached patch should solve the issue for gcc 7.5 and clang 8.
asm-poly1305-aarch64: fix building with clang
Fix wrong code execution in Poly1305 ARM/NEON implementation
Set vZZ.16b register to zero before use in armv8 gcm implementation
Add POWER9 little-endian variant of PPC AES implementation
crc-ppc: fix bad register used for vector load/store assembly
rinjdael-aes: use zero offset vector load/store when possible
Add gcry_cipher_ctl command to allow weak keys in testing use-cases
I prepared a slightly different patch, with the 'and r2,r2,r2' instruction removed, as it is no longer needed.
Thanks for reporting this. Your patch is correct.
The patch has been applied to master.
sexp: fix cast from 'int' pointer to 'size_t' pointer
mpi/i386: fix DWARF CFI for _gcry_mpih_sub_n and _gcry_mpih_add_n
mpi: Add .note.gnu.property section for Intel CET
amd64: Always include <config.h> in cipher assembly codes
i386: Add _CET_ENDBR to indirect jump targets
x86: Add .note.gnu.property section for Intel CET
tests/basic: add vector cluttering to detect implementation bugs
Set vZZ.16b register to zero before use in armv8 gcm implementation
jukivili committed
rC7e3aac7ba49b: mpi: Fix error that point not uninitialized (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
mpi: Fix error that point not uninitialized
gcrypt.texi: fix GCRYCTL_GET_ALGO_NENCR typo
jukivili committed
rC176a5f162acd: Update .gitignore (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Update .gitignore
jukivili committed
rC43cfc1632dd3: ecc: Wrong flag and elements_enc fix. (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
ecc: Wrong flag and elements_enc fix.
Thanks for the bug fix. I've prepared a patch and sent it to the mailing list. Let me know if Reported-by is OK/enough. I would have liked to put you as the author of the commit, but this Differential interface is quite horrible and does not give all the needed information (mainly the "name <email>" format for git).
rijndael-ppc: performance improvements
rijndael-ppc: fix bad register used for vector load/store assembly
cipher: fix typo in error log
I've been wondering about this too. I can start working on it.
gost28147: inline gost_val function to speed up code
gost28147: do not use GOST28147_CONTEXT outside of GOST 28147 calculation
gostr3411-94: small speedup
gost28147: simplify internal code
Please note that a C-based intrinsics implementation is the way to go now, as that is the path chosen for PowerPC implementations in libgcrypt.
ec: fix left shift overflows on WIN64 build
mpi/amd64: use SSE2 for shifting instead of MMX
Add i386/SSSE3 implementation of SHA512
Fix building t-lock for WIN32
hash-common: avoid integer division to reduce call overhead
Add stitched ChaCha20-Poly1305 ARMv8/AArch64 implementation
Small tweak for PowerPC Chacha20-Poly1305 round loop
Reduce size of x86-64 stitched Chacha20-Poly1305 implementations
Add PowerPC extra CFLAGS also for chacha20-ppc and crc-ppc
Add PowerPC vpmsum implementation of CRC
Add PowerPC vector implementation of ChaCha20
poly1305: add fast addition macro for ppc64
Poly1305 addition helper for ppc64 posted on the mailing list:
PowerPC SHA-256 and SHA-512 implementations have been committed with a little more tuning. Most notably, SHA-512 on POWER8 now gives similar performance to OpenSSL:
Add SHA-256 implementations for POWER8 and POWER9
Add SHA-512 implementations for POWER8 and POWER9
hwf-ppc: add detection for PowerISA 3.00
Patches sent to the mailing list:
rijndael-ppc: add bulk modes for CBC, CFB, CTR and XTS
rijndael-ppc: enable PowerPC AES-OCB implemention
rijndael-ppc: add bulk mode for ocb_auth
rijndael-ppc: add key setup and enable single block PowerPC AES
rijndael/ppc: implement single-block mode, and implement OCB block cipher
hwf: add detection of PowerPC hardware features
Register DCO for Shawn Landden
I'll start working on a PowerPC GHASH implementation in September, after SHA2 is done.
I'll start working on new PowerPC SHA2 implementations for libgcrypt in the coming weeks.
Patches for PowerPC AES acceleration sent to the mailing list, based partly on initial work by Shawn Landden (@slandden):
Fix use of AVX instruction in SHA1/SSSE3 assembly
Thanks. I really like this Altivec intrinsics approach. I might reimplement the rest of the bulk block cipher functions this way later (if I ever get PPC HW access).