Jul 7 2022

jukivili updated the task description for T4460: libgcrypt performance TODOs.
Jul 7 2022, 10:36 AM · libgcrypt

Jul 6 2022

jukivili updated the task description for T4460: libgcrypt performance TODOs.
Jul 6 2022, 8:19 PM · libgcrypt
jukivili committed rC66ef99bb1804: sm4: fix wrong macro used for GFNI/AVX2 code-path (authored by jukivili).
sm4: fix wrong macro used for GFNI/AVX2 code-path
Jul 6 2022, 12:17 PM
jukivili committed rCfd3ed68754eb: tests/basic: enable IV checks for CBC/CFB/CTR bulk tests (authored by jukivili).
tests/basic: enable IV checks for CBC/CFB/CTR bulk tests
Jul 6 2022, 12:17 PM
jukivili committed rC935e211af145: sm4: fix use of GFNI/AVX2 accelerated key expansion (authored by jukivili).
sm4: fix use of GFNI/AVX2 accelerated key expansion
Jul 6 2022, 12:17 PM
jukivili committed rC99b7375bd616: camellia-gfni-avx512: remove copy-paste / leftover extra instructions (authored by jukivili).
camellia-gfni-avx512: remove copy-paste / leftover extra instructions
Jul 6 2022, 12:17 PM
jukivili committed rCac14d9ee7a09: camellia-gfni-avx512: add missing register clearing on function exits (authored by jukivili).
camellia-gfni-avx512: add missing register clearing on function exits
Jul 6 2022, 12:17 PM

Jun 12 2022

jukivili closed T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE as Resolved.
Jun 12 2022, 9:58 PM · patch, ppc, Feature Request, libgcrypt
jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

Patch applied to master with small changes.

Jun 12 2022, 9:58 PM · patch, ppc, Feature Request, libgcrypt
jukivili committed rC88fe7ac33eb4: Chacha20/poly1305 - Optimized chacha20/poly1305 for P10 operation (authored by dannytsen).
Chacha20/poly1305 - Optimized chacha20/poly1305 for P10 operation
Jun 12 2022, 9:14 PM
jukivili committed rC2c5e5ab6843d: ppc: enable P10 assembly with ENABLE_FORCE_SOFT_HWFEATURES on arch-3.00 (authored by jukivili).
ppc: enable P10 assembly with ENABLE_FORCE_SOFT_HWFEATURES on arch-3.00
Jun 12 2022, 9:14 PM

Jun 3 2022

jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

Thanks for the updated patch. I'm travelling next week and will have time to check it closely only after I'm back. At a quick glance, it looks good. A changelog entry for the git commit log is also still needed.

Jun 3 2022, 10:30 AM · patch, ppc, Feature Request, libgcrypt

Jun 1 2022

jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

I meant interleaving integer-register-based 1xPoly1305 with 8xChacha20, as is done for 4xChacha20 in cipher/chacha20-ppc.c (interleaved so that for each 4xChaCha20 processed, 4 blocks of 1xPoly1305 are executed). Microarchitectures quite often have separate execution units for integer registers and vector registers, so it makes sense to interleave integer-poly1305 with vector-chacha20, as the two algorithms do not end up competing for the same execution resources. Interleaving vector-poly1305 and vector-chacha20 is not likely to give a performance increase (and is likely to run into problems with running out of vector registers).

Jun 1 2022, 5:37 PM · patch, ppc, Feature Request, libgcrypt

May 28 2022

jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

The problem is that the new assembly uses VSX registers vs14-vs31, which overlap with floating-point registers f14-f31. f14-f31 are callee-saved in the ABI, so they need to be stored and restored.

May 28 2022, 9:04 PM · patch, ppc, Feature Request, libgcrypt
jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

Tested the patch with a small change so that HWF_PPC_ARCH_3_00 is used instead of HWF_PPC_ARCH_3_10. Building bench-slope with "-O3 -flto" makes a bug in the new implementation visible. Without the new implementations bench-slope is ok (testing with QEMU):

$ tests/bench-slope --disable-hwf ppc-arch_3_00 cipher chacha20
Cipher:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      2.35 ns/B     405.0 MiB/s         - c/B
     STREAM dec |      2.32 ns/B     410.7 MiB/s         - c/B
   POLY1305 enc |      2.46 ns/B     388.0 MiB/s         - c/B
   POLY1305 dec |      2.34 ns/B     408.1 MiB/s         - c/B
  POLY1305 auth |     0.238 ns/B      4003 MiB/s         - c/B
May 28 2022, 6:49 PM · patch, ppc, Feature Request, libgcrypt

May 27 2022

jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

The -O2 problem with bench-slope seems strange. Does the problem appear after this patch is applied?

May 27 2022, 7:15 PM · patch, ppc, Feature Request, libgcrypt

May 15 2022

jukivili committed rCfd02e8e78470: aarch64-asm: use ADR for getting pointers for local labels (authored by jukivili).
aarch64-asm: use ADR for getting pointers for local labels
May 15 2022, 10:27 AM

May 11 2022

jukivili committed rC9ab61ba24b72: camellia: add amd64 GFNI/AVX512 implementation (authored by jukivili).
camellia: add amd64 GFNI/AVX512 implementation
May 11 2022, 7:37 PM
jukivili committed rCa9700956361d: cipher: move CBC/CFB/CTR self-tests to tests/basic (authored by jukivili).
cipher: move CBC/CFB/CTR self-tests to tests/basic
May 11 2022, 7:37 PM

May 9 2022

jukivili created T5970: gcry_mpi_invm producing wrong result.
May 9 2022, 8:30 PM · backport, libgcrypt, Bug Report
jukivili updated the task description for T4460: libgcrypt performance TODOs.
May 9 2022, 8:19 PM · libgcrypt

Apr 30 2022

jukivili committed rC9ba1f0091ff4: tests/basic: add testing for partial bulk processing code paths (authored by jukivili).
tests/basic: add testing for partial bulk processing code paths
Apr 30 2022, 12:37 PM
jukivili committed rCaad3381e9384: sm4: add XTS bulk processing (authored by jukivili).
sm4: add XTS bulk processing
Apr 30 2022, 12:37 PM
jukivili committed rCe239738b4af2: sm4-aesni-avx2: add generic 1 to 16 block bulk processing function (authored by jukivili).
sm4-aesni-avx2: add generic 1 to 16 block bulk processing function
Apr 30 2022, 12:37 PM
jukivili committed rC32b18cdb87b7: camellia-avx2: add bulk processing for XTS mode (authored by jukivili).
camellia-avx2: add bulk processing for XTS mode
Apr 30 2022, 12:37 PM
jukivili committed rC5095d60af42d: Add SM4 x86-64/GFNI/AVX2 implementation (authored by jukivili).
Add SM4 x86-64/GFNI/AVX2 implementation
Apr 30 2022, 12:37 PM
jukivili committed rCe1c5f950838b: sm4: deduplicate bulk processing function selection (authored by jukivili).
sm4: deduplicate bulk processing function selection
Apr 30 2022, 12:37 PM
jukivili committed rC9388279803ff: Move bulk OCB L pointer array setup code to common header (authored by jukivili).
Move bulk OCB L pointer array setup code to common header
Apr 30 2022, 12:37 PM
jukivili committed rC754055ccd043: cipher/bulkhelp: add functions for CTR/CBC/CFB/OCB bulk processing (authored by jukivili).
cipher/bulkhelp: add functions for CTR/CBC/CFB/OCB bulk processing
Apr 30 2022, 12:37 PM
jukivili committed rCbacdc1de3f4f: camellia-avx2: add partial parallel block processing (authored by jukivili).
camellia-avx2: add partial parallel block processing
Apr 30 2022, 12:37 PM
jukivili committed rC3410d40996d8: Add detection for HW feature "intel-gfni" (authored by jukivili).
Add detection for HW feature "intel-gfni"
Apr 30 2022, 12:37 PM
jukivili committed rC4e6896eb9fce: Add GFNI/AVX2 implementation of Camellia (authored by jukivili).
Add GFNI/AVX2 implementation of Camellia
Apr 30 2022, 12:37 PM

Apr 19 2022

jukivili closed T5913: libgcrypt: bug fix for PPC bulk AES-GCM acceleration, missing HWF_PPC_ARCH_3_10 in HW feature as Resolved.
Apr 19 2022, 5:59 PM · ppc, libgcrypt

Apr 6 2022

jukivili committed rCa7c3e0b9b0ff: doc: Fix missing ARM hardware features (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
doc: Fix missing ARM hardware features
Apr 6 2022, 9:34 PM
jukivili committed rC9a63cfd61753: chacha20: add AVX512 implementation (authored by jukivili).
chacha20: add AVX512 implementation
Apr 6 2022, 9:34 PM
jukivili committed rC972aae9fc337: build: Fix for arm crypto support (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
build: Fix for arm crypto support
Apr 6 2022, 9:34 PM
jukivili committed rCcd3ed4977076: poly1305: add AVX512 implementation (authored by jukivili).
poly1305: add AVX512 implementation
Apr 6 2022, 9:34 PM

Apr 4 2022

jukivili committed rCfe891ff4a3cd: Add SM3 ARMv8/AArch64/CE assembly implementation (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Add SM3 ARMv8/AArch64/CE assembly implementation
Apr 4 2022, 6:12 PM

Apr 1 2022

jukivili added a comment to T5913: libgcrypt: bug fix for PPC bulk AES-GCM acceleration, missing HWF_PPC_ARCH_3_10 in HW feature.

Fixed in master. I rechecked that bulk implementation passes tests with qemu-ppc64le.

Apr 1 2022, 8:55 AM · ppc, libgcrypt
jukivili committed rC29bfb3ebbc63: hwf-ppc: fix missing HWF_PPC_ARCH_3_10 in HW feature (authored by jukivili).
hwf-ppc: fix missing HWF_PPC_ARCH_3_10 in HW feature
Apr 1 2022, 8:54 AM
jukivili added a comment to T5913: libgcrypt: bug fix for PPC bulk AES-GCM acceleration, missing HWF_PPC_ARCH_3_10 in HW feature.

Looks like that line went missing in the third/final version of the AES-GCM patch at https://dev.gnupg.org/T5700

Apr 1 2022, 8:51 AM · ppc, libgcrypt

Mar 29 2022

jukivili committed rCa5d126c61cc0: configure: fix avx512 check for i386 (authored by jukivili).
configure: fix avx512 check for i386
Mar 29 2022, 6:00 PM
jukivili committed rC4dc707e336a9: Fix configure.ac error of intel-avx512 (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Fix configure.ac error of intel-avx512
Mar 29 2022, 6:00 PM

Mar 12 2022

jukivili updated the task description for T4460: libgcrypt performance TODOs.
Mar 12 2022, 9:40 AM · libgcrypt
jukivili updated the task description for T4460: libgcrypt performance TODOs.
Mar 12 2022, 9:39 AM · libgcrypt
jukivili closed T5828: Improvements for gnupg data operation performance (enc/dec/sign/verify/enarmor/dearmor/etc) as Resolved.
Mar 12 2022, 9:38 AM · gnupg
jukivili closed T5860: Reducing memory copy overhead in iobuf and estream to increase OCB speed as Resolved.
Mar 12 2022, 9:38 AM · gnupg
jukivili closed T5860: Reducing memory copy overhead in iobuf and estream to increase OCB speed, a subtask of T5828: Improvements for gnupg data operation performance (enc/dec/sign/verify/enarmor/dearmor/etc), as Resolved.
Mar 12 2022, 9:38 AM · gnupg
jukivili committed rCa0db0a121571: Fix building sha512-avx512 with clang (authored by jukivili).
Fix building sha512-avx512 with clang
Mar 12 2022, 9:34 AM

Mar 11 2022

jukivili committed rC089223aa3b55: SHA512: Add AVX512 implementation (authored by jukivili).
SHA512: Add AVX512 implementation
Mar 11 2022, 4:34 PM

Mar 9 2022

jukivili closed T5875: libgcrypt: VAES/AVX2 AES-OCB encryption performance issue with Intel CPUs, sudden drop in throughput with larger input sizes as Resolved.
Mar 9 2022, 7:47 PM · libgcrypt
jukivili added a comment to T5875: libgcrypt: VAES/AVX2 AES-OCB encryption performance issue with Intel CPUs, sudden drop in throughput with larger input sizes.

Fix pushed to master. Updated graph:

Mar 9 2022, 7:47 PM · libgcrypt
jukivili committed rCd820d27a3bce: rijndael-vaes-avx2: perform checksumming inline (authored by jukivili).
rijndael-vaes-avx2: perform checksumming inline
Mar 9 2022, 7:46 PM
jukivili triaged T5875: libgcrypt: VAES/AVX2 AES-OCB encryption performance issue with Intel CPUs, sudden drop in throughput with larger input sizes as Normal priority.
Mar 9 2022, 4:42 PM · libgcrypt

Mar 8 2022

jukivili committed rG15df88d135ba: iobuf: add zerocopy optimization for iobuf_read (authored by jukivili).
iobuf: add zerocopy optimization for iobuf_read
Mar 8 2022, 7:05 PM
jukivili committed rG49c6e5839452: gpg: fix --enarmor with zero length source file (authored by jukivili).
gpg: fix --enarmor with zero length source file
Mar 8 2022, 7:05 PM
jukivili committed rGb96eb6f08d1d: iobuf: add zerocopy optimization for iobuf_write (authored by jukivili).
iobuf: add zerocopy optimization for iobuf_write
Mar 8 2022, 7:05 PM
jukivili committed rG99e2c178c73c: g10/cipher-aead: add fast path for avoid memcpy when AEAD encrypting (authored by jukivili).
g10/cipher-aead: add fast path for avoid memcpy when AEAD encrypting
Mar 8 2022, 7:05 PM
jukivili committed rG583b664a07b4: g10/plaintext: disable estream buffering in binary mode (authored by jukivili).
g10/plaintext: disable estream buffering in binary mode
Mar 8 2022, 7:05 PM
jukivili committed rGf2322ff942fa: Use iobuf buffer size for temporary buffer size (authored by jukivili).
Use iobuf buffer size for temporary buffer size
Mar 8 2022, 7:05 PM
jukivili committed rG6c95d52a22a7: g10/decrypt-data: disable output estream buffering to reduce overhead (authored by jukivili).
g10/decrypt-data: disable output estream buffering to reduce overhead
Mar 8 2022, 7:05 PM
jukivili committed rCd857e85cb4d4: ghash|polyval: add x86_64 VPCLMUL/AVX2 accelerated implementation (authored by jukivili).
ghash|polyval: add x86_64 VPCLMUL/AVX2 accelerated implementation
Mar 8 2022, 6:16 PM
jukivili committed rCe6f360019369: ghash|polyval: add x86_64 VPCLMUL/AVX512 accelerated implementation (authored by jukivili).
ghash|polyval: add x86_64 VPCLMUL/AVX512 accelerated implementation
Mar 8 2022, 6:16 PM
jukivili committed rC8cf06145263e: Add detection for HW feature "intel-avx512" (authored by jukivili).
Add detection for HW feature "intel-avx512"
Mar 8 2022, 6:16 PM

Mar 7 2022

jukivili added a comment to T5870: libgcrypt: AEAD API for FIPS 140 (in future).

Is a large change to the cipher API really needed (new open/encrypt with less flexibility)? How would that affect performance? Would the following new interfaces for the gcry_cipher API work instead?

  • gcry_cipher_setup_geniv(hd, int ivlen, int method): sets up the IV generator with parameters such as IV length and method id (RFC5116, TLS 1.3, SSH, etc.) (other parameters?)
  • gcry_cipher_geniv(hd, byte *outiv): generates a new IV using the selected method, sets the IV internally and outputs the generated IV to 'outiv'.
  • gcry_cipher_genkey(hd, byte *outkey, int keylen, int method): generates a key internally with parameters (method id, other?), sets up the key internally and outputs the generated key to 'outkey'. (How would keys from a key exchange protocol be handled? Using the existing setkey?)
Mar 7 2022, 9:04 PM · Feature Request, FIPS, libgcrypt
jukivili added a comment to T5860: Reducing memory copy overhead in iobuf and estream to increase OCB speed.

I went through my test files and found that --enarmor on a zero-length input file no longer worked. I made a separate patch to fix that issue, which in turn needs a different approach for handling the compression issue noticed earlier:

Mar 7 2022, 8:09 PM · gnupg

Mar 6 2022

jukivili updated subscribers of T5860: Reducing memory copy overhead in iobuf and estream to increase OCB speed.

Does this look ok to push to master? @werner @gniibe

Mar 6 2022, 6:59 PM · gnupg
jukivili updated the task description for T4460: libgcrypt performance TODOs.
Mar 6 2022, 6:35 PM · libgcrypt

Mar 5 2022

jukivili updated the task description for T4460: libgcrypt performance TODOs.
Mar 5 2022, 2:09 PM · libgcrypt
jukivili updated the task description for T4460: libgcrypt performance TODOs.
Mar 5 2022, 1:23 PM · libgcrypt
jukivili updated the task description for T4460: libgcrypt performance TODOs.
Mar 5 2022, 1:21 PM · libgcrypt

Mar 3 2022

jukivili added a comment to T5860: Reducing memory copy overhead in iobuf and estream to increase OCB speed.

New versions of patches 0005 and 0006 - fixes EOF handling issues noticed with compression/decompression:

Mar 3 2022, 9:17 PM · gnupg

Mar 2 2022

jukivili updated the task description for T4460: libgcrypt performance TODOs.
Mar 2 2022, 8:39 PM · libgcrypt
jukivili updated the task description for T4460: libgcrypt performance TODOs.
Mar 2 2022, 8:37 PM · libgcrypt
jukivili closed T5700: libgcrypt: bulk AES-GCM acceleration for ppc64le as Resolved.
Mar 2 2022, 8:24 PM · patch, ppc, libgcrypt, Feature Request
jukivili committed rC7d2983979866: hwf-arm: add ARMv8.2 optional crypto extension HW features (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
hwf-arm: add ARMv8.2 optional crypto extension HW features
Mar 2 2022, 8:23 PM
jukivili committed rC47cafffb09d8: Add SM4 ARMv8/AArch64/CE assembly implementation (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Add SM4 ARMv8/AArch64/CE assembly implementation
Mar 2 2022, 8:23 PM
jukivili closed T5852: Use iobuf_copy where instead of manual iobuf_get/iobuf_put or iobuf_read/iobuf_write loops, a subtask of T5828: Improvements for gnupg data operation performance (enc/dec/sign/verify/enarmor/dearmor/etc), as Resolved.
Mar 2 2022, 8:17 PM · gnupg
jukivili closed T5852: Use iobuf_copy where instead of manual iobuf_get/iobuf_put or iobuf_read/iobuf_write loops as Resolved.
Mar 2 2022, 8:17 PM · gnupg (gpg23)
jukivili committed rG9c313321a849: g10/dearmor: use iobuf_copy (authored by jukivili).
g10/dearmor: use iobuf_copy
Mar 2 2022, 8:15 PM
jukivili committed rG756c0bd5d89b: g10/encrypt: use iobuf_copy instead of manual iobuf_read/iobuf_write (authored by jukivili).
g10/encrypt: use iobuf_copy instead of manual iobuf_read/iobuf_write
Mar 2 2022, 8:15 PM

Feb 27 2022

jukivili triaged T5860: Reducing memory copy overhead in iobuf and estream to increase OCB speed as Low priority.
Feb 27 2022, 7:12 PM · gnupg
jukivili updated subscribers of T5852: Use iobuf_copy where instead of manual iobuf_get/iobuf_put or iobuf_read/iobuf_write loops.

Do these patches look ok? @gniibe @werner

Feb 27 2022, 5:55 PM · gnupg (gpg23)
jukivili closed T5826: Improve detached signing and verification speed, a subtask of T5828: Improvements for gnupg data operation performance (enc/dec/sign/verify/enarmor/dearmor/etc), as Resolved.
Feb 27 2022, 5:54 PM · gnupg
jukivili closed T5826: Improve detached signing and verification speed as Resolved.
Feb 27 2022, 5:54 PM · gnupg
jukivili committed rG4e27b9defc60: g10/plaintext: do_hash: use iobuf_read for higher performance (authored by jukivili).
g10/plaintext: do_hash: use iobuf_read for higher performance
Feb 27 2022, 5:52 PM
jukivili committed rGf8943ce098f6: g10/sign: sign_file: use iobuf_read for higher detached signing speed (authored by jukivili).
g10/sign: sign_file: use iobuf_read for higher detached signing speed
Feb 27 2022, 5:52 PM

Feb 25 2022

jukivili added a comment to T5826: Improve detached signing and verification speed.

I used "1<<30" following the example of existing code in g10/free-packet.c, which is another place where iobuf_read reads to NULL.

Feb 25 2022, 7:27 AM · gnupg

Feb 24 2022

jukivili closed T5785: libgcrypt-1.9.4 build failure on ppc64le as Resolved.
Feb 24 2022, 6:53 PM · Gentoo, Bug Report
jukivili added a comment to T5785: libgcrypt-1.9.4 build failure on ppc64le.

(note: -O2 is added only for compiling powerpc vector implementation files)

Feb 24 2022, 6:53 PM · Gentoo, Bug Report
jukivili added a comment to T5785: libgcrypt-1.9.4 build failure on ppc64le.

I added a check to configure.ac that detects a missing -O flag and then tests with -O2. If adding -O2 does not help, the powerpc vector implementations won't be built at all.

Feb 24 2022, 6:53 PM · Gentoo, Bug Report
jukivili committed rC6951e0f591cc: powerpc: check for missing optimization level for vector register usage (authored by jukivili).
powerpc: check for missing optimization level for vector register usage
Feb 24 2022, 6:39 PM
jukivili closed T4486: Add AEAD mode AES-SIV to libgcrypt (RFC 5297) as Resolved.
Feb 24 2022, 6:06 PM · Feature Request, libgcrypt
jukivili closed T5356: gnupg2 test failure on s390x as Resolved.
Feb 24 2022, 6:05 PM · libgcrypt, Bug Report
jukivili closed T5694: poly1305-s390x.S is compiled despite --disable-asm as Resolved.
Feb 24 2022, 6:05 PM · libgcrypt, Bug Report
jukivili closed T5796: libgcrypt-1.9.4 build failure on ARM without NEON as Resolved.
Feb 24 2022, 6:05 PM · arm, libgcrypt, Gentoo, Bug Report
jukivili updated subscribers of T5826: Improve detached signing and verification speed.

Do the patches look ok to push to master? @werner @gniibe

Feb 24 2022, 6:04 PM · gnupg
jukivili added a comment to T5853: Decrypting OCB encrypted file fails....

Thanks. All my tests work now.

Feb 24 2022, 6:01 PM · gnupg (gpg23), Bug Report

Feb 23 2022

jukivili committed rCd8825601f10a: Add SM4 ARMv8/AArch64 assembly implementation (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Add SM4 ARMv8/AArch64 assembly implementation
Feb 23 2022, 6:24 PM
jukivili committed rC83e1649edd5e: Move VPUSH_API/VPOP_API macros to common header (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).
Move VPUSH_API/VPOP_API macros to common header
Feb 23 2022, 6:23 PM