Thanks for the updated patch. I'm travelling next week and will only have time to check it closely after I'm back. At a quick glance, it looks good. What is also needed is the changelog for the git commit log.

Jun 3 2022, 10:30 AM · patch, ppc, Feature Request, libgcrypt

Jun 1 2022

jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

I meant interleaving integer-register-based 1xPoly1305 with 8xChaCha20, as is done for 4xChaCha20 in cipher/chacha20-ppc.c (interleaved so that for each 4xChaCha20 processed, 4 blocks of 1xPoly1305 are processed). Microarchitectures quite often have separate execution units for integer and vector registers, and then it makes sense to interleave integer Poly1305 with vector ChaCha20, because the two algorithms do not end up competing for the same execution resources. Interleaving vector Poly1305 with vector ChaCha20 is unlikely to give a performance increase (and likely to run into problems with running out of vector registers).
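To illustrate the loop structure being described, here is a minimal C sketch, assuming the MAC runs over ciphertext produced by the previous batch (a lagging authentication pointer). The ChaCha20 double round follows RFC 8439; `toy_mac_block()` is a hypothetical integer-only stand-in, not Poly1305, and the one-MAC-step-per-double-round ratio is arbitrary. `stitched_batch()` and `encrypt_and_mac()` are illustrative names, not libgcrypt functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
/* ChaCha20 quarter round (RFC 8439, section 2.1). */
#define QR(A, B, C, D) do {                 \
    A += B; D ^= A; D = ROTL32(D, 16);      \
    C += D; B ^= C; B = ROTL32(B, 12);      \
    A += B; D ^= A; D = ROTL32(D, 8);       \
    C += D; B ^= C; B = ROTL32(B, 7);       \
  } while (0)

#define NBLK 4              /* "4xChaCha20": four blocks per batch */
#define BATCH (NBLK * 64)   /* bytes per batch */

/* Hypothetical stand-in for one 1xPoly1305 block step. NOT Poly1305,
   just an integer-only multiply/xor chain over a 16-byte block. */
static uint64_t toy_mac_block(uint64_t acc, const uint8_t *blk)
{
  uint64_t lo, hi;
  memcpy(&lo, blk, 8);
  memcpy(&hi, blk + 8, 8);
  return ((acc + lo) * UINT64_C(0x9E3779B97F4A7C15)) ^ hi;
}

/* Encrypts one batch (src -> dst). If 'auth' is non-NULL, also MACs the
   BATCH bytes at 'auth': one MAC step per double round, rest at the end. */
static uint64_t stitched_batch(uint32_t in[16], uint8_t *dst,
                               const uint8_t *src, const uint8_t *auth,
                               uint64_t acc)
{
  uint32_t x[NBLK][16];
  size_t j, i, m = 0;
  int r;

  for (j = 0; j < NBLK; j++) {
    memcpy(x[j], in, sizeof x[j]);
    x[j][12] += (uint32_t)j;              /* per-block counter */
  }
  for (r = 0; r < 10; r++) {
    for (j = 0; j < NBLK; j++) {          /* vector-style work */
      QR(x[j][0], x[j][4], x[j][8],  x[j][12]);
      QR(x[j][1], x[j][5], x[j][9],  x[j][13]);
      QR(x[j][2], x[j][6], x[j][10], x[j][14]);
      QR(x[j][3], x[j][7], x[j][11], x[j][15]);
      QR(x[j][0], x[j][5], x[j][10], x[j][15]);
      QR(x[j][1], x[j][6], x[j][11], x[j][12]);
      QR(x[j][2], x[j][7], x[j][8],  x[j][13]);
      QR(x[j][3], x[j][4], x[j][9],  x[j][14]);
    }
    if (auth)                             /* interleaved integer work */
      acc = toy_mac_block(acc, auth + 16 * m++);
  }
  for (; auth && m < BATCH / 16; m++)     /* remaining MAC blocks */
    acc = toy_mac_block(acc, auth + 16 * m);

  for (j = 0; j < NBLK; j++)              /* feed-forward, XOR with src */
    for (i = 0; i < 16; i++) {
      uint32_t w = x[j][i] + in[i] + (i == 12 ? (uint32_t)j : 0);
      for (r = 0; r < 4; r++)
        dst[64 * j + 4 * i + r] =
          src[64 * j + 4 * i + r] ^ (uint8_t)(w >> (8 * r));
    }
  in[12] += NBLK;
  return acc;
}

/* The MAC lags one batch behind: batch n-1's ciphertext is authenticated
   while batch n is being encrypted; the last batch is MACed at the end. */
static uint64_t encrypt_and_mac(uint32_t in[16], uint8_t *dst,
                                const uint8_t *src, size_t nbatch)
{
  uint64_t acc = 0;
  size_t n;

  for (n = 0; n < nbatch; n++)
    acc = stitched_batch(in, dst + n * BATCH, src + n * BATCH,
                         n ? dst + (n - 1) * BATCH : NULL, acc);
  for (n = 0; nbatch && n < BATCH / 16; n++)
    acc = toy_mac_block(acc, dst + (nbatch - 1) * BATCH + 16 * n);
  return acc;
}
```

In plain C the compiler decides the final instruction order, so this sketch only fixes the data flow (MAC blocks in strict ciphertext order, independent of the in-flight ChaCha20 state), not the actual scheduling onto integer and vector units.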

Jun 1 2022, 5:37 PM · patch, ppc, Feature Request, libgcrypt

May 28 2022

jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

The problem is that the new assembly uses VSX registers vs14-vs31, which overlap with floating-point registers f14-f31. Registers f14-f31 are callee-saved in the ABI, so they need to be saved and restored.
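The register overlap can be made concrete with a small C helper, assuming the 64-bit ELFv2 ABI used on ppc64le: VSX registers vs0-vs31 share storage with f0-f31, vs32-vs63 are the vector registers v0-v31, and the non-volatile sets are f14-f31 and v20-v31. `vsr_save_mask()` is a hypothetical helper: given a bitmask of the VSX registers an assembly function clobbers, it returns the ones that function would have to save in its prologue and restore before returning.

```c
#include <stdint.h>

/* Bit n of 'used' set = VSX register vsN is clobbered by the function.
   Returns the subset that must be saved/restored (ELFv2 assumption). */
static uint64_t vsr_save_mask(uint64_t used)
{
  uint64_t nonvolatile = 0;
  int n;

  for (n = 14; n <= 31; n++)    /* vs14-vs31 overlap f14-f31 */
    nonvolatile |= UINT64_C(1) << n;
  for (n = 52; n <= 63; n++)    /* vs52-vs63 are v20-v31 */
    nonvolatile |= UINT64_C(1) << n;
  return used & nonvolatile;
}
```

For example, code that clobbers all of vs0-vs31 gets back the mask for vs14-vs31 (0xFFFFC000), which is exactly the f14-f31 range mentioned above.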

May 28 2022, 9:04 PM · patch, ppc, Feature Request, libgcrypt

jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

I tested the patch with a small change so that HWF_PPC_ARCH_3_00 is used instead of HWF_PPC_ARCH_3_10. Building bench-slope with "-O3 -flto" makes the bug in the new implementation visible. Without the new implementations, bench-slope works fine (testing with QEMU):

$ tests/bench-slope --disable-hwf ppc-arch_3_00 cipher chacha20
Cipher:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      2.35 ns/B     405.0 MiB/s         - c/B
     STREAM dec |      2.32 ns/B     410.7 MiB/s         - c/B
   POLY1305 enc |      2.46 ns/B     388.0 MiB/s         - c/B
   POLY1305 dec |      2.34 ns/B     408.1 MiB/s         - c/B
  POLY1305 auth |     0.238 ns/B      4003 MiB/s         - c/B

May 28 2022, 6:49 PM · patch, ppc, Feature Request, libgcrypt

May 27 2022

jukivili added a comment to T6006: Optimize Chacha20 and Poly1305 for PPC P10 LE.

The -O2 problem with bench-slope seems strange. Does the problem appear after this patch is applied?

May 27 2022, 7:15 PM · patch, ppc, Feature Request, libgcrypt

May 15 2022

jukivili committed rCfd02e8e78470: aarch64-asm: use ADR for getting pointers for local labels (authored by jukivili).

aarch64-asm: use ADR for getting pointers for local labels

May 15 2022, 10:27 AM

May 11 2022

jukivili committed rC9ab61ba24b72: camellia: add amd64 GFNI/AVX512 implementation (authored by jukivili).

camellia: add amd64 GFNI/AVX512 implementation

May 11 2022, 7:37 PM

jukivili committed rCa9700956361d: cipher: move CBC/CFB/CTR self-tests to tests/basic (authored by jukivili).

cipher: move CBC/CFB/CTR self-tests to tests/basic

May 11 2022, 7:37 PM

May 9 2022

jukivili created T5970: gcry_mpi_invm producing wrong result.

May 9 2022, 8:30 PM · backport, libgcrypt, Bug Report

jukivili updated the task description for T4460: libgcrypt performance TODOs.

May 9 2022, 8:19 PM · libgcrypt

Apr 30 2022

jukivili committed rC9ba1f0091ff4: tests/basic: add testing for partial bulk processing code paths (authored by jukivili).

tests/basic: add testing for partial bulk processing code paths

Apr 30 2022, 12:37 PM

jukivili committed rCaad3381e9384: sm4: add XTS bulk processing (authored by jukivili).

sm4: add XTS bulk processing

Apr 30 2022, 12:37 PM

jukivili committed rCe239738b4af2: sm4-aesni-avx2: add generic 1 to 16 block bulk processing function (authored by jukivili).

sm4-aesni-avx2: add generic 1 to 16 block bulk processing function

Apr 30 2022, 12:37 PM

jukivili committed rC32b18cdb87b7: camellia-avx2: add bulk processing for XTS mode (authored by jukivili).

camellia-avx2: add bulk processing for XTS mode

Apr 30 2022, 12:37 PM

jukivili committed rC5095d60af42d: Add SM4 x86-64/GFNI/AVX2 implementation (authored by jukivili).

Add SM4 x86-64/GFNI/AVX2 implementation

Apr 30 2022, 12:37 PM

jukivili committed rCe1c5f950838b: sm4: deduplicate bulk processing function selection (authored by jukivili).

sm4: deduplicate bulk processing function selection

Apr 30 2022, 12:37 PM

jukivili committed rC9388279803ff: Move bulk OCB L pointer array setup code to common header (authored by jukivili).

Move bulk OCB L pointer array setup code to common header

Apr 30 2022, 12:37 PM

jukivili committed rC754055ccd043: cipher/bulkhelp: add functions for CTR/CBC/CFB/OCB bulk processing (authored by jukivili).

cipher/bulkhelp: add functions for CTR/CBC/CFB/OCB bulk processing

Apr 30 2022, 12:37 PM

jukivili committed rCbacdc1de3f4f: camellia-avx2: add partial parallel block processing (authored by jukivili).

camellia-avx2: add partial parallel block processing

Apr 30 2022, 12:37 PM

jukivili committed rC3410d40996d8: Add detection for HW feature "intel-gfni" (authored by jukivili).

Add detection for HW feature "intel-gfni"

Apr 30 2022, 12:37 PM

jukivili committed rC4e6896eb9fce: Add GFNI/AVX2 implementation of Camellia (authored by jukivili).

Add GFNI/AVX2 implementation of Camellia

Apr 30 2022, 12:37 PM

Apr 19 2022

jukivili closed T5913: libgcrypt: bug fix for PPC bulk AES-GCM acceleration, missing HWF_PPC_ARCH_3_10 in HW feature as Resolved.

Apr 19 2022, 5:59 PM · ppc, libgcrypt

Apr 6 2022

jukivili committed rCa7c3e0b9b0ff: doc: Fix missing ARM hardware features (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).

doc: Fix missing ARM hardware features

Apr 6 2022, 9:34 PM

jukivili committed rC9a63cfd61753: chacha20: add AVX512 implementation (authored by jukivili).

chacha20: add AVX512 implementation

Apr 6 2022, 9:34 PM

jukivili committed rC972aae9fc337: build: Fix for arm crypto support (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).

build: Fix for arm crypto support

Apr 6 2022, 9:34 PM

jukivili committed rCcd3ed4977076: poly1305: add AVX512 implementation (authored by jukivili).

poly1305: add AVX512 implementation

Apr 6 2022, 9:34 PM

Apr 4 2022

jukivili committed rCfe891ff4a3cd: Add SM3 ARMv8/AArch64/CE assembly implementation (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).

Add SM3 ARMv8/AArch64/CE assembly implementation

Apr 4 2022, 6:12 PM

Apr 1 2022

jukivili added a comment to T5913: libgcrypt: bug fix for PPC bulk AES-GCM acceleration, missing HWF_PPC_ARCH_3_10 in HW feature.

Fixed in master. I rechecked that the bulk implementation passes the tests under qemu-ppc64le.

Apr 1 2022, 8:55 AM · ppc, libgcrypt

jukivili committed rC29bfb3ebbc63: hwf-ppc: fix missing HWF_PPC_ARCH_3_10 in HW feature (authored by jukivili).

hwf-ppc: fix missing HWF_PPC_ARCH_3_10 in HW feature

Apr 1 2022, 8:54 AM

jukivili added a comment to T5913: libgcrypt: bug fix for PPC bulk AES-GCM acceleration, missing HWF_PPC_ARCH_3_10 in HW feature.

It looks like that line went missing in the third (final) version of the AES-GCM patch at https://dev.gnupg.org/T5700.

Apr 1 2022, 8:51 AM · ppc, libgcrypt

Mar 29 2022

jukivili committed rCa5d126c61cc0: configure: fix avx512 check for i386 (authored by jukivili).

configure: fix avx512 check for i386

Mar 29 2022, 6:00 PM

jukivili committed rC4dc707e336a9: Fix configure.ac error of intel-avx512 (authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com>).

Fix configure.ac error of intel-avx512

Mar 29 2022, 6:00 PM

Mar 12 2022

jukivili updated the task description for T4460: libgcrypt performance TODOs.

Mar 12 2022, 9:40 AM · libgcrypt

jukivili updated the task description for T4460: libgcrypt performance TODOs.

Mar 12 2022, 9:39 AM · libgcrypt

jukivili closed T5828: Improvements for gnupg data operation performance (enc/dec/sign/verify/enarmor/dearmor/etc) as Resolved.

Mar 12 2022, 9:38 AM · gnupg

jukivili closed T5860: Reducing memory copy overhead in iobuf and estream to increase OCB speed as Resolved.

Mar 12 2022, 9:38 AM · gnupg

jukivili closed T5860: Reducing memory copy overhead in iobuf and estream to increase OCB speed, a subtask of T5828: Improvements for gnupg data operation performance (enc/dec/sign/verify/enarmor/dearmor/etc), as Resolved.

Mar 12 2022, 9:38 AM · gnupg

jukivili committed rCa0db0a121571: Fix building sha512-avx512 with clang (authored by jukivili).

Fix building sha512-avx512 with clang

Mar 12 2022, 9:34 AM

Mar 11 2022

jukivili committed rC089223aa3b55: SHA512: Add AVX512 implementation (authored by jukivili).

SHA512: Add AVX512 implementation

Mar 11 2022, 4:34 PM

Mar 9 2022

jukivili closed T5875: libgcrypt: VAES/AVX2 AES-OCB encryption performance issue with Intel CPUs, sudden drop in throughput with larger input sizes as Resolved.

Mar 9 2022, 7:47 PM · libgcrypt

jukivili added a comment to T5875: libgcrypt: VAES/AVX2 AES-OCB encryption performance issue with Intel CPUs, sudden drop in throughput with larger input sizes.

Fix pushed to master. Updated graph: [image not included]

Mar 9 2022, 7:47 PM · libgcrypt

jukivili committed rCd820d27a3bce: rijndael-vaes-avx2: perform checksumming inline (authored by jukivili).

rijndael-vaes-avx2: perform checksumming inline

Mar 9 2022, 7:46 PM

jukivili triaged T5875: libgcrypt: VAES/AVX2 AES-OCB encryption performance issue with Intel CPUs, sudden drop in throughput with larger input sizes as Normal priority.

Mar 9 2022, 4:42 PM · libgcrypt

Mar 8 2022

jukivili committed rG15df88d135ba: iobuf: add zerocopy optimization for iobuf_read (authored by jukivili).

iobuf: add zerocopy optimization for iobuf_read

Mar 8 2022, 7:05 PM

jukivili committed rG49c6e5839452: gpg: fix --enarmor with zero length source file (authored by jukivili).

gpg: fix --enarmor with zero length source file

Mar 8 2022, 7:05 PM

jukivili committed rGb96eb6f08d1d: iobuf: add zerocopy optimization for iobuf_write (authored by jukivili).

iobuf: add zerocopy optimization for iobuf_write

Mar 8 2022, 7:05 PM

jukivili committed rG99e2c178c73c: g10/cipher-aead: add fast path to avoid memcpy when AEAD encrypting (authored by jukivili).

g10/cipher-aead: add fast path to avoid memcpy when AEAD encrypting

Mar 8 2022, 7:05 PM

jukivili committed rG583b664a07b4: g10/plaintext: disable estream buffering in binary mode (authored by jukivili).

g10/plaintext: disable estream buffering in binary mode

Mar 8 2022, 7:05 PM

jukivili committed rGf2322ff942fa: Use iobuf buffer size for temporary buffer size (authored by jukivili).

Use iobuf buffer size for temporary buffer size

Mar 8 2022, 7:05 PM

jukivili committed rG6c95d52a22a7: g10/decrypt-data: disable output estream buffering to reduce overhead (authored by jukivili).

g10/decrypt-data: disable output estream buffering to reduce overhead

Mar 8 2022, 7:05 PM

jukivili committed rCd857e85cb4d4: ghash|polyval: add x86_64 VPCLMUL/AVX2 accelerated implementation (authored by jukivili).

ghash|polyval: add x86_64 VPCLMUL/AVX2 accelerated implementation

Mar 8 2022, 6:16 PM

jukivili committed rCe6f360019369: ghash|polyval: add x86_64 VPCLMUL/AVX512 accelerated implementation (authored by jukivili).

ghash|polyval: add x86_64 VPCLMUL/AVX512 accelerated implementation

Mar 8 2022, 6:16 PM

jukivili committed rC8cf06145263e: Add detection for HW feature "intel-avx512" (authored by jukivili).

Add detection for HW feature "intel-avx512"

Mar 8 2022, 6:16 PM

Mar 7 2022

jukivili added a comment to T5870: libgcrypt: AEAD API for FIPS 140 (in future).

Is a large change to the cipher API really needed (new open/encrypt functions with less flexibility)? How would that affect performance? Would the following new interfaces to the gcry_cipher API work instead?

gcry_cipher_setup_geniv(hd, int ivlen, int method): sets up the IV generator with parameters such as IV length and method ID (RFC 5116, TLS 1.3, SSH, etc.) (other parameters?)
gcry_cipher_geniv(hd, byte *outiv): generates a new IV using the selected method, sets the IV internally and outputs the generated IV to 'outiv'.
gcry_cipher_genkey(hd, byte *outkey, int keylen, int method): generates a key internally with the given parameters (method ID, others?), sets up the key internally and outputs the generated key to 'outkey'. (How would keys from a key exchange protocol be handled? Using the existing setkey?)
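A minimal C sketch of what such an IV generator might do internally, assuming two of the listed methods: a fixed-field-plus-counter nonce loosely following RFC 5116 section 3.2, and the TLS 1.3 per-record nonce (RFC 8446 section 5.3: the 64-bit sequence number, big-endian and left-padded with zeros, XORed with the static IV). `struct geniv_state`, `geniv_setup()`, `geniv_next()` and the `GENIV_*` IDs are hypothetical and not part of libgcrypt; the 8-byte counter field is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical method IDs; not libgcrypt constants. */
enum geniv_method { GENIV_RFC5116 = 1, GENIV_TLS13 = 2 };

struct geniv_state {        /* hypothetical per-handle IV generator */
  int method;
  size_t ivlen;             /* e.g. 12 bytes for AES-GCM/ChaCha20-Poly1305 */
  uint8_t fixed[16];        /* RFC 5116: fixed field; TLS 1.3: static IV */
  uint64_t counter;         /* invocation count / record sequence number */
};

static int geniv_setup(struct geniv_state *g, int method, size_t ivlen,
                       const uint8_t *fixed)
{
  if (ivlen < 8 || ivlen > sizeof g->fixed)
    return -1;
  if (method != GENIV_RFC5116 && method != GENIV_TLS13)
    return -1;
  g->method = method;
  g->ivlen = ivlen;
  g->counter = 0;
  memcpy(g->fixed, fixed, ivlen);  /* RFC 5116 mode uses ivlen-8 bytes */
  return 0;
}

static int geniv_next(struct geniv_state *g, uint8_t *outiv)
{
  size_t i;

  if (g->counter == UINT64_MAX)    /* refuse rather than wrap and reuse */
    return -1;
  if (g->method == GENIV_TLS13) {
    memcpy(outiv, g->fixed, g->ivlen);          /* static IV ^ seq */
  } else {
    memcpy(outiv, g->fixed, g->ivlen - 8);      /* fixed || counter */
    memset(outiv + g->ivlen - 8, 0, 8);
  }
  for (i = 0; i < 8; i++)          /* big-endian counter, last 8 bytes */
    outiv[g->ivlen - 1 - i] ^= (uint8_t)(g->counter >> (8 * i));
  g->counter++;
  return 0;
}
```

Keeping the counter inside the handle means the caller presumably cannot reuse a nonce by accident, which appears to be the point of generating IVs inside the library.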

Mar 7 2022, 9:04 PM · Feature Request, FIPS, libgcrypt

jukivili added a comment to T5860: Reducing memory copy overhead in iobuf and estream to increase OCB speed.

I went through my test files and found that --enarmor on a zero-length input file no longer worked. I made a separate patch to fix that issue, which in turn requires a different approach for handling the compression issue noticed earlier:

0007-gpg-fix-enarmor-with-zero-length-source-file.patch (2 KB)

Mar 7 2022, 8:09 PM · gnupg