Patch applied to master with small changes.
Jun 12 2022
Jun 3 2022
Thanks @jukivili , Here is the changelog,
Thanks for updated patch. I'm travelling next week and have time to check it closely only after I'm back. On quick glance, it looks good. What is also needed is the changelog for git commit log.
Jun 2 2022
Thanks @jukivili. I have never thought of interleaving with interger poly1305 operation and that's a good suggestion. Will think about that one.
Jun 1 2022
I meant interleaving integer register based 1xPoly1305 with 8xChacha20 as is done for 4xChacha20 in cipher/chacha20-ppc.c (interleaved so that for each 4xChaCha20 processed, 4 blocks of 1xPoly1305 is executed). Quite often microarchitectures have separate execution units for integer registers and vector registers and then it makes sense to interleave integer-poly1305 with vector-chacha20 as algorithms do not end up competing for same execution resources. Interleaving vector-poly1305 and vector-chacha20 is not likely to give performance increase (and likely to run problems with running out of vector registers).
HI @jukivili , Thanks for the updates. For f14-f31 registers that was my mistake that did not think floating point will be used. Will correct that. For poly1305, it can be used on ARCH_3.0 so checking use_p10 doesn't seem to be necessary but I can include that as well.
May 28 2022
Problem is that new assembly is using VSX registers vs14-vs31 which overlap with floating-point registers f14-f31. f14-f31 are ABI callee saved, so those need to be stored and restored.
Tested patch with small change so that HWF_PPC_ARCH_3_00 is used instead of HWF_PPC_ARCH_3_10. Building bench-slope with "-O3 -flto" makes bug in new implementation visible. Without new implementations bench-slope is ok (testing with QEMU):
$ tests/bench-slope --disable-hwf ppc-arch_3_00 cipher chacha20 Cipher: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 2.35 ns/B 405.0 MiB/s - c/B STREAM dec | 2.32 ns/B 410.7 MiB/s - c/B POLY1305 enc | 2.46 ns/B 388.0 MiB/s - c/B POLY1305 dec | 2.34 ns/B 408.1 MiB/s - c/B POLY1305 auth | 0.238 ns/B 4003 MiB/s - c/B
May 27 2022
-O2 problem with bench-slope seems strange. Does problem appear after this patch is applied?
May 26 2022
May 17 2022
I do not claim I understand anything of this assembler syntax :)
For the second, I wonder if newer xlclang++ compiler works with 1.9.
Thank you for the bug report.
May 16 2022
Apr 19 2022
Apr 1 2022
Hi Jussi, yes for some reason, it went missing, I was checking performance numbers and found out the line went missing. Thanks.
Fixed in master. I rechecked that bulk implementation passes tests with qemu-ppc64le.
Looks like that line went missing in third/final version of AES-GCM patch at https://dev.gnupg.org/T5700
Mar 31 2022
Mar 2 2022
Dec 21 2021
Ok, I'll add.
Seen. @jukivili can you please add it to the AUTHORS file?
Dec 14 2021
Ok, I have subscribed to the mailing list. I have resent the DCO.
DCO has not appeared on mailing-list. You can this from check list archives, https://lists.gnupg.org/pipermail/gcrypt-devel/2021-December/thread.html
Thanks Jussi, I did not receive the list moderator's email so I am not sure if the it has been posted on firstname.lastname@example.org. If not, I can resend the DCO. Thanks.
I did some finishing touches on coding style:
Dec 13 2021
Dec 12 2021
Few comments on new patch:
Dec 10 2021
Dec 7 2021
I ran some basic tests and it did show the errors. I am in the process investigating what went wrong. In the meantime, i also included test result that I have used in my testing from bench-slope. In this test, I captured the message with 272 bytes buffer from the original libgcrypt repo and my optimized repo. Note that the bulk version of my code do 8x unrolling and the rest will do 16 bytes. So the first 2 128 bytes ran thru gcry_ppc_aes_gcm_encrypt and the rest of the 16 bytes thru gcm_ctr_encrypt (cipher-gcm.c).
Dec 6 2021
Thanks jukivili for the review.
Dec 4 2021
Thanks, however I didn't see your email on mailing-list. Maybe the email got stuck on the way.
Dec 2 2021
I sent a copy to email@example.com. Hope this is the right process. Thanks.
Please read doc/HACKING carefully on the process of sending DCO the right way.
Nov 23 2021
Hi Werner, Here is the DCO. Thanks.
FWIW: We need a DCO; see doc/HACKING.