Add PowerPC vector implementation of ChaCha20

Authored by jukivili on Sep 7 2019, 12:48 AM.

Description

* cipher/Makefile.am: Add 'chacha20-ppc.c'.
* cipher/chacha20-ppc.c: New.
* cipher/chacha20.c (USE_PPC_VEC, _gcry_chacha20_ppc8_blocks4)
(_gcry_chacha20_ppc8_blocks1, USE_PPC_VEC_POLY1305)
(_gcry_chacha20_poly1305_ppc8_blocks4): New.
(CHACHA20_context_t): Add 'use_ppc'.
(chacha20_blocks, chacha20_keysetup)
(do_chacha20_encrypt_stream_tail): Add USE_PPC_VEC code.
(_gcry_chacha20_poly1305_encrypt, _gcry_chacha20_poly1305_decrypt): Add
USE_PPC_VEC_POLY1305 code.
* configure.ac: Add 'chacha20-ppc.lo'.
* src/g10lib.h (HWF_PPC_ARCH_2_07): New.
* src/hwf-ppc.c (PPC_FEATURE2_ARCH_2_07): New.
(ppc_features): Add HWF_PPC_ARCH_2_07.
* src/hwfeatures.c (hwflist): Add 'ppc-arch_2_07'.
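
As an illustration of what the new hardware-feature flag represents, here is a minimal sketch, assuming a Linux system where the kernel reports CPU capabilities in the `AT_HWCAP2` auxiliary-vector word. The bit value 0x80000000 is the Linux definition of `PPC_FEATURE2_ARCH_2_07` (ISA 2.07, i.e. POWER8 and later). The `SKETCH_` names and the function are hypothetical stand-ins and are not taken from src/hwf-ppc.c, which may be organised differently.

```c
#include <stdint.h>

/* Linux AT_HWCAP2 bit for ISA 2.07 (POWER8 and later). */
#define SKETCH_PPC_FEATURE2_ARCH_2_07 0x80000000UL

/* Hypothetical library-side flag, standing in for HWF_PPC_ARCH_2_07. */
#define SKETCH_HWF_PPC_ARCH_2_07 (1u << 0)

/* Map a raw AT_HWCAP2 word to the sketch's feature flags. */
static unsigned int
sketch_features_from_hwcap2 (unsigned long hwcap2)
{
  unsigned int features = 0;

  if (hwcap2 & SKETCH_PPC_FEATURE2_ARCH_2_07)
    features |= SKETCH_HWF_PPC_ARCH_2_07;

  return features;
}
```

On Linux with glibc, the input word could be obtained with `getauxval (AT_HWCAP2)` from `<sys/auxv.h>`; the cipher setup code would then enable the vector path only when the flag is set.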

This patch adds 1-way, 2-way and 4-way ChaCha20 vector implementations
and a 4-way stitched ChaCha20+Poly1305 implementation for PowerPC.
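
To illustrate the general idea behind an N-way vector implementation, here is a minimal portable C sketch, not the code from cipher/chacha20-ppc.c. It assumes a layout in which each vector holds the same state word from four independent blocks, so that one ChaCha20 quarter round advances four blocks at once. The type `vec4_u32` and all helper names are hypothetical; a real PowerPC implementation would use vector registers rather than plain arrays.

```c
#include <stdint.h>

#define LANES 4  /* assumed layout: lane i belongs to block i */

/* Plain-C stand-in for a 128-bit vector of four 32-bit words. */
typedef struct { uint32_t w[LANES]; } vec4_u32;

static vec4_u32 vec4_splat (uint32_t x)
{
  vec4_u32 r;
  for (int i = 0; i < LANES; i++)
    r.w[i] = x;
  return r;
}

static vec4_u32 vec4_add (vec4_u32 x, vec4_u32 y)
{
  for (int i = 0; i < LANES; i++)
    x.w[i] += y.w[i];          /* wraps modulo 2^32 */
  return x;
}

static vec4_u32 vec4_xor (vec4_u32 x, vec4_u32 y)
{
  for (int i = 0; i < LANES; i++)
    x.w[i] ^= y.w[i];
  return x;
}

/* Rotate every lane left by n bits; n must be in 1..31. */
static vec4_u32 vec4_rotl (vec4_u32 x, unsigned int n)
{
  for (int i = 0; i < LANES; i++)
    x.w[i] = (x.w[i] << n) | (x.w[i] >> (32 - n));
  return x;
}

/* ChaCha20 quarter round (RFC 8439, section 2.1) on four lanes at once. */
static void quarter_round4 (vec4_u32 *a, vec4_u32 *b,
                            vec4_u32 *c, vec4_u32 *d)
{
  *a = vec4_add (*a, *b); *d = vec4_rotl (vec4_xor (*d, *a), 16);
  *c = vec4_add (*c, *d); *b = vec4_rotl (vec4_xor (*b, *c), 12);
  *a = vec4_add (*a, *b); *d = vec4_rotl (vec4_xor (*d, *a), 8);
  *c = vec4_add (*c, *d); *b = vec4_rotl (vec4_xor (*b, *c), 7);
}
```

Filling every lane with the RFC 8439 section 2.1.1 quarter-round test input (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567) should give a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb in each lane.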

Benchmark on POWER8 (ppc64le, ~3.8 GHz):

Before:
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
   STREAM enc |      2.60 ns/B     366.2 MiB/s      9.90 c/B
   STREAM dec |      2.61 ns/B     366.1 MiB/s      9.90 c/B
 POLY1305 enc |      3.11 ns/B     307.1 MiB/s     11.80 c/B
 POLY1305 dec |      3.11 ns/B     307.0 MiB/s     11.80 c/B
POLY1305 auth |     0.502 ns/B      1900 MiB/s      1.91 c/B

After (~4x faster):
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
   STREAM enc |     0.619 ns/B      1540 MiB/s      2.35 c/B
   STREAM dec |     0.619 ns/B      1541 MiB/s      2.35 c/B
 POLY1305 enc |     0.785 ns/B      1215 MiB/s      2.98 c/B
 POLY1305 dec |     0.769 ns/B      1240 MiB/s      2.92 c/B
POLY1305 auth |     0.502 ns/B      1901 MiB/s      1.91 c/B

Benchmark on POWER9 (ppc64le, ~3.8 GHz):

Before:
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
   STREAM enc |      2.27 ns/B     419.9 MiB/s      8.63 c/B
   STREAM dec |      2.27 ns/B     419.8 MiB/s      8.63 c/B
 POLY1305 enc |      2.73 ns/B     349.1 MiB/s     10.38 c/B
 POLY1305 dec |      2.73 ns/B     349.3 MiB/s     10.37 c/B
POLY1305 auth |     0.459 ns/B      2076 MiB/s      1.75 c/B

After (chacha20 ~3x faster, chacha20+poly1305 ~2.5x faster):
 CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
   STREAM enc |     0.690 ns/B      1381 MiB/s      2.62 c/B
   STREAM dec |     0.690 ns/B      1382 MiB/s      2.62 c/B
 POLY1305 enc |      1.09 ns/B     878.2 MiB/s      4.13 c/B
 POLY1305 dec |      1.07 ns/B     887.8 MiB/s      4.08 c/B
POLY1305 auth |     0.459 ns/B      2076 MiB/s      1.75 c/B
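
The rounded speedup figures can be checked from the tables with a small sketch like the one below. Both helper names are hypothetical. It assumes only that speedup is the ratio of time per byte before and after, and that cycles per byte divided by nanoseconds per byte gives the clock rate in GHz.

```c
/* Speedup factor: time per byte before the patch over time per byte after. */
static double
speedup (double before_ns_per_byte, double after_ns_per_byte)
{
  return before_ns_per_byte / after_ns_per_byte;
}

/* Clock rate in GHz implied by one benchmark row: (c/B) / (ns/B). */
static double
implied_ghz (double cycles_per_byte, double ns_per_byte)
{
  return cycles_per_byte / ns_per_byte;
}
```

For example, `speedup (2.60, 0.619)` is about 4.2 (POWER8 STREAM enc), `speedup (2.27, 0.690)` is about 3.3 (POWER9 STREAM enc) and `speedup (2.73, 1.09)` is about 2.5 (POWER9 POLY1305 enc). `implied_ghz (9.90, 2.60)` is about 3.8, matching the stated clock rate.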

GnuPG-bug-id: T4460
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Committed
jukivili Sep 15 2019, 9:52 PM
Parents
rC0564757b934d: poly1305: add fast addition macro for ppc64
Tasks
T4460: libgcrypt performance TODOs