Small tweak for PowerPC Chacha20-Poly1305 round loop

Authored by jukivili on Sep 19 2019, 9:25 PM.

Description

* cipher/chacha20-ppc.c (_gcry_chacha20_poly1305_ppc8_block4): Use
inner/outer round loop structure instead of two separate loops for
stitched and non-stitched parts.

Benchmark on POWER8 ~3.8 GHz:

Before:
CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
   STREAM enc |     0.619 ns/B      1541 MiB/s      2.35 c/B
   STREAM dec |     0.619 ns/B      1541 MiB/s      2.35 c/B
 POLY1305 enc |     0.784 ns/B      1216 MiB/s      2.98 c/B
 POLY1305 dec |     0.770 ns/B      1239 MiB/s      2.93 c/B
POLY1305 auth |     0.502 ns/B      1898 MiB/s      1.91 c/B

After (~2% faster):
CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305 enc |     0.765 ns/B      1247 MiB/s      2.91 c/B
 POLY1305 dec |     0.749 ns/B      1273 MiB/s      2.85 c/B

Benchmark on POWER9 ~3.8 GHz:

Before:
CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
   STREAM enc |     0.687 ns/B      1389 MiB/s      2.61 c/B
   STREAM dec |     0.692 ns/B      1379 MiB/s      2.63 c/B
 POLY1305 enc |      1.08 ns/B     880.9 MiB/s      4.11 c/B
 POLY1305 dec |      1.07 ns/B     888.0 MiB/s      4.08 c/B
POLY1305 auth |     0.459 ns/B      2078 MiB/s      1.74 c/B

After (~5% faster):
CHACHA20      |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305 enc |      1.03 ns/B     929.2 MiB/s      3.90 c/B
 POLY1305 dec |      1.02 ns/B     936.6 MiB/s      3.87 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Committed by jukivili on Sep 22 2019, 6:52 PM.

Parents: rC664370ea02df: Reduce size of x86-64 stitched Chacha20-Poly1305 implementations