Add s390x/zSeries implementation of Poly1305

* cipher/Makefile.am: Add 'poly1305-s390x.S' and
'asm-poly1305-s390x.h'.
* cipher/asm-poly1305-s390x.h: New.
* cipher/chacha20-s390x.S (_gcry_chacha20_poly1305_s390x_vx_blocks8)
(_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New, stitched
chacha20-poly1305 implementation.
* cipher/chacha20.c (USE_S390X_VX_POLY1305): New.
(_gcry_chacha20_poly1305_s390x_vx_blocks8)
(_gcry_chacha20_poly1305_s390x_vx_blocks4_2_1): New prototypes.
(_gcry_chacha20_poly1305_encrypt, _gcry_chacha20_poly1305_decrypt): Add
s390x/VX stitched chacha20-poly1305 code-path.
* cipher/poly1305-s390x.S: New.
* cipher/poly1305.c (USE_S390X_ASM, HAVE_ASM_POLY1305_BLOCKS): New.
[USE_S390X_ASM] (_gcry_poly1305_s390x_blocks1, poly1305_blocks): New.
* configure.ac (gcry_cv_gcc_inline_asm_s390x): Check for 'risbgn' and
'algrk' instructions.
* tests/basic.c (_check_poly1305_cipher): Add large chacha20-poly1305
test vector.

This patch adds Poly1305 and stitched ChaCha20-Poly1305 implementations
for s390x/zSeries. The stitched implementation interleaves ChaCha20 and
Poly1305 processing to get higher instruction-level parallelism and
better utilization of the execution units.
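
As a rough illustration of the data flow only, the sketch below contrasts a
two-pass encrypt-then-MAC loop with a single fused pass. Everything in it is
hypothetical: toy_keystream and toy_mac_update are toy stand-ins, not ChaCha20
or Poly1305, and TOY_BLOCK is an arbitrary chunk size. The real code in
cipher/chacha20-s390x.S interleaves at the instruction level, which plain C
at chunk granularity can only approximate.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins, NOT ChaCha20 or Poly1305: they only model the data flow. */
#define TOY_BLOCK 16

/* Hypothetical keystream byte for byte position 'pos'. */
static uint8_t toy_keystream(uint32_t key, size_t pos)
{
  uint32_t x = key + (uint32_t)pos * 2654435761u;
  x ^= x >> 15;
  return (uint8_t)((x * 2246822519u) >> 24);
}

/* Hypothetical polynomial-style accumulator over a prime modulus. */
static uint64_t toy_mac_update(uint64_t acc, uint32_t r,
                               const uint8_t *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    acc = ((acc + buf[i] + 1) * r) % 2147483647u;
  return acc;
}

/* Two passes: encrypt the whole buffer, then authenticate the ciphertext. */
static uint64_t encrypt_then_mac_two_pass(uint32_t key, uint32_t r,
                                          uint8_t *dst, const uint8_t *src,
                                          size_t len)
{
  for (size_t i = 0; i < len; i++)
    dst[i] = src[i] ^ toy_keystream(key, i);
  return toy_mac_update(0, r, dst, len);
}

/* One fused pass: each chunk is encrypted and authenticated right away,
   while it is still hot in registers or cache. */
static uint64_t encrypt_then_mac_stitched(uint32_t key, uint32_t r,
                                          uint8_t *dst, const uint8_t *src,
                                          size_t len)
{
  uint64_t acc = 0;
  size_t pos = 0;

  while (pos < len)
    {
      size_t n = (len - pos < TOY_BLOCK) ? len - pos : TOY_BLOCK;

      for (size_t i = 0; i < n; i++)
        dst[pos + i] = src[pos + i] ^ toy_keystream(key, pos + i);
      acc = toy_mac_update(acc, r, dst + pos, n);
      pos += n;
    }
  return acc;
}
```

Both functions produce the same ciphertext and the same accumulator value for
the same inputs; the fused form only changes the order in which the work is
scheduled, which is what gives the assembly room to overlap the two
algorithms.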

Benchmark on z15 (4504 MHz):

Before:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |      1.16 ns/B     823.2 MiB/s      5.22 c/B
   POLY1305 dec |      1.16 ns/B     823.2 MiB/s      5.22 c/B
  POLY1305 auth |     0.736 ns/B      1295 MiB/s      3.32 c/B

After (chacha20-poly1305 ~71% faster, poly1305 ~29% faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |     0.677 ns/B      1409 MiB/s      3.05 c/B
   POLY1305 dec |     0.655 ns/B      1456 MiB/s      2.95 c/B
  POLY1305 auth |     0.569 ns/B      1675 MiB/s      2.56 c/B
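
The columns above are related by simple arithmetic, which the sketch below
spells out. The helper names are hypothetical and not part of libgcrypt's
benchmark tool; it assumes cycles/byte is nanosecs/byte multiplied by the
clock rate in GHz, and that "N% faster" means the ratio of the old to the new
time per byte, minus one.

```c
/* Hypothetical helpers for cross-checking the benchmark figures. */

/* cycles/byte = (ns/byte) * (cycles/ns), where cycles/ns = MHz / 1000. */
static double cycles_per_byte(double ns_per_byte, double clock_mhz)
{
  return ns_per_byte * clock_mhz / 1000.0;
}

/* Speedup in percent, computed from time per byte before and after. */
static double speedup_percent(double before_ns, double after_ns)
{
  return (before_ns / after_ns - 1.0) * 100.0;
}
```

For example, cycles_per_byte(1.16, 4504.0) is about 5.22, and
speedup_percent(1.16, 0.677) and speedup_percent(0.736, 0.569) come out at
roughly 71 and 29, matching the encryption and authentication figures quoted
above.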
  • GnuPG-bug-id: T5202
  • Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>