s390x/zSeries: implement faster ChaCha20-Poly1305 AEAD
Description
Revisions and Commits
rC libgcrypt
rC1f75681cbba8 Add s390x/zSeries implementation of Poly1305
Status | Assigned | Task
---|---|---
Open | jukivili | T4460 libgcrypt performance TODOs
Resolved | jukivili | T5196 libgcrypt: s390x/zSeries performance improvements
Resolved | jukivili | T5202 libgcrypt: s390x/zSeries implementation of Poly1305 / ChaCha20-Poly1305 AEAD
Event Timeline
Implemented stitched ChaCha20-Poly1305 (vector ChaCha20 & ALU Poly1305). Unfortunately, performance is lower than OpenSSL's (vector ChaCha20 & vector Poly1305). Instruction latencies make the ALU Poly1305 on its own slower than OpenSSL's combined ChaCha20+Poly1305, so stitching cannot reach the same performance. A vector Poly1305 implementation is therefore needed.
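For readers unfamiliar with why scalar (ALU) Poly1305 is latency-bound, here is a minimal Python sketch of the Poly1305 MAC as specified in RFC 8439. It is not the libgcrypt s390x code, and the name `poly1305_mac` is hypothetical. It only illustrates that each 16-byte block feeds a multiply-and-reduce step that depends on the previous accumulator value, a serial chain that a vector implementation would typically break up by processing several blocks in parallel.

```python
# Illustrative reference sketch of Poly1305 (RFC 8439), not libgcrypt's code.
# The function name is an assumption made for this example.

P1305 = (1 << 130) - 5                            # prime modulus 2^130 - 5
R_CLAMP = 0x0ffffffc0ffffffc0ffffffc0fffffff      # clamp mask for r


def poly1305_mac(key: bytes, msg: bytes) -> bytes:
    """Return the 16-byte Poly1305 tag of msg under a 32-byte one-time key."""
    if len(key) != 32:
        raise ValueError("Poly1305 key must be 32 bytes")
    r = int.from_bytes(key[:16], "little") & R_CLAMP
    s = int.from_bytes(key[16:], "little")
    acc = 0
    for i in range(0, len(msg), 16):
        block = msg[i:i + 16]
        # Append the 0x01 byte, then read the block as a little-endian integer.
        n = int.from_bytes(block + b"\x01", "little")
        # Serial dependency: this multiply needs the previous accumulator.
        acc = ((acc + n) * r) % P1305
    tag = (acc + s) & ((1 << 128) - 1)
    return tag.to_bytes(16, "little")


if __name__ == "__main__":
    # Test vector from RFC 8439, section 2.5.2.
    key = bytes.fromhex(
        "85d6be7857556d337f4452fe42d506a8"
        "0103808afb0db2fd4abff6af4149f51b"
    )
    msg = b"Cryptographic Forum Research Group"
    print(poly1305_mac(key, msg).hex())
```

Stitching interleaves this kind of scalar chain with vector ChaCha20 instructions so that the two instruction streams overlap. The throughput of the combined loop is still bounded by the slower of the two streams.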
With a little extra effort, the stitched implementation turned out OK after all.
libgcrypt implementation, stitched ChaCha20-Poly1305:
CHACHA20 | nanosecs/byte | mebibytes/sec | cycles/byte
---|---|---|---
STREAM enc | 0.506 ns/B | 1886 MiB/s | 2.28 c/B
STREAM dec | 0.506 ns/B | 1884 MiB/s | 2.28 c/B
POLY1305 enc | 0.677 ns/B | 1409 MiB/s | 3.05 c/B
POLY1305 dec | 0.655 ns/B | 1456 MiB/s | 2.95 c/B
POLY1305 auth | 0.569 ns/B | 1675 MiB/s | 2.56 c/B
openssl 1.1.1f:
bench-slope-openssl: OpenSSL 1.1.1f 31 Mar 2020
Cipher:

chacha20 | nanosecs/byte | mebibytes/sec | cycles/byte
---|---|---|---
STREAM enc | 0.592 ns/B | 1609.9 MiB/s | 2.67 c/B
STREAM dec | 0.593 ns/B | 1607.1 MiB/s | 2.67 c/B
POLY1305 enc | 0.790 ns/B | 1207.1 MiB/s | 3.56 c/B
POLY1305 dec | 0.809 ns/B | 1178.9 MiB/s | 3.64 c/B
POLY1305 auth | 0.200 ns/B | 4772.4 MiB/s | 0.900 c/B
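To make the two result sets easier to compare, the following short Python sketch derives ratios from the numbers quoted above. The helper names (`mib_per_sec`, `implied_ghz`, `speedup`) are hypothetical and exist only for this example. The figures are copied from the two benchmark outputs, and the derived values are approximate because the inputs are rounded.

```python
# Sketch: compare the benchmark figures quoted in this task.
# Helper names are assumptions for illustration; data is copied from the tables.

LIBGCRYPT_NS_PER_BYTE = {
    "STREAM enc": 0.506, "STREAM dec": 0.506,
    "POLY1305 enc": 0.677, "POLY1305 dec": 0.655, "POLY1305 auth": 0.569,
}
OPENSSL_NS_PER_BYTE = {
    "STREAM enc": 0.592, "STREAM dec": 0.593,
    "POLY1305 enc": 0.790, "POLY1305 dec": 0.809, "POLY1305 auth": 0.200,
}


def mib_per_sec(ns_per_byte: float) -> float:
    """Convert nanoseconds per byte to mebibytes per second."""
    return 1e9 / ns_per_byte / (1 << 20)


def implied_ghz(ns_per_byte: float, cycles_per_byte: float) -> float:
    """Clock frequency implied by a (ns/B, c/B) pair, in GHz."""
    return cycles_per_byte / ns_per_byte


def speedup(baseline_ns: float, candidate_ns: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ns / candidate_ns


if __name__ == "__main__":
    for mode, gcry_ns in LIBGCRYPT_NS_PER_BYTE.items():
        ratio = speedup(OPENSSL_NS_PER_BYTE[mode], gcry_ns)
        print(f"{mode:14s} libgcrypt vs OpenSSL: {ratio:.2f}x")
    print(f"implied clock: {implied_ghz(0.506, 2.28):.2f} GHz")
```

Read this way, the stitched libgcrypt code is ahead in the STREAM and POLY1305 enc/dec rows, while OpenSSL's vector Poly1305 remains clearly faster in the standalone POLY1305 auth row.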