s390x/zSeries has a vector register instruction set that can be used to implement a faster ChaCha20.
Description
Revisions and Commits
Repository | Commit | Message
---|---|---
rC libgcrypt | rC6a0bb9ab7f88 | Add s390x/zSeries implementation of ChaCha20
Status | Assigned | Task
---|---|---
Open | jukivili | T4460 libgcrypt performance TODOs
Resolved | jukivili | T5196 libgcrypt: s390x/zSeries performance improvements
Resolved | jukivili | T5201 libgcrypt: s390x/zSeries 128-bit vector implementation of ChaCha20
Event Timeline
Currently have an 8-block parallel implementation done. Need to check whether the 6-block parallel approach used in OpenSSL is better (its benefit being lower register pressure and less moving of data between registers and the stack).
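A back-of-the-envelope register budget shows why the block count matters. Assuming the z/Architecture vector facility's 32 vector registers of 128 bits each, and that every block's 16 × 32-bit working state is kept entirely in registers, each block costs 4 registers: 8 blocks use all 32 and leave nothing for temporaries, while 6 blocks use 24 and leave 8 spare. The helper names below are hypothetical, and the model deliberately ignores everything except the state itself.

```c
#include <assert.h>

/* Assumption: z/Architecture vector facility, 32 vector registers
 * (V0..V31), each 128 bits wide. */
enum { VREG_COUNT = 32, VREG_BITS = 128 };

/* One ChaCha20 block state: 16 words of 32 bits = 512 bits. */
enum { CHACHA_STATE_BITS = 16 * 32 };

/* Hypothetical helper: vector registers needed to hold the working
 * state of `blocks` ChaCha20 blocks entirely in registers. */
static int state_regs(int blocks)
{
    assert(blocks > 0);
    return blocks * CHACHA_STATE_BITS / VREG_BITS;
}

/* Hypothetical helper: registers left for temporaries, constants,
 * rotate amounts, etc. A value <= 0 means data must be spilled
 * to the stack. */
static int spare_regs(int blocks)
{
    return VREG_COUNT - state_regs(blocks);
}
```

For example, `spare_regs(8)` is 0 while `spare_regs(6)` is 8, which is the register-pressure argument for the 6-block variant in numbers.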
Reimplemented the 8-block parallel version in "vertical" orientation.
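A portable C sketch of what a "vertical" layout can look like, assuming it refers to the transposed arrangement common in multi-block SIMD ChaCha20 code: row `v[i]` holds state word i of all 8 blocks, one block per lane, so column and diagonal rounds differ only in which rows they touch and need no lane rotations inside a register. The names `vstate`, `vqround` and `chacha20_blocks_vertical` are hypothetical, and the per-lane loops merely stand in for vector instructions; this is not the libgcrypt s390x assembly. With 128-bit registers, one 8-lane row would span two registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8 /* blocks processed in parallel, one per lane */

/* Vertical layout: v[i][lane] is state word i of block `lane`. */
typedef uint32_t vstate[16][LANES];

static inline uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* ChaCha20 quarter round applied lane-wise to rows a, b, c, d. */
static void vqround(vstate v, int a, int b, int c, int d)
{
    for (int l = 0; l < LANES; l++) {
        v[a][l] += v[b][l]; v[d][l] = rotl32(v[d][l] ^ v[a][l], 16);
        v[c][l] += v[d][l]; v[b][l] = rotl32(v[b][l] ^ v[c][l], 12);
        v[a][l] += v[b][l]; v[d][l] = rotl32(v[d][l] ^ v[a][l], 8);
        v[c][l] += v[d][l]; v[b][l] = rotl32(v[b][l] ^ v[c][l], 7);
    }
}

/* Compute LANES consecutive blocks (RFC 8439 state: 4 constants,
 * 8 key words, 32-bit counter, 3 nonce words).
 * out[i][lane] receives output word i of block counter + lane. */
static void chacha20_blocks_vertical(const uint32_t key[8], uint32_t counter,
                                     const uint32_t nonce[3], vstate out)
{
    static const uint32_t sigma[4] = {
        0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };
    vstate init;

    /* The 32-bit block counter must not wrap across the lanes. */
    assert(counter <= UINT32_MAX - (LANES - 1));

    for (int l = 0; l < LANES; l++) {
        for (int i = 0; i < 4; i++) init[i][l] = sigma[i];
        for (int i = 0; i < 8; i++) init[4 + i][l] = key[i];
        init[12][l] = counter + (uint32_t)l; /* only the counter differs per lane */
        for (int i = 0; i < 3; i++) init[13 + i][l] = nonce[i];
    }
    memcpy(out, init, sizeof(vstate));

    for (int r = 0; r < 10; r++) {    /* 20 rounds = 10 double rounds */
        vqround(out, 0, 4,  8, 12);    /* column rounds */
        vqround(out, 1, 5,  9, 13);
        vqround(out, 2, 6, 10, 14);
        vqround(out, 3, 7, 11, 15);
        vqround(out, 0, 5, 10, 15);    /* diagonal rounds: different rows, */
        vqround(out, 1, 6, 11, 12);    /* but no lane shuffling needed     */
        vqround(out, 2, 7,  8, 13);
        vqround(out, 3, 4,  9, 14);
    }
    for (int i = 0; i < 16; i++)       /* feed-forward: add the input state */
        for (int l = 0; l < LANES; l++)
            out[i][l] += init[i][l];
}
```

With key bytes 00..1f, nonce words `{0x09000000, 0x4a000000, 0}` and `counter = 1`, lane 0 should reproduce the block test vector of RFC 8439 section 2.3.2.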
libgcrypt chacha20-s390x:

CHACHA20 | nanosecs/byte | mebibytes/sec | cycles/byte
---|---|---|---
STREAM enc | 0.506 ns/B | 1886 MiB/s | 2.28 c/B
STREAM dec | 0.506 ns/B | 1884 MiB/s | 2.28 c/B
openssl 1.1.1f (bench-slope-openssl: OpenSSL 1.1.1f 31 Mar 2020):

Cipher: chacha20 | nanosecs/byte | mebibytes/sec | cycles/byte
---|---|---|---
STREAM enc | 0.592 ns/B | 1609.9 MiB/s | 2.67 c/B
STREAM dec | 0.593 ns/B | 1607.1 MiB/s | 2.67 c/B
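The three benchmark columns are linked by simple unit conversions, which makes the figures easy to sanity-check: MiB/s = 10^9 / (ns/B × 2^20), and c/B ÷ ns/B gives the clock rate in GHz that the numbers imply. From the tables, libgcrypt is about 17% faster (0.592 / 0.506 ≈ 1.17), and both runs imply roughly 4.5 GHz. The cycles/byte column is presumably derived from a clock estimate, so that 4.5 GHz is what the figures imply, not a measured hardware spec. The helper names in this sketch are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Convert a per-byte time in nanoseconds to throughput in MiB/s. */
static double ns_per_byte_to_mib_s(double ns_per_byte)
{
    assert(ns_per_byte > 0.0);
    return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Clock frequency (GHz) implied by matching c/B and ns/B figures. */
static double implied_ghz(double cycles_per_byte, double ns_per_byte)
{
    return cycles_per_byte / ns_per_byte;
}

/* How many times faster `fast` is than `slow`, both given in ns/B. */
static double speedup(double slow_ns_per_byte, double fast_ns_per_byte)
{
    return slow_ns_per_byte / fast_ns_per_byte;
}

/* Tolerance compare: the tables round ns/B to three decimals, so
 * recomputed MiB/s values differ slightly from the reported ones. */
static int approx_eq(double a, double b, double tol)
{
    return fabs(a - b) <= tol;
}
```

For instance, `ns_per_byte_to_mib_s(0.506)` gives about 1884.7 MiB/s, consistent with the reported 1884 to 1886 MiB/s once rounding of the ns/B column is taken into account.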