Add PowerPC vpmsum implementation of CRC

Authored by jukivili on Sep 15 2019, 9:48 PM.

Description

Add PowerPC vpmsum implementation of CRC

* cipher/Makefile.am: Add 'crc-ppc.c'.
* cipher/crc-armv8-ce.c: Remove 'USE_INTEL_PCLMUL' comment.
* cipher/crc-ppc.c: New.
* cipher/crc.c (USE_PPC_VPMSUM): New.
(CRC_CONTEXT): Add 'use_vpmsum'.
(_gcry_crc32_ppc8_vpmsum, _gcry_crc24rfc2440_ppc8_vpmsum): New.
(crc32_init, crc24rfc2440_init): Add HWF check for 'use_vpmsum'.
(crc32_write, crc24rfc2440_write): Add 'use_vpmsum' code-path.
* configure.ac: Add 'vpmsumd' instruction to PowerPC VSX inline
assembly check; Add 'crc-ppc.lo'.
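The dispatch pattern described in the ChangeLog (a 'use_vpmsum' flag chosen from a hardware-feature check at init time, then consulted on every write) can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in crc.c: the context type, the function names and the 'have_vpmsum' parameter are hypothetical, and the vpmsumd routine is replaced by a stand-in that simply delegates to a portable bit-at-a-time CRC-32. A portable CRC-24 with the RFC 2440 parameters is included as a reference for the second algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context: 'use_vpmsum' mirrors the flag added to
   CRC_CONTEXT; every other name here is illustrative only. */
typedef struct
{
  uint32_t crc;     /* running (not yet finalized) CRC-32 state */
  int use_vpmsum;   /* chosen once at init from a feature check */
} crc_sketch_ctx;

/* Portable bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320). */
static void
crc32_generic (uint32_t *pcrc, const unsigned char *buf, size_t len)
{
  uint32_t crc = *pcrc;

  while (len--)
    {
      crc ^= *buf++;
      for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  *pcrc = crc;
}

/* Hypothetical stand-in for the accelerated routine.  A real vector
   implementation must leave exactly the same state as the generic
   path, so this sketch just delegates to it. */
static void
crc32_vpmsum_standin (uint32_t *pcrc, const unsigned char *buf, size_t len)
{
  crc32_generic (pcrc, buf, len);
}

/* 'have_vpmsum' stands in for the hardware-feature (HWF) check. */
static void
crc32_sketch_init (crc_sketch_ctx *ctx, int have_vpmsum)
{
  ctx->crc = 0xFFFFFFFFu;            /* CRC-32 initial value */
  ctx->use_vpmsum = have_vpmsum;
}

static void
crc32_sketch_write (crc_sketch_ctx *ctx, const void *data, size_t len)
{
  assert (ctx != NULL && (data != NULL || len == 0));
  if (ctx->use_vpmsum)
    crc32_vpmsum_standin (&ctx->crc, data, len);
  else
    crc32_generic (&ctx->crc, data, len);
}

static uint32_t
crc32_sketch_final (const crc_sketch_ctx *ctx)
{
  return ctx->crc ^ 0xFFFFFFFFu;     /* CRC-32 final XOR */
}

/* Portable CRC-24 with the OpenPGP (RFC 2440) parameters:
   init 0xB704CE, polynomial 0x1864CFB, processed MSB first. */
static uint32_t
crc24_generic (const unsigned char *buf, size_t len)
{
  uint32_t crc = 0xB704CEu;

  while (len--)
    {
      crc ^= (uint32_t) *buf++ << 16;
      for (int k = 0; k < 8; k++)
        {
          crc <<= 1;
          if (crc & 0x1000000u)
            crc ^= 0x1864CFBu;
        }
    }
  return crc & 0xFFFFFFu;
}
```

Because the running state is only finalized at the end, writes can be split arbitrarily and either path can be selected; both should yield the standard CRC-32 check value 0xCBF43926 for "123456789", and the CRC-24 reference yields 0x21CF02 for the same input.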

Benchmark on POWER8 (ppc64le, ~3.8 GHz):
Before:
               |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32         |     0.978 ns/B     975.0 MiB/s      3.72 c/B
 CRC24RFC2440  |     0.974 ns/B     978.8 MiB/s      3.70 c/B

After (~22x faster):
               |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32         |     0.044 ns/B     21878 MiB/s     0.166 c/B
 CRC24RFC2440  |     0.043 ns/B     22077 MiB/s     0.164 c/B

Benchmark on POWER9 (ppc64le, ~3.8 GHz):
Before:
               |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32         |      1.01 ns/B     943.7 MiB/s      3.84 c/B
 CRC24RFC2440  |     0.993 ns/B     960.6 MiB/s      3.77 c/B

After (~20x faster):
               |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32         |     0.046 ns/B     20675 MiB/s     0.175 c/B
 CRC24RFC2440  |     0.048 ns/B     19691 MiB/s     0.184 c/B
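The three benchmark columns are tied together by the clock frequency and the MiB unit. Assuming the approximate 3.8 GHz clock quoted in the headings and 1 MiB = 2^20 bytes, the POWER8 CRC32 figures and the speedups can be cross-checked as below; small last-digit differences come from the rounded ns/B values and the approximate clock.

```latex
\begin{aligned}
\text{c/B} &= \text{ns/B} \times f_{\mathrm{GHz}}
  &&\Rightarrow\; 0.978 \times 3.8 \approx 3.72,\quad 0.044 \times 3.8 \approx 0.167 \\
\text{MiB/s} &= \frac{10^{9}}{\text{ns/B} \times 2^{20}}
  &&\Rightarrow\; \frac{10^{9}}{0.978 \times 1048576} \approx 975.1 \\
\text{speedup}_{\text{POWER8}} &= \frac{\text{MiB/s}_{\text{after}}}{\text{MiB/s}_{\text{before}}}
  &&\Rightarrow\; \frac{21878}{975.0} \approx 22.4,\quad \frac{22077}{978.8} \approx 22.6 \\
\text{speedup}_{\text{POWER9}} &= \frac{\text{MiB/s}_{\text{after}}}{\text{MiB/s}_{\text{before}}}
  &&\Rightarrow\; \frac{20675}{943.7} \approx 21.9,\quad \frac{19691}{960.6} \approx 20.5
\end{aligned}
```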

  • GnuPG-bug-id: T4460
  • Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Committed
jukivili, Sep 15 2019, 9:52 PM
Parents
rC557702f0d53a: Add PowerPC vector implementation of ChaCha20
Tasks
T4460: libgcrypt performance TODOs