libgcrypt: POWER SHA-2 Vector Acceleration
Closed, Resolved (Public)

Assigned To

	jukivili

Authored By

	gcwilson
	May 20 2019, 7:04 PM

Description

Use POWER8 and POWER9 ISA enhancements to improve the performance of SHA-2. Demonstrate why achieved performance is close to optimal for the platform. Optimized implementations in the Cryptogams repository[1] may serve as useful references. Financial bounty upon completion and community acceptance of patches.

[1] https://github.com/dot-asm/cryptogams/

Revisions and Commits

	Abandoned	D490 PowerPC optimized routines for AES and SHA2 using PowerISA 2.07 instructions.
	Needs Review	D492 Add PowerPC crypto acceleration support for SHA2.
rC libgcrypt
		rC93632f1adf57 Add SHA-512 implementations for POWER8 and POWER9
		rCe19dc973bc8e Add SHA-256 implementations for POWER8 and POWER9

Related Objects

Status	Assigned	Task
Open	jukivili	T4460 libgcrypt performance TODOs
Resolved	jukivili	T4531 PowerPC performance improvements
Resolved	jukivili	T4530 libgcrypt: POWER SHA-2 Vector Acceleration

Event Timeline

gcwilson created this task. (May 20 2019, 7:04 PM)

• werner renamed this task from "[$] libgcrypt: POWER SHA-2 Vector Acceleration" to "libgcrypt: POWER SHA-2 Vector Acceleration". (May 21 2019, 7:52 AM)

• werner triaged this task as Normal priority.

• werner added a parent task: T4531: PowerPC performance improvements.

slandden added a revision: D492: Add PowerPC crypto acceleration support for SHA2. (May 24 2019, 6:06 AM)

raksprak added a subscriber: raksprak. (Jun 18 2019, 12:20 PM)

jukivili added a revision: D490: PowerPC optimized routines for AES and SHA2 using PowerISA 2.07 instructions. (Jul 8 2019, 2:57 PM)

johnmar added a subscriber: johnmar. (Jul 9 2019, 10:12 PM)

johnmar raised the priority of this task from Normal to Needs Triage. (Jul 15 2019, 9:09 PM)

Please do not change the priority back without discussing this with the maintainer first. Thanks.

I'll start working on new PowerPC SHA2 implementations for libgcrypt in the coming weeks.

Patches sent to the mailing list:
https://lists.gnupg.org/pipermail/gcrypt-devel/2019-August/004800.html
https://lists.gnupg.org/pipermail/gcrypt-devel/2019-August/004799.html

SHA256 results:

Benchmark on POWER8 ~3.8 GHz:
 Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      4.17 ns/B     228.6 MiB/s     15.85 c/B

 After (~1.63x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      2.55 ns/B     373.9 MiB/s      9.69 c/B

 For comparison, OpenSSL 1.1.1b (~2.4% slower):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      2.61 ns/B     364.8 MiB/s      9.93 c/B


Benchmark on POWER9 ~3.8 GHz:
 Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      3.23 ns/B     295.6 MiB/s     12.26 c/B

 After (~1.03x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      3.11 ns/B     306.8 MiB/s     11.81 c/B

 For comparison, OpenSSL 1.1.1b (~6.6% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      2.91 ns/B     327.5 MiB/s     11.07 c/B

SHA512 results:

Benchmark on POWER8 ~3.8 GHz:
 Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA512         |      3.47 ns/B     274.6 MiB/s     13.20 c/B

 After (~2.08x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA512         |      1.66 ns/B     573.1 MiB/s      6.32 c/B

 For comparison, OpenSSL 1.1.1b (~1.6% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA512         |      1.64 ns/B     582.2 MiB/s      6.22 c/B


Benchmark on POWER9 ~3.8 GHz:
 Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA512         |      2.65 ns/B     359.6 MiB/s     10.08 c/B

 After (~1.33x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA512         |      1.99 ns/B     479.2 MiB/s      7.56 c/B

 For comparison, OpenSSL 1.1.1b (~9.4% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA512         |      1.82 ns/B     524.4 MiB/s      6.91 c/B
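The three columns in the tables above are related by simple arithmetic, sketched below in C. This is only an illustration: the function names are hypothetical, and the clock rate is the approximate ~3.8 GHz quoted above, so values recomputed from the rounded ns/B figures may differ slightly in the last digit from the published ones.

```c
#include <stdio.h>

/* Cycles per byte from time per byte and an ASSUMED clock rate in GHz.
   1 GHz equals one cycle per nanosecond, so this is a plain product. */
double cycles_per_byte(double ns_per_byte, double clock_ghz)
{
  return ns_per_byte * clock_ghz;
}

/* Throughput in MiB/s from time per byte (1 MiB = 1024 * 1024 bytes). */
double mib_per_sec(double ns_per_byte)
{
  return (1e9 / ns_per_byte) / (1024.0 * 1024.0);
}

/* Speed-up factor of the "after" timing relative to the "before" timing. */
double speedup(double before_ns_per_byte, double after_ns_per_byte)
{
  return before_ns_per_byte / after_ns_per_byte;
}

/* Print a one-line summary for a before/after pair (hypothetical helper). */
void report(const char *label, double before_ns, double after_ns,
            double clock_ghz)
{
  printf("%s: %.2f c/B -> %.2f c/B, %.1f MiB/s, ~%.2fx faster\n",
         label,
         cycles_per_byte(before_ns, clock_ghz),
         cycles_per_byte(after_ns, clock_ghz),
         mib_per_sec(after_ns),
         speedup(before_ns, after_ns));
}
```

For instance, calling `report("POWER8 SHA256", 4.17, 2.55, 3.8)` recomputes the POWER8 SHA256 row from its ns/B values.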

I have not been able to get an Altivec/VSX intrinsics implementation to run fast on POWER9. It appears that SHA2 vector acceleration gives diminishing returns on POWER9. For example, OpenSSL's assembly implementations using vshasigma(w|d) are only 6 to 10% faster than the optimized non-vector C implementations provided here, which is about the speed-up one would expect from converting these C implementations to assembly.
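For readers unfamiliar with what vshasigmaw accelerates: it evaluates one of the four SHA-256 sigma functions from FIPS 180-4 on each word element of a vector register. A minimal scalar sketch of those functions in portable C follows. It is not the code from the libgcrypt commits, and the function names are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (valid for 0 < n < 32). */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
  return (x >> n) | (x << (32 - n));
}

/* "Big" sigma functions, applied to the working variables in each round. */
uint32_t sha256_big_sigma0(uint32_t x)
{
  return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}

uint32_t sha256_big_sigma1(uint32_t x)
{
  return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25);
}

/* "Small" sigma functions, used when expanding the message schedule. */
uint32_t sha256_small_sigma0(uint32_t x)
{
  return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

uint32_t sha256_small_sigma1(uint32_t x)
{
  return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}
```

Each function is three rotate or shift operations and two XORs, which a scalar core with a rotate instruction already handles cheaply. That is consistent with the modest gap between the vector assembly and scalar C results reported above.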

jukivili added a commit: rCe19dc973bc8e: Add SHA-256 implementations for POWER8 and POWER9. (Sep 3 2019, 9:34 PM)

jukivili added a commit: rC93632f1adf57: Add SHA-512 implementations for POWER8 and POWER9.

PowerPC SHA-256 and SHA-512 implementations have been committed with a little more tuning. Most notably, SHA-512 on POWER8 now performs on par with OpenSSL:

Benchmark on POWER8 ~3.8 GHz:
 Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA512         |      3.47 ns/B     274.6 MiB/s     13.20 c/B

 After (~2.1x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA512         |      1.64 ns/B     581.8 MiB/s      6.23 c/B

 For comparison, OpenSSL 1.1.1b (~same):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA512         |      1.64 ns/B     582.2 MiB/s      6.22 c/B

• werner mentioned this in T4294: Release Libgcrypt 1.9.0. (Jan 19 2021, 10:17 AM)
