Support for PowerPC's AES acceleration.
Abandoned, Public

Authored by slandden on May 24 2019, 6:03 AM.

Details

Summary

Parent D490

This generates the S-Boxes on the fly, and thus
is more resistant to side-channel attacks.
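To illustrate what computing an S-box entry without a precomputed lookup table means, here is a minimal Python sketch of the standard AES S-box definition (multiplicative inverse in GF(2^8) followed by the affine transform). It is not the code from this patch, which relies on the vcrypto instructions, and the helper names `gf_mul`, `gf_inv`, `rotl8` and `sbox` are made up for this example. Python is not constant-time, so this shows only the derivation and has no side-channel properties of its own.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the low byte of the AES polynomial (0x11B)
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse in GF(2^8) computed as a^254; 0 maps to 0, as AES defines it."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(a: int) -> int:
    """AES SubBytes for one byte: field inverse, then the affine transform."""
    inv = gf_inv(a)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


if __name__ == "__main__":
    # Print the first row of the S-box as hex bytes.
    print(" ".join(f"{sbox(i):02x}" for i in range(16)))
```

The result can be checked against the S-box table in FIPS 197, for example `sbox(0x53) == 0xED`.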

I get an approximately 2-3x speed-up with vcrypto support.
However, I saw no benefit from additionally using assembly for
the block mode code, so it is disabled to avoid adding unnecessary
complexity and code size.

Before:
Cipher:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

 ECB enc |      6.27 ns/B     152.2 MiB/s      7.92 c/B      1263
 ECB dec |      7.10 ns/B     134.4 MiB/s      8.97 c/B      1264
 CBC enc |      4.52 ns/B     211.2 MiB/s      5.71 c/B      1264
 CBC dec |      4.59 ns/B     207.9 MiB/s      8.69 c/B      1895
 CFB enc |      4.34 ns/B     219.6 MiB/s      8.23 c/B      1895
 CFB dec |      4.14 ns/B     230.6 MiB/s      7.84 c/B      1895
 OFB enc |      6.09 ns/B     156.7 MiB/s     11.54 c/B      1895
 OFB dec |      6.03 ns/B     158.2 MiB/s     11.43 c/B      1895
 CTR enc |      4.17 ns/B     228.9 MiB/s      7.90 c/B      1895
 CTR dec |      4.17 ns/B     228.6 MiB/s      7.91 c/B      1895
 XTS enc |      4.53 ns/B     210.4 MiB/s      8.59 c/B      1895
 XTS dec |      5.00 ns/B     190.8 MiB/s      9.47 c/B      1895
 CCM enc |      8.51 ns/B     112.0 MiB/s     16.13 c/B      1895
 CCM dec |      8.51 ns/B     112.0 MiB/s     16.13 c/B      1895
CCM auth |      4.35 ns/B     219.1 MiB/s      8.25 c/B      1895
 EAX enc |      8.51 ns/B     112.1 MiB/s     16.13 c/B      1895
 EAX dec |      8.55 ns/B     111.5 MiB/s     16.21 c/B      1895
EAX auth |      4.34 ns/B     219.5 MiB/s      8.23 c/B      1895
 GCM enc |      7.49 ns/B     127.3 MiB/s     14.20 c/B      1895
 GCM dec |      7.49 ns/B     127.3 MiB/s     14.20 c/B      1895
GCM auth |      3.33 ns/B     286.2 MiB/s      6.31 c/B      1895
 OCB enc |      4.33 ns/B     220.1 MiB/s      8.21 c/B      1895
 OCB dec |      5.69 ns/B     167.5 MiB/s      9.32 c/B      1638
OCB auth |      5.05 ns/B     189.0 MiB/s      8.26 c/B      1638

After:
Cipher:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

 ECB enc |      2.14 ns/B     445.7 MiB/s      4.06 c/B      1895
 ECB dec |      2.41 ns/B     396.0 MiB/s      4.54 c/B      1887
 CBC enc |      2.11 ns/B     451.9 MiB/s      4.00 c/B      1895
 CBC dec |      2.06 ns/B     462.7 MiB/s      3.91 c/B      1895
 CFB enc |      2.09 ns/B     455.9 MiB/s      3.96 c/B      1895
 CFB dec |      2.09 ns/B     456.2 MiB/s      3.96 c/B      1895
 OFB enc |      2.17 ns/B     439.9 MiB/s      4.11 c/B      1895
 OFB dec |      2.12 ns/B     449.6 MiB/s      4.02 c/B      1895
 CTR enc |      2.10 ns/B     454.6 MiB/s      3.98 c/B      1895
 CTR dec |      2.09 ns/B     456.7 MiB/s      3.96 c/B      1895
 XTS enc |      2.30 ns/B     415.3 MiB/s      4.35 c/B      1895
 XTS dec |      2.29 ns/B     415.8 MiB/s      4.35 c/B      1895
 CCM enc |      4.67 ns/B     204.2 MiB/s      7.65 c/B      1638
 CCM dec |      4.83 ns/B     197.3 MiB/s      7.92 c/B      1638
CCM auth |      2.43 ns/B     391.9 MiB/s      3.99 c/B      1638
 EAX enc |      4.84 ns/B     197.2 MiB/s      7.92 c/B      1638
 EAX dec |      4.83 ns/B     197.3 MiB/s      7.92 c/B      1638
EAX auth |      2.42 ns/B     394.2 MiB/s      3.96 c/B      1638
 GCM enc |      5.42 ns/B     176.0 MiB/s     10.27 c/B      1895
 GCM dec |      5.42 ns/B     176.1 MiB/s     10.27 c/B      1895
GCM auth |      3.33 ns/B     286.2 MiB/s      6.32 c/B      1895
 OCB enc |      2.10 ns/B     454.7 MiB/s      3.98 c/B      1895
 OCB dec |      2.11 ns/B     452.8 MiB/s      3.99 c/B      1895
OCB auth |      2.10 ns/B     453.6 MiB/s      3.98 c/B      1895

Fixes T4529

GCM is slow because of lack of CPU support.
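As a quick check of the speed-up claim, the following sketch divides a few of the ns/B figures from the "Before" table by the matching figures from the "After" table. The numbers are copied from the tables above; the dictionary and function names are only for illustration.

```python
# ns/B figures copied from the "Before" and "After" tables in the summary.
BEFORE_NS_PER_BYTE = {
    "ECB enc": 6.27,
    "CBC enc": 4.52,
    "CTR enc": 4.17,
    "OCB enc": 4.33,
    "GCM enc": 7.49,
    "GCM auth": 3.33,
}
AFTER_NS_PER_BYTE = {
    "ECB enc": 2.14,
    "CBC enc": 2.11,
    "CTR enc": 2.10,
    "OCB enc": 2.10,
    "GCM enc": 5.42,
    "GCM auth": 3.33,
}


def speedup(mode: str) -> float:
    """Return the before/after ratio of ns/B for one benchmark row."""
    return BEFORE_NS_PER_BYTE[mode] / AFTER_NS_PER_BYTE[mode]


if __name__ == "__main__":
    for mode in BEFORE_NS_PER_BYTE:
        print(f"{mode:>8}: {speedup(mode):.2f}x")
```

With these inputs ECB enc comes out at about 2.93x, while GCM enc improves only about 1.38x because the GCM auth row is unchanged at 3.33 ns/B. Note that the detected clock differs between some rows (for example 1263 MHz for ECB enc before versus 1895 MHz after), so the cycles/byte column is the fairer basis for comparison there.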

Test Plan

This is integrated into the existing tests, and they all pass.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
slandden created this revision. May 24 2019, 6:03 AM
slandden edited the summary of this revision. May 24 2019, 6:06 AM
slandden updated this revision to Diff 1388. May 24 2019, 6:15 AM
slandden retitled this revision from "Add PowerPC crypto acceleration support for SHA2." to "Support for PowerPC's AES acceleration."
slandden edited the summary of this revision.

Actually include modified perlasm file.

slandden planned changes to this revision. May 24 2019, 6:39 AM

Consider using tests/bench-slope to get cycles/byte results so they can be compared with https://github.com/dot-asm/cryptogams/blob/master/ppc/aesp8-ppc.pl#L34

The command "tests/bench-slope --cpu-mhz auto cipher aes" should do the trick, assuming auto-detection of the CPU frequency works on PowerPC.
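For reference, the cycles/byte column in the tables is consistent with the ns/B column multiplied by the detected clock. The sketch below shows only that arithmetic, checked against rows from the summary; it is not taken from the bench-slope source, and the function name is hypothetical.

```python
def cycles_per_byte(ns_per_byte: float, cpu_mhz: float) -> float:
    """Convert ns/B to cycles/B: 1 MHz is 1e6 cycles/s, i.e. 1e-3 cycles/ns."""
    return ns_per_byte * cpu_mhz / 1000.0


if __name__ == "__main__":
    # (row label, ns/B, detected MHz, reported c/B), copied from the summary tables.
    rows = [
        ("ECB enc before", 6.27, 1263, 7.92),
        ("ECB enc after", 2.14, 1895, 4.06),
        ("CTR enc after", 2.10, 1895, 3.98),
    ]
    for label, ns_b, mhz, reported in rows:
        computed = cycles_per_byte(ns_b, mhz)
        print(f"{label}: computed {computed:.2f} c/B, reported {reported:.2f} c/B")
```

Small differences in the last digit are expected because the ns/B values in the tables are already rounded.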

slandden updated this revision to Diff 1390. May 24 2019, 9:04 PM

proper benchmarks

slandden edited the summary of this revision. May 27 2019, 9:15 PM

@jukivili

Benchmarks with the block ciphers are here: https://dev.gnupg.org/D493

slandden updated this revision to Diff 1406. Jun 6 2019, 9:07 PM

resolve merge conflicts

We are trying to apply patches in order to conduct internal testing. They did apply successfully. However, we can't get the result to link because _gcry_hwf_detect_ppc is undefined. Is there a hwf-ppc.c somewhere?

slandden updated this revision to Diff 1408. Jun 7 2019, 10:39 PM

include hwf-ppc.c

slandden planned changes to this revision. Jun 8 2019, 1:17 AM

It turns out that upstream cryptogams is broken on ppc64 big-endian ELFv1. I reported this upstream at https://github.com/dot-asm/cryptogams/issues/5 (the OpenSSL version works fine).

slandden updated this revision to Diff 1410. Jun 8 2019, 2:09 AM
slandden edited the summary of this revision.

rebase

Thanks for the hwf-ppc.c. I've pulled the latest from upstream, applied the patches, and gotten the updated library built. Will let you know of any feedback from our performance team.

slandden updated this revision to Diff 1411. Jun 19 2019, 5:32 PM

fix running with hardware acceleration off.

slandden abandoned this revision. Aug 30 2019, 6:53 PM

This has been committed.