Parent D490
This generates the S-Boxes on the fly, and thus
is more resistant to side-channel attacks.
I get an approximately 2-3x speed-up with vcrypto support.
However, I saw no benefit from additionally using assembly for
the block mode code, so it is disabled to avoid adding unnecessary
complexity and code size.
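To illustrate what computing S-box values on the fly means, as opposed to reading them from a lookup table indexed by secret data, here is a minimal Python sketch that derives AES S-box entries from the underlying field arithmetic. It is only an illustration of the idea: it is not the code in this patch, it does not describe how the vcrypto instructions work internally, and it is not constant-time itself. The helper names (gf_mul, gf_inv, rotl8, sbox) are made up for the example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce back into 8 bits
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254; maps 0 to 0, as AES requires."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x):
    """AES S-box entry: field inverse followed by the affine transform."""
    inv = gf_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


if __name__ == "__main__":
    print(" ".join(f"{sbox(x):02x}" for x in range(16)))
```

No table is ever indexed by the input byte here, which is the property that removes the cache-timing channel of table-based AES (a real implementation would also need to avoid data-dependent branches).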
Before:
Cipher:
AES      |  nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
ECB enc  |      6.27 ns/B    152.2 MiB/s     7.92 c/B      1263
ECB dec  |      7.10 ns/B    134.4 MiB/s     8.97 c/B      1264
CBC enc  |      4.52 ns/B    211.2 MiB/s     5.71 c/B      1264
CBC dec  |      4.59 ns/B    207.9 MiB/s     8.69 c/B      1895
CFB enc  |      4.34 ns/B    219.6 MiB/s     8.23 c/B      1895
CFB dec  |      4.14 ns/B    230.6 MiB/s     7.84 c/B      1895
OFB enc  |      6.09 ns/B    156.7 MiB/s    11.54 c/B      1895
OFB dec  |      6.03 ns/B    158.2 MiB/s    11.43 c/B      1895
CTR enc  |      4.17 ns/B    228.9 MiB/s     7.90 c/B      1895
CTR dec  |      4.17 ns/B    228.6 MiB/s     7.91 c/B      1895
XTS enc  |      4.53 ns/B    210.4 MiB/s     8.59 c/B      1895
XTS dec  |      5.00 ns/B    190.8 MiB/s     9.47 c/B      1895
CCM enc  |      8.51 ns/B    112.0 MiB/s    16.13 c/B      1895
CCM dec  |      8.51 ns/B    112.0 MiB/s    16.13 c/B      1895
CCM auth |      4.35 ns/B    219.1 MiB/s     8.25 c/B      1895
EAX enc  |      8.51 ns/B    112.1 MiB/s    16.13 c/B      1895
EAX dec  |      8.55 ns/B    111.5 MiB/s    16.21 c/B      1895
EAX auth |      4.34 ns/B    219.5 MiB/s     8.23 c/B      1895
GCM enc  |      7.49 ns/B    127.3 MiB/s    14.20 c/B      1895
GCM dec  |      7.49 ns/B    127.3 MiB/s    14.20 c/B      1895
GCM auth |      3.33 ns/B    286.2 MiB/s     6.31 c/B      1895
OCB enc  |      4.33 ns/B    220.1 MiB/s     8.21 c/B      1895
OCB dec  |      5.69 ns/B    167.5 MiB/s     9.32 c/B      1638
OCB auth |      5.05 ns/B    189.0 MiB/s     8.26 c/B      1638
After:
Cipher:
AES      |  nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
ECB enc  |      2.14 ns/B    445.7 MiB/s     4.06 c/B      1895
ECB dec  |      2.41 ns/B    396.0 MiB/s     4.54 c/B      1887
CBC enc  |      2.11 ns/B    451.9 MiB/s     4.00 c/B      1895
CBC dec  |      2.06 ns/B    462.7 MiB/s     3.91 c/B      1895
CFB enc  |      2.09 ns/B    455.9 MiB/s     3.96 c/B      1895
CFB dec  |      2.09 ns/B    456.2 MiB/s     3.96 c/B      1895
OFB enc  |      2.17 ns/B    439.9 MiB/s     4.11 c/B      1895
OFB dec  |      2.12 ns/B    449.6 MiB/s     4.02 c/B      1895
CTR enc  |      2.10 ns/B    454.6 MiB/s     3.98 c/B      1895
CTR dec  |      2.09 ns/B    456.7 MiB/s     3.96 c/B      1895
XTS enc  |      2.30 ns/B    415.3 MiB/s     4.35 c/B      1895
XTS dec  |      2.29 ns/B    415.8 MiB/s     4.35 c/B      1895
CCM enc  |      4.67 ns/B    204.2 MiB/s     7.65 c/B      1638
CCM dec  |      4.83 ns/B    197.3 MiB/s     7.92 c/B      1638
CCM auth |      2.43 ns/B    391.9 MiB/s     3.99 c/B      1638
EAX enc  |      4.84 ns/B    197.2 MiB/s     7.92 c/B      1638
EAX dec  |      4.83 ns/B    197.3 MiB/s     7.92 c/B      1638
EAX auth |      2.42 ns/B    394.2 MiB/s     3.96 c/B      1638
GCM enc  |      5.42 ns/B    176.0 MiB/s    10.27 c/B      1895
GCM dec  |      5.42 ns/B    176.1 MiB/s    10.27 c/B      1895
GCM auth |      3.33 ns/B    286.2 MiB/s     6.32 c/B      1895
OCB enc  |      2.10 ns/B    454.7 MiB/s     3.98 c/B      1895
OCB dec  |      2.11 ns/B    452.8 MiB/s     3.99 c/B      1895
OCB auth |      2.10 ns/B    453.6 MiB/s     3.98 c/B      1895
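For anyone who wants to check the speed-up figure against the two tables, a small Python sketch follows. It copies a handful of rows from the output above as (ns/B, c/B) pairs and computes both ratios. The "auto Mhz" column differs between some rows of the two runs (for example 1263 vs 1895 for ECB enc), so the cycles/byte ratio is probably the fairer comparison. The dictionary and function names are made up for the example.

```python
# (nanoseconds/byte, cycles/byte) copied from the tables above
BEFORE = {
    "ECB enc": (6.27, 7.92),
    "CBC enc": (4.52, 5.71),
    "CTR enc": (4.17, 7.90),
    "OCB enc": (4.33, 8.21),
    "GCM enc": (7.49, 14.20),
    "GCM auth": (3.33, 6.31),
}
AFTER = {
    "ECB enc": (2.14, 4.06),
    "CBC enc": (2.11, 4.00),
    "CTR enc": (2.10, 3.98),
    "OCB enc": (2.10, 3.98),
    "GCM enc": (5.42, 10.27),
    "GCM auth": (3.33, 6.32),
}


def speedups(before, after):
    """Return {mode: (ns/B ratio, c/B ratio)}; values above 1 mean faster."""
    return {
        mode: (before[mode][0] / after[mode][0], before[mode][1] / after[mode][1])
        for mode in before
    }


if __name__ == "__main__":
    for mode, (by_time, by_cycles) in speedups(BEFORE, AFTER).items():
        print(f"{mode:8s}  {by_time:.2f}x by ns/B  {by_cycles:.2f}x by c/B")
```

By wall-clock time ECB enc comes out near 2.9x, but by cycles/byte it is closer to 2x, in line with CTR and OCB; GCM auth does not change at all.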
Fixes T4529
There is a faster assembly version of XTS available, and I will try to get around to using it.
GCM is still slow because of the lack of CPU support: GCM auth is unchanged at 3.33 ns/B, so only the cipher part of GCM gets faster.
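As a rough sanity check on the GCM numbers, GCM encryption is CTR encryption plus the authentication pass over the same data, so its cost per byte should be close to the sum of the "CTR enc" and "GCM auth" rows. The Python sketch below does that arithmetic with the figures from the benchmark output; it is only an estimate that assumes the two passes do not overlap, and the names are made up for the example.

```python
# nanoseconds/byte taken from the benchmark tables in this description
CTR_ENC = {"before": 4.17, "after": 2.10}
GCM_AUTH = {"before": 3.33, "after": 3.33}
GCM_ENC_MEASURED = {"before": 7.49, "after": 5.42}


def estimated_gcm_enc(run):
    """Estimate GCM enc cost as the CTR pass plus the authentication pass."""
    return CTR_ENC[run] + GCM_AUTH[run]


if __name__ == "__main__":
    for run in ("before", "after"):
        est = estimated_gcm_enc(run)
        print(f"{run}: estimated {est:.2f} ns/B, measured {GCM_ENC_MEASURED[run]:.2f} ns/B")
```

The estimate lands within about 0.01 ns/B of the measured value in both runs, which suggests the remaining GCM cost is dominated by the unaccelerated authentication pass.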