Very impressive speed wins over the cryptogams version (numbers below).
It is also easier to maintain than an assembly version.
8x was only marginally faster than 6x. It could probably be sped up
further with a vector gather instruction.
Before:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 2.84 ns/B 336.1 MiB/s 5.38 c/B 1895
ECB dec | 2.89 ns/B 330.6 MiB/s 5.47 c/B 1895
CBC enc | 1.05 ns/B 908.3 MiB/s 1.99 c/B 1895
CBC dec | 0.221 ns/B 4315 MiB/s 0.419 c/B 1895
CFB enc | 4.41 ns/B 216.4 MiB/s 8.35 c/B 1895
CFB dec | 4.88 ns/B 195.3 MiB/s 9.26 c/B 1895
OFB enc | 5.06 ns/B 188.4 MiB/s 9.59 c/B 1895
OFB dec | 5.07 ns/B 188.2 MiB/s 9.60 c/B 1895
CTR enc | 0.218 ns/B 4374 MiB/s 0.413 c/B 1895
CTR dec | 0.219 ns/B 4349 MiB/s 0.416 c/B 1895
XTS enc | 0.681 ns/B 1400 MiB/s 1.29 c/B 1895
XTS dec | 0.687 ns/B 1387 MiB/s 1.30 c/B 1895
CCM enc | 4.21 ns/B 226.4 MiB/s 5.32 c/B 1264
CCM dec | 4.21 ns/B 226.7 MiB/s 5.32 c/B 1264
CCM auth | 3.99 ns/B 239.2 MiB/s 5.04 c/B 1264
EAX enc | 4.20 ns/B 227.2 MiB/s 5.30 c/B 1264
EAX dec | 4.21 ns/B 226.5 MiB/s 5.32 c/B 1264
EAX auth | 3.97 ns/B 239.9 MiB/s 5.02 c/B 1264
GCM enc | 19.81 ns/B 48.14 MiB/s 25.03 c/B 1264
GCM dec | 19.79 ns/B 48.18 MiB/s 25.01 c/B 1264
GCM auth | 19.55 ns/B 48.78 MiB/s 24.71 c/B 1264
OCB enc | 17.53 ns/B 54.41 MiB/s 14.77 c/B 842.4
OCB dec | 13.89 ns/B 68.67 MiB/s 17.55 c/B 1263
OCB auth | 9.14 ns/B 104.4 MiB/s 11.54 c/B 1264
After:
AES | nanosecs/byte mebibytes/sec cycles/byte auto Mhz
ECB enc | 1.98 ns/B 482.6 MiB/s 3.75 c/B 1895 <=====
ECB dec | 1.80 ns/B 529.3 MiB/s 3.42 c/B 1895 <=====
CBC enc | 1.05 ns/B 907.7 MiB/s 1.99 c/B 1895
CBC dec | 0.221 ns/B 4317 MiB/s 0.419 c/B 1895
CFB enc | 1.65 ns/B 578.5 MiB/s 3.12 c/B 1895
CFB dec | 1.03 ns/B 925.9 MiB/s 1.95 c/B 1895
OFB enc | 2.34 ns/B 408.2 MiB/s 3.83 c/B 1638
OFB dec | 2.33 ns/B 410.1 MiB/s 3.81 c/B 1638
CTR enc | 0.216 ns/B 4416 MiB/s 0.409 c/B 1895
CTR dec | 0.216 ns/B 4422 MiB/s 0.409 c/B 1895
XTS enc | 0.557 ns/B 1712 MiB/s 1.06 c/B 1895
XTS dec | 0.561 ns/B 1701 MiB/s 1.06 c/B 1895
CCM enc | 1.87 ns/B 509.9 MiB/s 3.54 c/B 1895
CCM dec | 1.87 ns/B 509.8 MiB/s 3.55 c/B 1895
CCM auth | 1.65 ns/B 576.4 MiB/s 3.14 c/B 1895
EAX enc | 1.87 ns/B 510.3 MiB/s 3.54 c/B 1895
EAX dec | 1.87 ns/B 510.0 MiB/s 3.54 c/B 1895
EAX auth | 1.65 ns/B 576.9 MiB/s 3.13 c/B 1895
GCM enc | 3.55 ns/B 268.7 MiB/s 6.73 c/B 1895
GCM dec | 3.55 ns/B 268.7 MiB/s 6.73 c/B 1895
GCM auth | 3.33 ns/B 286.2 MiB/s 6.32 c/B 1895
OCB enc | 0.426 ns/B 2241 MiB/s 0.807 c/B 1895 <====
OCB dec | 0.409 ns/B 2333 MiB/s 0.775 c/B 1895 <====
OCB auth | 1.23 ns/B 772.7 MiB/s 2.34 c/B 1895
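
Note on reading the numbers: the measured clock differs between some
Before and After rows (e.g. 1264 vs 1895 for CCM/EAX/GCM), so comparing
ns/B directly mixes in frequency effects. Comparing cycles/byte is
probably fairer. A rough sketch of that arithmetic follows; the helper
names are made up for illustration and are not part of the benchmark
tool, and only a few rows from the tables above are copied in.

```python
def cycles_per_byte(ns_per_byte, mhz):
    """Convert ns/B to c/B: cycles = nanoseconds * (MHz / 1000)."""
    return ns_per_byte * mhz / 1000.0


def speedup(before, after):
    """Ratio of a 'lower is better' metric; > 1 means After is faster."""
    return before / after


# (ns/B, MHz) pairs copied from the Before and After tables.
rows = {
    "ECB enc": ((2.84, 1895), (1.98, 1895)),
    "GCM enc": ((19.81, 1264), (3.55, 1895)),
    "OCB enc": ((17.53, 842.4), (0.426, 1895)),
}

for mode, ((b_ns, b_mhz), (a_ns, a_mhz)) in rows.items():
    b_cpb = cycles_per_byte(b_ns, b_mhz)
    a_cpb = cycles_per_byte(a_ns, a_mhz)
    print(f"{mode}: {speedup(b_ns, a_ns):.2f}x in ns/B, "
          f"{speedup(b_cpb, a_cpb):.2f}x in c/B")
```

The c/B values computed this way should match the cycles/byte column
up to rounding, which is a quick sanity check on the tables.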