ECB mode might be worth doing as well (it is not part of cryptogams).
libgcrypt's decision to use a 128-bit counter for CTR mode is highly unusual; I can't find it in any RFC (they use either 64-bit or 32-bit counters), but I implemented it so that the tests pass.
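To make the counter-width difference concrete, here is a minimal Python sketch. It is not libgcrypt code, and the helper name `ctr_increment` is made up for illustration. It treats the low N bits of a 16-byte counter block as a big-endian integer and adds one. With a 128-bit counter the carry propagates through the whole block; with a 32-bit counter only the last four bytes wrap and the rest of the block is untouched.

```python
def ctr_increment(block: bytes, counter_bits: int) -> bytes:
    """Add 1 to the low `counter_bits` bits of a 16-byte block (big-endian).

    Hypothetical helper for illustration only; not a libgcrypt API.
    """
    if len(block) != 16:
        raise ValueError("counter block must be 16 bytes")
    if counter_bits not in (32, 64, 128):
        raise ValueError("counter_bits must be 32, 64 or 128")
    n = counter_bits // 8
    # Bytes outside the counter field (nonce/IV part) are left unchanged.
    prefix, counter = block[:16 - n], block[16 - n:]
    # Wrap around modulo 2**counter_bits, so no carry leaves the field.
    value = (int.from_bytes(counter, "big") + 1) % (1 << counter_bits)
    return prefix + value.to_bytes(n, "big")


block = bytes.fromhex("000000000000000000000000ffffffff")
wide = ctr_increment(block, 128)    # carry moves into the upper 96 bits
narrow = ctr_increment(block, 32)   # low 32 bits wrap to zero
print(wide.hex())
print(narrow.hex())
```

The two results diverge as soon as the low 32 bits overflow, which is the case an implementation with a narrower counter would get wrong against a 128-bit test vector.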
I ran all the tests on ppc64el and amd64.
See T4529.
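Disclaimer: these benchmarks should not be trusted, because these chips have a "turbo" mode in which the CPUs run at a faster clock speed for short bursts. Note that the auto-detected MHz varies between rows, so the cycles/byte figures in particular are not directly comparable.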
AES      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
ECB enc  |     2.84 ns/B    336.1 MiB/s     5.38 c/B      1895
ECB dec  |     2.89 ns/B    330.6 MiB/s     5.47 c/B      1895
CBC enc  |     1.05 ns/B    908.3 MiB/s     1.99 c/B      1895
CBC dec  |    0.221 ns/B     4315 MiB/s    0.419 c/B      1895
CFB enc  |     4.41 ns/B    216.4 MiB/s     8.35 c/B      1895
CFB dec  |     4.88 ns/B    195.3 MiB/s     9.26 c/B      1895
OFB enc  |     5.06 ns/B    188.4 MiB/s     9.59 c/B      1895
OFB dec  |     5.07 ns/B    188.2 MiB/s     9.60 c/B      1895
CTR enc  |    0.218 ns/B     4374 MiB/s    0.413 c/B      1895
CTR dec  |    0.219 ns/B     4349 MiB/s    0.416 c/B      1895
XTS enc  |    0.681 ns/B     1400 MiB/s     1.29 c/B      1895
XTS dec  |    0.687 ns/B     1387 MiB/s     1.30 c/B      1895
CCM enc  |     4.21 ns/B    226.4 MiB/s     5.32 c/B      1264
CCM dec  |     4.21 ns/B    226.7 MiB/s     5.32 c/B      1264
CCM auth |     3.99 ns/B    239.2 MiB/s     5.04 c/B      1264
EAX enc  |     4.20 ns/B    227.2 MiB/s     5.30 c/B      1264
EAX dec  |     4.21 ns/B    226.5 MiB/s     5.32 c/B      1264
EAX auth |     3.97 ns/B    239.9 MiB/s     5.02 c/B      1264
GCM enc  |    19.81 ns/B    48.14 MiB/s    25.03 c/B      1264
GCM dec  |    19.79 ns/B    48.18 MiB/s    25.01 c/B      1264
GCM auth |    19.55 ns/B    48.78 MiB/s    24.71 c/B      1264
OCB enc  |    17.53 ns/B    54.41 MiB/s    14.77 c/B     842.4
OCB dec  |    13.89 ns/B    68.67 MiB/s    17.55 c/B      1263
OCB auth |     9.14 ns/B    104.4 MiB/s    11.54 c/B      1264
big-endian:
AES      | nanosecs/byte  mebibytes/sec  cycles/byte  auto Mhz
ECB enc  |     2.17 ns/B    440.1 MiB/s     4.11 c/B      1895
ECB dec  |     2.49 ns/B    382.3 MiB/s     4.73 c/B      1895
CBC enc  |     1.05 ns/B    908.8 MiB/s     1.99 c/B      1895
CBC dec  |    0.201 ns/B     4748 MiB/s    0.381 c/B      1895
CFB enc  |     2.07 ns/B    460.2 MiB/s     3.93 c/B      1895
CFB dec  |     2.07 ns/B    460.0 MiB/s     3.93 c/B      1895
OFB enc  |     2.54 ns/B    375.3 MiB/s     4.82 c/B      1895
OFB dec  |     2.11 ns/B    451.9 MiB/s     4.00 c/B      1895
CTR enc  |    0.207 ns/B     4609 MiB/s    0.261 c/B      1264
CTR dec  |    0.207 ns/B     4611 MiB/s    0.261 c/B      1264
XTS enc  |    0.564 ns/B     1689 MiB/s     1.07 c/B      1895
XTS dec  |    0.562 ns/B     1697 MiB/s     1.07 c/B      1895
CCM enc  |     2.28 ns/B    419.1 MiB/s     4.31 c/B      1895
CCM dec  |     2.28 ns/B    417.9 MiB/s     4.33 c/B      1895
CCM auth |     2.07 ns/B    459.9 MiB/s     3.93 c/B      1895
EAX enc  |     2.28 ns/B    418.8 MiB/s     4.32 c/B      1895
EAX dec  |     2.28 ns/B    418.4 MiB/s     4.32 c/B      1895
EAX auth |     2.07 ns/B    460.2 MiB/s     3.93 c/B      1895
GCM enc  |     3.17 ns/B    301.1 MiB/s     6.00 c/B      1895
GCM dec  |     3.17 ns/B    300.9 MiB/s     4.01 c/B      1264
GCM auth |     2.96 ns/B    322.0 MiB/s     5.61 c/B      1895
OCB enc  |     2.08 ns/B    458.1 MiB/s     3.95 c/B      1895
OCB dec  |     2.07 ns/B    461.8 MiB/s     3.91 c/B      1895
OCB auth |     2.08 ns/B    458.4 MiB/s     3.94 c/B      1895
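
The cycles/byte column in the tables above appears to be nanosecs/byte multiplied by the auto-detected clock (MHz / 1000 gives cycles per nanosecond). That relationship is inferred from the numbers, not taken from the benchmark tool's source. It is why the turbo-mode caveat matters: a wrong clock estimate changes c/B even when ns/B is unchanged. A small Python sketch that checks a few rows (the helper name `cycles_per_byte` is made up for illustration):

```python
def cycles_per_byte(ns_per_byte: float, mhz: float) -> float:
    """Convert ns/byte to cycles/byte for a given clock in MHz.

    1 MHz = 1 cycle per microsecond, so MHz / 1000 = cycles per nanosecond.
    """
    return ns_per_byte * mhz / 1000.0


# (mode, ns/B, auto MHz, reported c/B), copied from the first table above.
rows = [
    ("ECB enc", 2.84, 1895.0, 5.38),
    ("CCM enc", 4.21, 1264.0, 5.32),
    ("OCB enc", 17.53, 842.4, 14.77),
]

for mode, ns, mhz, reported in rows:
    computed = cycles_per_byte(ns, mhz)
    print(f"{mode}: computed {computed:.2f} c/B, reported {reported:.2f} c/B")
```

Because rows measured at different detected clocks (1895, 1264, 842.4 MHz) get different multipliers, comparing ns/B across rows is probably safer than comparing c/B.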