AES-NI/OCB: Perform checksumming inline with encryption
* cipher/rijndael-aesni.c (aesni_ocb_enc): Remove call to
'aesni_ocb_checksum', instead perform checksumming inline with offset
calculations.
This patch reverts the OCB checksumming split for encryption to avoid
a performance issue seen on Intel CPUs.
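The two loop structures can be sketched roughly as follows. This is
only an illustration in plain C, not the AES-NI code from
cipher/rijndael-aesni.c: toy_encrypt_block and toy_l are hypothetical
stand-ins for the AES block encryption and the OCB L-table lookup, and
the names enc_two_pass, enc_inline and variants_agree are made up for
this example. The split variant walks the plaintext once for the
checksum and once more for encryption; the inline variant updates the
checksum in the same loop that updates the offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Hypothetical stand-in for one AES block encryption; NOT a real cipher. */
static void toy_encrypt_block (uint8_t b[BLK])
{
  for (unsigned j = 0; j < BLK; j++)
    b[j] = (uint8_t)((b[j] ^ 0x5a) + j);
}

/* Hypothetical stand-in for the OCB L[k] table lookup. */
static void toy_l (unsigned k, uint8_t out[BLK])
{
  for (unsigned j = 0; j < BLK; j++)
    out[j] = (uint8_t)((k + 1) * 17 + j);
}

static void xor_block (uint8_t *dst, const uint8_t *src)
{
  for (unsigned j = 0; j < BLK; j++)
    dst[j] ^= src[j];
}

/* Number of trailing zero bits of n; n must be non-zero. */
static unsigned ntz (uint64_t n)
{
  unsigned c = 0;
  while ((n & 1) == 0)
    {
      n >>= 1;
      c++;
    }
  return c;
}

/* Offset_i = Offset_{i-1} ^ L[ntz(i)]; C_i = Offset_i ^ E(P_i ^ Offset_i).
   'out' and 'in' must not overlap. */
static void enc_one (uint8_t *out, const uint8_t *in,
                     uint8_t offset[BLK], uint64_t i)
{
  uint8_t l[BLK];

  toy_l (ntz (i), l);
  xor_block (offset, l);
  memcpy (out, in, BLK);
  xor_block (out, offset);
  toy_encrypt_block (out);
  xor_block (out, offset);
}

/* Split variant: checksum in a separate pass over the plaintext. */
static void enc_two_pass (uint8_t *out, const uint8_t *in, size_t nblocks,
                          uint8_t offset[BLK], uint8_t checksum[BLK])
{
  for (size_t n = 0; n < nblocks; n++)
    xor_block (checksum, in + n * BLK);
  for (size_t n = 0; n < nblocks; n++)
    enc_one (out + n * BLK, in + n * BLK, offset, n + 1);
}

/* Inline variant: checksum updated in the same loop as the offset. */
static void enc_inline (uint8_t *out, const uint8_t *in, size_t nblocks,
                        uint8_t offset[BLK], uint8_t checksum[BLK])
{
  for (size_t n = 0; n < nblocks; n++)
    {
      xor_block (checksum, in + n * BLK);
      enc_one (out + n * BLK, in + n * BLK, offset, n + 1);
    }
}

/* Returns 1 if both variants give the same ciphertext, offset and
   checksum for 'nblocks' blocks of test data, 0 otherwise. */
int variants_agree (size_t nblocks)
{
  enum { MAX_BLOCKS = 64 };
  uint8_t in[MAX_BLOCKS * BLK], out_a[MAX_BLOCKS * BLK], out_b[MAX_BLOCKS * BLK];
  uint8_t off_a[BLK] = { 0 }, off_b[BLK] = { 0 };
  uint8_t sum_a[BLK] = { 0 }, sum_b[BLK] = { 0 };

  if (nblocks > MAX_BLOCKS)
    return 0;
  for (size_t i = 0; i < nblocks * BLK; i++)
    in[i] = (uint8_t)(i * 31 + 7);

  enc_two_pass (out_a, in, nblocks, off_a, sum_a);
  enc_inline (out_b, in, nblocks, off_b, sum_b);

  return memcmp (out_a, out_b, nblocks * BLK) == 0
         && memcmp (off_a, off_b, BLK) == 0
         && memcmp (sum_a, sum_b, BLK) == 0;
}
```

Both variants compute the same result; they differ only in how often
the plaintext is traversed. The sketch says nothing about why one
structure is slower on a given CPU.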
Commit b42de67f34 "Optimizations for AES-NI OCB" changed the AES-NI/OCB
implementation to perform checksumming as a separate pass from
encryption and decryption. While this change improved performance for
buffer sizes 16 to 4096 bytes (the buffer sizes used by bench-slope),
it introduced a performance anomaly with OCB encryption on Intel
processors. Below are large-buffer OCB encryption results on Intel
Haswell. They show that performance starts dropping with buffer sizes
larger than 32 KiB. Decryption does not suffer from the same
issue.
[Plot: Speed by Data Length (at 2 GHz); y-axis: MiB/s, 1000 to 2800; x-axis: Data Length in Bytes, 1024 to 1048576]
I've tested and reproduced this issue on Intel Ivy Bridge, Haswell
and Skylake processors. The same performance drop on large buffers is
not seen on AMD Ryzen. Below is the OCB decryption speed plot from
Haswell for reference, showing the expected performance curve over
increasing buffer sizes.
[Plot: Speed by Data Length (at 2 GHz); y-axis: MiB/s, 1000 to 2800; x-axis: Data Length in Bytes, 1024 to 1048576]
After this patch, bench-slope shows a ~2% reduction in performance on
Intel Haswell:
Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        OCB enc |     0.171 ns/B      5581 MiB/s     0.683 c/B      3998

After:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        OCB enc |     0.174 ns/B      5468 MiB/s     0.697 c/B      3998
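
The ~2% figure can be checked from the numbers above. A minimal
sketch, where percent_change is a hypothetical helper written for this
example and not part of bench-slope:

```c
/* Relative change in percent going from 'before' to 'after'. */
double percent_change (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

percent_change (5581.0, 5468.0) gives roughly -2.0 for throughput in
MiB/s, and percent_change (0.683, 0.697) gives roughly +2.0 for
cycles/byte.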
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>