
AES-NI/OCB: Perform checksumming inline with encryption

Description


* cipher/rijndael-aesni.c (aesni_ocb_enc): Remove call to
'aesni_ocb_checksum'; instead, perform checksumming inline with offset
calculations.
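
To make the ChangeLog entry concrete, here is a minimal C sketch of an OCB encryption loop in which the checksum update sits in the same iteration as the offset update. It follows the per-block formulas of RFC 7253 (Offset_i = Offset_{i-1} xor L_ntz(i), Checksum_i = Checksum_{i-1} xor P_i, C_i = Offset_i xor E(P_i xor Offset_i)). It is not the libgcrypt AES-NI code: `toy_encipher` is a hypothetical, insecure stand-in for AES, the names `ocb_init_ltable` and `ocb_enc_inline` are illustrative, and nonce processing, partial final blocks and tag generation are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OCB_BLOCK  16
#define OCB_LTABLE 16 /* enough for block indices below 2^16 */

/* HYPOTHETICAL stand-in for AES block encryption. It is NOT secure; it
   only lets the OCB data flow run end to end. Safe when out == in. */
static void toy_encipher(const uint8_t key[OCB_BLOCK], uint8_t out[OCB_BLOCK],
                         const uint8_t in[OCB_BLOCK])
{
  uint8_t tmp[OCB_BLOCK];
  for (int j = 0; j < OCB_BLOCK; j++)
    tmp[j] = (uint8_t)(in[(j + 1) % OCB_BLOCK] ^ key[j]);
  memcpy(out, tmp, OCB_BLOCK);
}

/* Number of trailing zero bits; i must be non-zero. */
static unsigned ntz(uint64_t i)
{
  unsigned n = 0;
  while ((i & 1) == 0)
    {
      i >>= 1;
      n++;
    }
  return n;
}

/* RFC 7253 double(): shift left one bit, reduce with 0x87 on carry-out.
   Safe when out == in. */
static void ocb_double(uint8_t out[OCB_BLOCK], const uint8_t in[OCB_BLOCK])
{
  uint8_t carry = (uint8_t)(in[0] >> 7);
  for (int j = 0; j < OCB_BLOCK - 1; j++)
    out[j] = (uint8_t)((in[j] << 1) | (in[j + 1] >> 7));
  out[OCB_BLOCK - 1] = (uint8_t)(in[OCB_BLOCK - 1] << 1);
  if (carry)
    out[OCB_BLOCK - 1] ^= 0x87;
}

/* L_* = E(0), L_$ = double(L_*), L_0 = double(L_$), L_i = double(L_{i-1}). */
static void ocb_init_ltable(const uint8_t key[OCB_BLOCK],
                            uint8_t L[OCB_LTABLE][OCB_BLOCK])
{
  uint8_t zero[OCB_BLOCK] = {0}, l_star[OCB_BLOCK], l_dollar[OCB_BLOCK];
  toy_encipher(key, l_star, zero);
  ocb_double(l_dollar, l_star);
  ocb_double(L[0], l_dollar);
  for (int i = 1; i < OCB_LTABLE; i++)
    ocb_double(L[i], L[i - 1]);
}

/* Encrypt NBLOCKS full blocks. Offset and checksum are both updated in
   the iteration that produces each ciphertext block, so every plaintext
   block is read once. *BLKN counts blocks processed so far.
   Works in place (dst == src). */
static void ocb_enc_inline(const uint8_t key[OCB_BLOCK],
                           uint8_t L[OCB_LTABLE][OCB_BLOCK],
                           uint8_t offset[OCB_BLOCK], uint8_t checksum[OCB_BLOCK],
                           uint64_t *blkn, uint8_t *dst, const uint8_t *src,
                           size_t nblocks)
{
  uint8_t tmp[OCB_BLOCK];
  for (size_t b = 0; b < nblocks; b++, src += OCB_BLOCK, dst += OCB_BLOCK)
    {
      unsigned idx = ntz(++*blkn);
      assert(idx < OCB_LTABLE);
      for (int j = 0; j < OCB_BLOCK; j++)
        {
          offset[j] ^= L[idx][j];                 /* Offset_i */
          checksum[j] ^= src[j];                  /* Checksum_i, inline */
          tmp[j] = (uint8_t)(src[j] ^ offset[j]);
        }
      toy_encipher(key, tmp, tmp);
      for (int j = 0; j < OCB_BLOCK; j++)
        dst[j] = (uint8_t)(tmp[j] ^ offset[j]);   /* C_i */
    }
}
```

Because the state lives in `offset`, `checksum` and `*blkn`, splitting a buffer across several calls gives the same result as one call over the whole buffer.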

This patch reverts the OCB checksumming split for encryption to avoid
a performance issue seen on Intel CPUs.

Commit b42de67f34 "Optimizations for AES-NI OCB" changed the AES-NI/OCB
implementation to perform checksumming as a separate pass from
encryption and decryption. While this change improved performance for
buffer sizes of 16 to 4096 bytes (the buffer sizes used by bench-slope),
it introduced a performance anomaly in OCB encryption on Intel
processors. Below are large-buffer OCB encryption results on Intel
Haswell: performance starts dropping for buffer sizes larger than
32 KiB. Decryption does not suffer from the same issue.
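
For contrast, here is a sketch of the split structure that commit b42de67f34 introduced: one pass XORs all plaintext blocks into the checksum, and a second pass does the offset and encryption work, so the input buffer is read twice. This is an illustration under assumptions, not the actual code: `toy_encipher` is a hypothetical, insecure stand-in for AES, `ocb_checksum_pass` only plays the role of 'aesni_ocb_checksum' (whose real signature is not reproduced here), and the L table is supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OCB_BLOCK  16
#define OCB_LTABLE 16

/* HYPOTHETICAL, insecure stand-in for AES. Safe when out == in. */
static void toy_encipher(const uint8_t key[OCB_BLOCK], uint8_t out[OCB_BLOCK],
                         const uint8_t in[OCB_BLOCK])
{
  uint8_t tmp[OCB_BLOCK];
  for (int j = 0; j < OCB_BLOCK; j++)
    tmp[j] = (uint8_t)(in[(j + 1) % OCB_BLOCK] ^ key[j]);
  memcpy(out, tmp, OCB_BLOCK);
}

/* Number of trailing zero bits; i must be non-zero. */
static unsigned ntz(uint64_t i)
{
  unsigned n = 0;
  while ((i & 1) == 0)
    {
      i >>= 1;
      n++;
    }
  return n;
}

/* Pass 1: fold every plaintext block into the checksum. Plays the role
   of 'aesni_ocb_checksum'; it is not that function. */
static void ocb_checksum_pass(uint8_t checksum[OCB_BLOCK], const uint8_t *src,
                              size_t nblocks)
{
  for (size_t b = 0; b < nblocks; b++, src += OCB_BLOCK)
    for (int j = 0; j < OCB_BLOCK; j++)
      checksum[j] ^= src[j];
}

/* Two-pass encryption: the input is read once for the checksum and again
   for encryption. Pass 1 must run first so it still sees plaintext when
   dst == src. */
static void ocb_enc_two_pass(const uint8_t key[OCB_BLOCK],
                             uint8_t L[OCB_LTABLE][OCB_BLOCK],
                             uint8_t offset[OCB_BLOCK], uint8_t checksum[OCB_BLOCK],
                             uint64_t *blkn, uint8_t *dst, const uint8_t *src,
                             size_t nblocks)
{
  uint8_t tmp[OCB_BLOCK];
  ocb_checksum_pass(checksum, src, nblocks);            /* pass 1 */
  for (size_t b = 0; b < nblocks; b++, src += OCB_BLOCK, dst += OCB_BLOCK)
    {                                                   /* pass 2 */
      unsigned idx = ntz(++*blkn);
      assert(idx < OCB_LTABLE);
      for (int j = 0; j < OCB_BLOCK; j++)
        {
          offset[j] ^= L[idx][j];                       /* Offset_i */
          tmp[j] = (uint8_t)(src[j] ^ offset[j]);
        }
      toy_encipher(key, tmp, tmp);
      for (int j = 0; j < OCB_BLOCK; j++)
        dst[j] = (uint8_t)(tmp[j] ^ offset[j]);         /* C_i */
    }
}
```

The two-pass and inline forms compute the same ciphertext, offset and checksum; they differ only in how often and in what order the input is touched. The commit message reports the throughput difference but does not attribute it to a specific cause.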

[Plot: OCB encryption speed on Intel Haswell at 2 GHz, MiB/s (1000 to 2800) versus data length in bytes (1024 to 1048576)]

I've tested and reproduced this issue on Intel Ivy Bridge, Haswell
and Skylake processors. The same performance drop on large buffers is
not seen on AMD Ryzen. Below is the OCB decryption speed plot from
Haswell for reference, showing the expected performance curve over
increasing buffer sizes.

[Plot: OCB decryption speed on Intel Haswell at 2 GHz, MiB/s (1000 to 2800) versus data length in bytes (1024 to 1048576)]

After this patch, bench-slope shows a ~2% performance reduction for OCB
encryption on Intel Haswell:

Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        OCB enc |     0.171 ns/B      5581 MiB/s     0.683 c/B      3998

After:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        OCB enc |     0.174 ns/B      5468 MiB/s     0.697 c/B      3998
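
The ~2% figure can be checked against the table. The sketch below assumes the columns are related by ns/B = 10^9 / (MiB/s × 2^20) and c/B = ns/B × (MHz / 1000); this relationship is inferred from the numbers shown, not taken from bench-slope's source, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed relationship: nanoseconds per byte from MiB/s (1 MiB = 2^20 B). */
static double ns_per_byte(double mib_per_s)
{
  assert(mib_per_s > 0);
  return 1e9 / (mib_per_s * 1048576.0);
}

/* Assumed relationship: cycles per byte = ns/B times clock in GHz. */
static double cycles_per_byte(double mib_per_s, double mhz)
{
  return ns_per_byte(mib_per_s) * (mhz / 1000.0);
}

/* Relative change from BEFORE to AFTER, in percent. */
static double pct_change(double before, double after)
{
  return (after - before) / before * 100.0;
}
```

With the table's numbers, `pct_change(5581, 5468)` is about -2.02% in throughput and `pct_change(0.683, 0.697)` about +2.05% in cycles per byte. `ns_per_byte(5581)` is about 0.1709 and `cycles_per_byte(5581, 3998)` about 0.6832, which round to the listed 0.171 ns/B and 0.683 c/B.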

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Mar 27 2019, 10:10 PM
Parents
rCb82dbbedf027: AES-NI/OCB: Use stack for temporary storage