Reducing memory copy overhead in iobuf and estream to increase OCB speed
Closed, Resolved (Public)

Description

Profiling AES256-OCB encryption and decryption shows that a large portion of gnupg's run time is spent in memcpy functions. Most of these memory copies come from the iobuf read/write functions and the estream es_write_fbf function, as can be seen in these flamegraphs:


The graph for OCB encryption also shows that the memory copy in g10/cipher-aead.c:do_hash() takes a significant portion of the run time.

For CFB/MDC, memory copies have a smaller effect, because encryption and decryption are slower due to the additional SHA1 processing:

The following patches try to minimize these memory copies in iobuf, estream and cipher-aead.c. For estreams, buffering is disabled, as the input is already provided in large buffers. cipher-aead.c:do_hash is changed to avoid the additional memory copy when possible. For iobuf_read and iobuf_write, a new zerocopy mode of operation is added, which avoids copying to the internal drain buffer when the output/input buffer is large.
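
The idea behind the zerocopy mode can be illustrated with a minimal, standalone sketch. This is not the iobuf code from the patches: the names (sketch_writer, sketch_write, sketch_flush, sketch_count_sink), the 8192-byte buffer size and the "len >= buffer size" threshold are all assumptions made for illustration. Small writes are copied into an internal buffer as usual; a large write first flushes any pending bytes to preserve ordering and then hands the caller's buffer straight to the sink, so no memcpy is done for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUF_SIZE 8192  /* hypothetical internal buffer size */

/* Hypothetical sink callback: consumes LEN bytes, returns 0 on success. */
typedef int (*sketch_sink_fn) (void *opaque, const void *data, size_t len);

struct sketch_writer
{
  sketch_sink_fn sink;
  void *opaque;
  size_t used;     /* bytes currently pending in BUF */
  size_t ncopied;  /* bytes that went through memcpy (illustration only) */
  unsigned char buf[SKETCH_BUF_SIZE];
};

/* Example sink that only counts calls and bytes. */
struct sketch_count
{
  size_t calls;
  size_t bytes;
};

static int
sketch_count_sink (void *opaque, const void *data, size_t len)
{
  struct sketch_count *c = opaque;

  (void)data;
  c->calls++;
  c->bytes += len;
  return 0;
}

static void
sketch_writer_init (struct sketch_writer *w, sketch_sink_fn sink, void *opaque)
{
  w->sink = sink;
  w->opaque = opaque;
  w->used = 0;
  w->ncopied = 0;
}

/* Hand any pending buffered bytes to the sink. */
static int
sketch_flush (struct sketch_writer *w)
{
  int rc = 0;

  if (w->used)
    {
      rc = w->sink (w->opaque, w->buf, w->used);
      if (!rc)
        w->used = 0;
    }
  return rc;
}

static int
sketch_write (struct sketch_writer *w, const void *data, size_t len)
{
  int rc;

  if (len >= SKETCH_BUF_SIZE)
    {
      /* Large write: flush pending bytes first so ordering is kept,
         then pass the caller's buffer directly, without copying.  */
      rc = sketch_flush (w);
      if (rc)
        return rc;
      return w->sink (w->opaque, data, len);
    }

  /* Small write: make room if needed, then copy into the buffer. */
  if (len > SKETCH_BUF_SIZE - w->used)
    {
      rc = sketch_flush (w);
      if (rc)
        return rc;
    }
  memcpy (w->buf + w->used, data, len);
  w->used += len;
  w->ncopied += len;
  return 0;
}
```

With a 3-byte write followed by a 64 KiB write, the sink in this sketch is called twice (once for the flushed 3 bytes, once for the large buffer) and only the 3 small bytes are ever copied. The same reasoning applies in the read direction: when the caller's destination buffer is large, data can be read into it directly rather than via the internal buffer.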

With these patches, the memory copy overhead is mostly eliminated, as can be seen from these flamegraphs:


Benchmark results on AMD Ryzen 5800X:

                         Before           After
AES256-OCB encryption     1960 MiB/s       5300 MiB/s
AES256-OCB decryption     4040 MiB/s       5350 MiB/s
AES256-CFB encryption      750 MiB/s        760 MiB/s
AES256-CFB decryption     1590 MiB/s       1610 MiB/s

Event Timeline

jukivili created this task.

New versions of patches 0005 and 0006, which fix EOF handling issues noticed with compression/decompression:

Does this look ok to push to master? @werner @gniibe

Ack from me for new 0005 and 0006.

Side notes:

  • Please remember that the v5 AEAD thing in the OpenPGP specification is still in flux.
  • Also, I am considering a set of APIs for libgcrypt supporting AEAD (so that libgcrypt can ensure correct use of key and IV).

I went through my test files and found that --enarmor on a zero-length input file no longer worked. I made a separate patch to fix that issue, which in turn needs a different approach for handling the compression issue noticed earlier: