Profiling AES256-OCB encryption and decryption shows that a large portion of gnupg program time is spent in memcpy functions. Most of these memory copies come from the iobuf read/write functions and the estream es_write_fbf function, as can be seen in these flamegraphs:
The graph for OCB encryption also shows that the memory copy in g10/cipher-aead.c:do_hash() takes a significant portion of program time.
For CFB/MDC, memory copies have less effect, since encryption and decryption are slower because of the additional SHA1 processing:
The following patches try to minimize these memory copies in iobuf, estream and cipher-aead.c. For estreams, buffering is disabled, as input is already provided in large buffers. cipher-aead.c:do_hash is changed to avoid the additional memory copy when possible. For iobuf_read and iobuf_write, a new zerocopy mode of operation is added, which avoids copying to the internal drain buffer when the output/input buffer is large.
With these patches, memory copy overhead is mostly eliminated, as can be seen from these flamegraphs:
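The zerocopy idea can be illustrated with a minimal, self-contained sketch. This is not the iobuf code from the patches: the names sketch_writer, sketch_write, sketch_flush, sketch_count_sink, and the SKETCH_BUFSIZE and SKETCH_ZEROCOPY_MIN thresholds are all hypothetical, and error handling is omitted. It only shows the general pattern: small writes are staged in an internal buffer, while large writes flush what is pending and then go to the sink directly, without a memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes chosen for illustration; not taken from gnupg. */
#define SKETCH_BUFSIZE      8192
#define SKETCH_ZEROCOPY_MIN 1024

/* Hypothetical sink callback: consumes LEN bytes starting at DATA. */
typedef void (*sketch_sink_fn)(void *ctx, const void *data, size_t len);

struct sketch_writer {
    unsigned char buf[SKETCH_BUFSIZE]; /* internal staging buffer */
    size_t used;                       /* bytes currently staged */
    sketch_sink_fn sink;               /* where the data finally goes */
    void *sink_ctx;
    size_t copied_bytes;               /* bytes that went through memcpy */
};

/* Example sink that only counts the bytes it receives. */
static void sketch_count_sink(void *ctx, const void *data, size_t len)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Hand any staged bytes to the sink. */
static void sketch_flush(struct sketch_writer *w)
{
    if (w->used) {
        w->sink(w->sink_ctx, w->buf, w->used);
        w->used = 0;
    }
}

static void sketch_write(struct sketch_writer *w, const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(w && w->sink);

    if (len >= SKETCH_ZEROCOPY_MIN) {
        /* Zerocopy path: flush staged bytes first to preserve ordering,
           then pass the caller's buffer straight to the sink. */
        sketch_flush(w);
        w->sink(w->sink_ctx, p, len);
        return;
    }

    /* Buffered path: copy small writes into the staging buffer. */
    while (len) {
        size_t room = SKETCH_BUFSIZE - w->used;
        size_t n = len < room ? len : room;

        memcpy(w->buf + w->used, p, n);
        w->used += n;
        w->copied_bytes += n;
        p += n;
        len -= n;
        if (w->used == SKETCH_BUFSIZE)
            sketch_flush(w);
    }
}
```

In this sketch, a 3-byte header followed by a 64 KiB payload results in only the 3 header bytes being copied; the payload reaches the sink untouched. The read side would follow the same pattern in reverse, reading directly into a large caller-supplied buffer once the internal buffer has been drained.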
Benchmark results on AMD Ryzen 5800X:
                          Before        After
AES256-OCB encryption     1960 MiB/s    5300 MiB/s
AES256-OCB decryption     4040 MiB/s    5350 MiB/s
AES256-CFB encryption      750 MiB/s     760 MiB/s
AES256-CFB decryption     1590 MiB/s    1610 MiB/s