
OpenPGP benchmarks on Windows OCB vs. CFB + MDC vs. Unsigned vs. Signed on real data.
Open, Wishlist, Public

Description

tl;dr: As this is about performance, I am only filing this ticket as a wishlist item. I can totally live with the results here. OCB is _really_ fast when encrypting, which raised my expectations, but the signing time threw me off a bit, as the disk IO also dropped a lot for other operations. So I wanted to at least document my tests.

The idea to test this came from us bypassing the callback-based IO of gpgme for tarballs. Afterwards I had expected the operations to be mostly IO bound. But while testing GnuPG 2.4-master with a fast virtual disk, I noticed that it was less IO bound than I expected. In contrast to my previous tests on Windows, which I mostly did on slow disks, I used a fixed-size VMDK with an NVMe controller and host IO cache. This gives a throughput of about 1 GB/s. I could test natively with ~5 GB/s on Windows, but I doubt it would change the general conclusions much.

Windows Defender is turned off. The VM has 6 cores with full execution cap on a Ryzen 9 6900HX. I tried using fewer cores, but that made the test results too variable. Remember that this is a 32-bit Windows build, as used in the GnuPG Windows installer.

Using gnupg-2.4-beta25 (10c937ee68cbf784942630115449f32cd82089fe), built with release settings for 32-bit.
Test data is: 4.97 GiB (5,343,642,075 bytes) split over 5 files.

Timings are roughly the average of three runs.

IO test: read and write on the same disk

Measure-Command {mkdir 5gb2; xcopy /S .\5gb\ .\5gb2}
~Seconds : 5.1

Measure-Command {gpgtar --yes --skip-crypto --create -o 5gb.tar .\5gb\}
~Seconds : 9

^ Nice 😍 I used the native Windows tar for comparison and it ended up in the same range, with the strange exception that overwriting an existing file took 25 seconds.

Measure-Command { gpg --yes -er ldata-test -o .\5g.tar.gpg .\5gb.tar }
(:aead encrypted packet: cipher=9 aead=2 cb=16)
~Seconds : 9

^ Awesome! 😍

But the default in Kleopatra is to sign and encrypt. And that ends up at:

Measure-Command { gpg --yes -su ldata-test -er ldata-test -o .\5g.tar.gpg .\5gb.tar }
:aead encrypted packet: cipher=9 aead=2 cb=16
:onepass_sig packet: keyid 6FAF8982C209FFA8
version 3, sigclass 0x00, digest 10, pubkey 22, last=1
~Seconds : 42

With the prefs from a 2.2 key (the key is called cbc-test for the further tests):
(later edit: This resulted in CFB + MDC)

setpref S9 S8 S7 S2 H10 H9 H8 H11 H2 Z2 Z3 Z1

Measure-Command { gpg --yes -er cbc-test -o .\5g.tar.gpg .\5gb.tar }
~Seconds : 85

With --list-packets this then only shows the following (so is this really CBC?):

off=96 ctb=d2 tag=18 hlen=2 plen=0 partial new-ctb
:encrypted data packet:
length: unknown
mdc_method: 2

But adding signing here makes less of a difference:

Measure-Command { gpg --yes -su cbc-test -er cbc-test -o .\5g.tar.gpg .\5gb.tar }
~Seconds : 105
:encrypted data packet:
length: unknown
mdc_method: 2
off=117 ctb=90 tag=4 hlen=2 plen=13
:onepass_sig packet: keyid D6086C1E3CABA7FC
version 3, sigclass 0x00, digest 10, pubkey 22, last=1

Now, for completeness, the decryption times:

CFB+MDC Signed & Encrypted: 90 Seconds
CFB+MDC Encrypted: 80 Seconds
OCB Encrypted: 13 Seconds 🥰
OCB Signed & Encrypted: 40 Seconds

Now if we could get everything down to the level of OCB Encrypted. :)
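To make the timings above easier to compare with the disk baseline, here is a small sketch that converts them into throughput. It assumes that every run processed roughly the full 5,343,642,075 bytes (the tarball is slightly larger than the raw files, so the numbers are approximate), and the labels in `TIMINGS` are just my shorthand for the runs listed above.

```python
# Size of the test data set, as stated in the description.
TEST_DATA_BYTES = 5_343_642_075
MIB = 1024 * 1024

# Wall-clock seconds from the description (rough averages of three runs).
TIMINGS = {
    "xcopy (disk baseline)": 5.1,
    "gpgtar --skip-crypto": 9,
    "OCB encrypt": 9,
    "OCB sign+encrypt": 42,
    "CFB+MDC encrypt": 85,
    "CFB+MDC sign+encrypt": 105,
    "OCB Encrypted (decrypt)": 13,
    "OCB Signed & Encrypted (decrypt)": 40,
    "CFB+MDC Encrypted (decrypt)": 80,
    "CFB+MDC Signed & Encrypted (decrypt)": 90,
}


def mib_per_second(seconds, size_bytes=TEST_DATA_BYTES):
    """Average throughput in MiB/s for processing size_bytes in the given time."""
    return size_bytes / MIB / seconds


def throughput_table(timings=TIMINGS):
    """Map each run label to its throughput, rounded to one decimal place."""
    return {name: round(mib_per_second(s), 1) for name, s in timings.items()}


if __name__ == "__main__":
    for name, rate in throughput_table().items():
        print(f"{name:38s} {rate:8.1f} MiB/s")
```

With these assumptions the xcopy baseline comes out at roughly 1000 MiB/s, which matches the "about 1GB/s" mentioned for the virtual disk.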

Event Timeline

aheinecke triaged this task as Wishlist priority. Jun 26 2023, 1:38 PM
aheinecke created this task.
aheinecke renamed this task from "OpenPGP benchmarks on Windows OCB vs. CBC vs. Unsigned vs. Signed on real data." to "OpenPGP benchmarks on Windows OCB vs. CFB + MDC vs. Unsigned vs. Signed on real data.". Jun 26 2023, 1:50 PM
aheinecke updated the task description.

s/CBC/CFB+MDC/

Edited the task to reflect that.

FWIW, gpg shows the actual cipher and encryption mode with -v. For example

gpg: ECDH/AES256.CFB encrypted for: "2B999FA9CE046B1B [...]"

gpg: ECDH/AES256.OCB encrypted for: "FFB6A9DC972C60D4 [...]"

Make sure to use --no-encrypt-to in case your default key does not have the OCB feature set. CFB always means CFB+MDC because the legacy CFB-only mode is deprecated. To show the OCB feature use:

$ gpg --list-options show-pref -k  FFB6A9DC972C60D4
[...]
            S9 S8 S7 S2 A2 H10 H9 H8 H11 H2 Z2 Z3 Z1 [mdc] [aead] [no-ks-modify]

The [aead] indicates that OCB mode can be used. There is also a variant of the option which spells the flags out:

$ gpg --list-options show-pref-verbose -k  FFB6A9DC972C60D4
[...]
           Cipher: AES256, AES192, AES, 3DES
           AEAD: OCB
           Digest: SHA512, SHA384, SHA256, SHA224, SHA1
           Compression: ZLIB, BZIP2, ZIP, Uncompressed
           Features: MDC, AEAD, Keyserver no-modify
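For reading such a preference line without the verbose variant, a minimal sketch of a decoder follows. The name table contains only the IDs that appear in this ticket (taken from the two outputs above); it is not a complete OpenPGP algorithm registry, and the function names are made up for illustration.

```python
# Names for the preference IDs that appear in this ticket only.
PREF_NAMES = {
    "S9": "AES256", "S8": "AES192", "S7": "AES", "S2": "3DES",
    "A2": "OCB",
    "H10": "SHA512", "H9": "SHA384", "H8": "SHA256", "H11": "SHA224", "H2": "SHA1",
    "Z2": "ZLIB", "Z3": "BZIP2", "Z1": "ZIP",
}
KINDS = {"S": "Cipher", "A": "AEAD", "H": "Digest", "Z": "Compression"}

# Preference strings as shown in the ticket.
OCB_KEY_PREFS = "S9 S8 S7 S2 A2 H10 H9 H8 H11 H2 Z2 Z3 Z1 [mdc] [aead] [no-ks-modify]"
CFB_KEY_PREFS = "S9 S8 S7 S2 H10 H9 H8 H11 H2 Z2 Z3 Z1"


def decode_prefs(pref_line):
    """Group the preference tokens by kind and replace known IDs with names."""
    groups = {kind: [] for kind in KINDS.values()}
    for token in pref_line.split():
        if token.startswith("["):  # feature flags such as [mdc] or [aead]
            continue
        kind = KINDS.get(token[0])
        if kind is None:  # not a preference token we know how to classify
            continue
        groups[kind].append(PREF_NAMES.get(token, token))
    return groups


def supports_ocb(pref_line):
    """True if the key lists OCB among its AEAD preferences."""
    return "OCB" in decode_prefs(pref_line)["AEAD"]


if __name__ == "__main__":
    for kind, names in decode_prefs(OCB_KEY_PREFS).items():
        print(f"{kind}: {', '.join(names)}")
    print("OCB possible:", supports_ocb(OCB_KEY_PREFS))
```

Note that the verbose gpg output also lists "Uncompressed", which has no token in the short preference line; the sketch does not add it.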

>>! In T6561#172087, @werner wrote:
> FWIW, gpg shows the actual cipher and encryption mode with -v. For example

Ah, I forgot that. But a quick check showed that my tests were encrypted as expected; I just confused the name of the non-OCB mode.

> Make sure to use --no-encrypt-to in case your default key does not have the OCB feature set. CFB always means CFB+MDC because the legacy CFB-only mode is deprecated. To show the OCB feature use:

There is no "encrypt-to" set. The ldata-test key has AEAD set. For the CFB+MDC key, which I named wrongly, I removed the A2 pref.

My test data with -v reads (gpg output translated from German):

gpg: public key is F0E9CFDB4C5D5E99
gpg: encrypted with cv25519 key, ID B783A823C750557F, created 2023-06-12
      "ldata-test"
gpg: AES256.OCB encrypted data
gpg: original file name='5g.tar'

gpg: public key is F0E9CFDB4C5D5E99
gpg: using subkey F0E9CFDB4C5D5E99 instead of primary key D6086C1E3CABA7FC
gpg: encrypted with cv25519 key, ID F0E9CFDB4C5D5E99, created 2023-06-26
      "cbc-test"
gpg: AES256.CFB encrypted data
gpg: original file name='5g.tar'

> $ gpg --list-options show-pref -k  FFB6A9DC972C60D4
> [...]
>             S9 S8 S7 S2 A2 H10 H9 H8 H11 H2 Z2 Z3 Z1 [mdc] [aead] [no-ks-modify]
>
> The [aead] indicates that OCB mode can be used. There is also a variant of the option which spells the flags out:
>
> $ gpg --list-options show-pref-verbose -k  FFB6A9DC972C60D4
> [...]

Yes.

I looked through the code. What I observed is:

  • Thanks to jussi's improvements, the AEAD code is optimized with an AEAD_ENC_BUFFER_SIZE of 64KiB
    • this contributes a lot to the better performance
  • If we invoke gpg --sign | gpg --encrypt, then we can take advantage of multiple CPUs (but gpg is currently not automatically threaded in that way)
    • signing could be improved likewise, by using a larger buffer such as 64KiB
  • CFB+MDC uses two functions together, encryption and hashing, and does not use a larger buffer such as 64KiB
    • when signed, it also does hashing for the signature, so three functions
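To illustrate the buffer-size point in the list above, here is a rough sketch that hashes the same data once with 64KiB chunks and once with a smaller buffer. This is not GnuPG's code: it uses Python's hashlib, and `SMALL_BUFFER_SIZE` is a made-up value for comparison only. It only shows that the digest is identical while the number of update calls, and with it the per-call overhead, changes.

```python
import hashlib
import io

AEAD_ENC_BUFFER_SIZE = 64 * 1024  # 64 KiB, the value named in the list above
SMALL_BUFFER_SIZE = 4 * 1024      # hypothetical smaller buffer, for comparison only


def hash_stream(stream, chunk_size, algo="sha512"):
    """Hash a stream chunk by chunk; return (hex digest, number of update calls)."""
    digest = hashlib.new(algo)
    calls = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        calls += 1
    return digest.hexdigest(), calls


def compare_buffer_sizes(data):
    """Hash the same bytes with the large and the small buffer."""
    large = hash_stream(io.BytesIO(data), AEAD_ENC_BUFFER_SIZE)
    small = hash_stream(io.BytesIO(data), SMALL_BUFFER_SIZE)
    return large, small


if __name__ == "__main__":
    (digest_l, calls_l), (digest_s, calls_s) = compare_buffer_sizes(bytes(1024 * 1024))
    print("same digest:", digest_l == digest_s)
    print("update calls with 64 KiB buffer:", calls_l)
    print("update calls with 4 KiB buffer:", calls_s)
```

How much a larger buffer actually gains in gpg's signing path would have to be measured there; the sketch says nothing about that.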

OK. I'll take the signing part (possible performance improvement).

I ran the libgcrypt benchmark (32-bit Windows emulation on Debian):

SHA256         |      4.30 ns/B     221.6 MiB/s         - c/B
SHA384         |      4.85 ns/B     196.7 MiB/s         - c/B
SHA512         |      4.84 ns/B     197.0 MiB/s         - c/B
...
AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
       ECB enc |     0.146 ns/B      6520 MiB/s         - c/B
       ECB dec |     0.146 ns/B      6518 MiB/s         - c/B
       CBC enc |     0.605 ns/B      1576 MiB/s         - c/B
       CBC dec |     0.145 ns/B      6562 MiB/s         - c/B
       CFB enc |     0.597 ns/B      1596 MiB/s         - c/B
       CFB dec |     0.146 ns/B      6540 MiB/s         - c/B
       OFB enc |     0.705 ns/B      1352 MiB/s         - c/B
       OFB dec |     0.707 ns/B      1348 MiB/s         - c/B
       CTR enc |     0.146 ns/B      6530 MiB/s         - c/B
       CTR dec |     0.147 ns/B      6508 MiB/s         - c/B
       XTS enc |     0.152 ns/B      6280 MiB/s         - c/B
       XTS dec |     0.147 ns/B      6477 MiB/s         - c/B
       CCM enc |     0.746 ns/B      1279 MiB/s         - c/B
       CCM dec |     0.745 ns/B      1281 MiB/s         - c/B
      CCM auth |     0.598 ns/B      1595 MiB/s         - c/B
       EAX enc |     0.746 ns/B      1279 MiB/s         - c/B
       EAX dec |     0.744 ns/B      1282 MiB/s         - c/B
      EAX auth |     0.598 ns/B      1593 MiB/s         - c/B
       GCM enc |     0.264 ns/B      3615 MiB/s         - c/B
       GCM dec |     0.262 ns/B      3637 MiB/s         - c/B
      GCM auth |     0.117 ns/B      8160 MiB/s         - c/B
       OCB enc |     0.147 ns/B      6479 MiB/s         - c/B
       OCB dec |     0.144 ns/B      6637 MiB/s         - c/B
      OCB auth |     0.147 ns/B      6495 MiB/s         - c/B
...

With the 64-bit Windows emulation it looks like this:

 SHA256         |      1.82 ns/B     523.2 MiB/s         - c/B
 SHA384         |      1.34 ns/B     713.5 MiB/s         - c/B
 SHA512         |      1.27 ns/B     753.4 MiB/s         - c/B
 ...
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     0.149 ns/B      6387 MiB/s         - c/B
        ECB dec |     0.151 ns/B      6334 MiB/s         - c/B
        CBC enc |     0.583 ns/B      1635 MiB/s         - c/B
        CBC dec |     0.142 ns/B      6694 MiB/s         - c/B
        CFB enc |     0.600 ns/B      1589 MiB/s         - c/B
        CFB dec |     0.144 ns/B      6641 MiB/s         - c/B
        OFB enc |     0.671 ns/B      1422 MiB/s         - c/B
        OFB dec |     0.667 ns/B      1429 MiB/s         - c/B
        CTR enc |     0.129 ns/B      7400 MiB/s         - c/B
        CTR dec |     0.142 ns/B      6703 MiB/s         - c/B
        XTS enc |     0.156 ns/B      6095 MiB/s         - c/B
        XTS dec |     0.142 ns/B      6698 MiB/s         - c/B
        CCM enc |     0.747 ns/B      1276 MiB/s         - c/B
        CCM dec |     0.679 ns/B      1404 MiB/s         - c/B
       CCM auth |     0.583 ns/B      1636 MiB/s         - c/B
        EAX enc |     0.681 ns/B      1400 MiB/s         - c/B
        EAX dec |     0.730 ns/B      1307 MiB/s         - c/B
       EAX auth |     0.588 ns/B      1623 MiB/s         - c/B
        GCM enc |     0.246 ns/B      3874 MiB/s         - c/B
        GCM dec |     0.289 ns/B      3305 MiB/s         - c/B
       GCM auth |     0.094 ns/B     10113 MiB/s         - c/B
        OCB enc |     0.181 ns/B      5270 MiB/s         - c/B
        OCB dec |     0.148 ns/B      6464 MiB/s         - c/B
       OCB auth |     0.170 ns/B      5602 MiB/s         - c/B
...

Encryption looks similar. Hashing is relatively slow on 32-bit. I used the same machine for both 32-bit and 64-bit.
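To put these ns/B figures into perspective, the following sketch estimates how long a single pass over the 5,343,642,075 byte test set from the description would take at the benchmarked rates. It is only a rough estimate: the benchmark machine is not the machine used in the description, and the raw libgcrypt rate ignores all gpg overhead. SHA512 is included because the signatures in the description use digest 10.

```python
# Size of the test data set from the task description.
TEST_DATA_BYTES = 5_343_642_075

# ns/B values copied from the two benchmark tables above.
NS_PER_BYTE = {
    "32-bit": {"SHA256": 4.30, "SHA384": 4.85, "SHA512": 4.84,
               "AES CFB enc": 0.597, "AES OCB enc": 0.147},
    "64-bit": {"SHA256": 1.82, "SHA384": 1.34, "SHA512": 1.27,
               "AES CFB enc": 0.600, "AES OCB enc": 0.181},
}


def seconds_for(ns_per_byte, size_bytes=TEST_DATA_BYTES):
    """Time in seconds for one pass over size_bytes at the given ns/B rate."""
    return ns_per_byte * size_bytes / 1e9


def estimate(build):
    """Per-algorithm time estimate in seconds for one of the two builds."""
    return {algo: round(seconds_for(rate), 2) for algo, rate in NS_PER_BYTE[build].items()}


if __name__ == "__main__":
    for build in NS_PER_BYTE:
        print(build)
        for algo, secs in estimate(build).items():
            print(f"  {algo:12s} {secs:6.2f} s")
```

At these rates a SHA512 pass over the test data costs on the order of 26 seconds on the 32-bit build versus about 7 seconds on 64-bit, while the OCB pass stays below one second in both cases.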

For comparison, here are my benchmark results on the same system: once on 64-bit Linux, once with W32 and once with W64. All runs are native to exclude any virtualization issues.

Apologies for the formatting. I had some struggles with the LibreOffice decimal point.

I found this important to do because the SHA-256 results from your test looked extremely slow.

gniibe mentioned this in Unknown Object (Event). Jul 10 2023, 9:13 AM

The problem with SHA-256 on x86-64 is that it took a long time for Intel to introduce SHA acceleration (SHA1 & SHA256) in their main CPU products.

These x86 CPUs have SHA256 acceleration:

  • AMD Ryzen starting with Zen (~2017)
  • Intel Core starting with 11th gen (~2020)
  • Intel Atom starting with Goldmont (~2016)

@jukivili Good to know.

Btw., some background for you: In Gpg4win we always had the situation that all your performance improvements were irrelevant, because our whole IO system was so inefficient at shoveling data from Kleopatra's Qt implementations through GPGME back to GnuPG that our internal IO was always the bottleneck. With the last release we started to change our architecture so that we can now bypass all that and just pass filenames. This enables us to start seeing actual GnuPG performance in Gpg4win.

I still believe there is something to optimize there, as we are far, far from the crypto limits measured in bench-slope, and even AES.OCB only reaches about 20-30% of my possible disk IO. My disk can read / write at about 5000-6500 MiB/s, my CPU can encrypt with libgcrypt at 17500 MiB/s, and I still "only" reach 1000 MiB/s when doing a straight AES.OCB OpenPGP encryption without compression. So we either have too much overhead in GnuPG or a bottleneck that I do not yet see.

Do not get me wrong: we are _fast_, but I love performance, and getting even better is fun and also good for the environment ;-)

I just started wondering how much of this slowdown is due to the MinGW libc not having very well optimized memcpy/memmove/memchr/strlen/etc. Are there profiling tools like 'perf' on Linux that could be used for Windows builds?