
gpg takes > 30s to list the keys from a 17MiB `pubring.gpg` that contains a single certificate
Closed, Resolved · Public

Description

As I mentioned in T4591, modern gpg normally doesn't want to load a super-large certificate (the only cases where I've seen certificates that large are when they are the victims of a certificate flooding attack).

But users can still import a large certificate if they use the old-style keyring format (instead of keybox), and they might simply have inherited one from a previous installation.

When such a certificate is present, it causes a massive slowdown: 33 seconds on a fast, modern machine just to list the certificate itself! I recompiled the head of STABLE-BRANCH-2-2 of GnuPG with -pg and profiled it with gprof:

0 dkg@alice:/tmp/cdtemp.7QJ3xD$ time ~/src/gnupg/gnupg2/g10/gpg --list-keys
gpg: NOTE: THIS IS A DEVELOPMENT VERSION!
gpg: It is only intended for test purposes and should NOT be
gpg: used in a production environment or with production keys!
/tmp/cdtemp.7QJ3xD/pubring.gpg
------------------------------
pub   ed25519 2019-01-19 [C] [expires: 2021-01-18]
      C4BC2DDB38CCE96485EBE9C2F20691179038E5C6
uid           [ unknown] Daniel Kahn Gillmor <dkg@fifthhorseman.net>
uid           [ unknown] Daniel Kahn Gillmor <dkg@debian.org>
sub   ed25519 2019-01-19 [S] [expires: 2020-01-19]
sub   ed25519 2019-01-19 [A] [expires: 2020-01-19]
sub   cv25519 2019-01-19 [E] [expires: 2020-01-19]


real	0m33.610s
user	0m33.529s
sys	0m0.081s
0 dkg@alice:/tmp/cdtemp.7QJ3xD$ gprof ~/src/gnupg/gnupg2/g10/gpg > list-keys.gprof.txt

Attached are the statistics of that run:


Even deleting the key is computationally expensive, taking more than a minute to delete the only certificate in the keyring, and yielding this profiling data:


I don't know how to read gprof output particularly well, but both of these files point toward add_kb_node as the biggest culprit.
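
If add_kb_node appends each newly parsed packet by walking the keyblock list from its head, that alone is quadratic in the number of packets, which would match the profile. As an illustration only (the struct and function names below are made up; this is not GnuPG's actual code), appending N packets to a singly-linked list this way visits on the order of N^2/2 nodes, while keeping a tail pointer makes the whole build linear:

/* Simplified sketch, not GnuPG code: building a keyblock as a
   singly-linked list, one node per parsed packet. */
#include <stdio.h>
#include <stdlib.h>

struct node { struct node *next; int packet; };

/* Naive append: walk from the head to the end for every new packet.
   Appending N packets this way visits roughly N^2/2 nodes in total. */
static void append_slow (struct node **head, struct node *n)
{
  struct node **pp = head;
  while (*pp)
    pp = &(*pp)->next;
  *pp = n;
}

/* Keeping a tail pointer makes each append O(1), so N appends cost O(N). */
static void append_fast (struct node **head, struct node **tail, struct node *n)
{
  if (*tail)
    (*tail)->next = n;
  else
    *head = n;
  *tail = n;
}

int main (void)
{
  struct node *head = NULL, *tail = NULL;

  (void)append_slow;  /* swap it in below to feel the N^2 behavior */
  for (int i = 0; i < 54614; i++)       /* one node per certification */
    {
      struct node *n = calloc (1, sizeof *n);
      if (!n)
        return 1;
      n->packet = i;
      append_fast (&head, &tail, n);
    }
  printf ("keyblock list of 54614 nodes built\n");
  return 0;
}

Remembering the tail (or the last node that was appended) is the classic fix for this access pattern; whether that is exactly what happens inside add_kb_node is something the gprof call graph should confirm.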

One thing I observe is that just parsing the 17MiB of packets is *not* generally expensive for gpg:

0 dkg@alice:/tmp/cdtemp.7QJ3xD$ time gpg --list-packets < example.key > /dev/null

real	0m0.513s
user	0m0.500s
sys	0m0.013s
0 dkg@alice:/tmp/cdtemp.7QJ3xD$

Details

Version
2.2.16

Revisions and Commits

Event Timeline

werner edited projects, added gnupg (gpg23); removed gnupg (gpg22).
werner added a subscriber: werner.

We know that. The problem is that we can't simply switch to SQLite for key storage, because it is common for dozens of gpg processes to be accessing the key database. At least at some points we need proper transactional behaviour, and SQLite implements that by taking a temporary copy of the database, which is not an option for large keyrings.

We have a plan and already some code for 2.3 to fix that. This is why I set the priority to normal.
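
As an illustration of the multi-process, transactional access pattern being discussed (the file name and table layout below are invented, and this is not a proposal for GnuPG's key store), the public SQLite C API would let many concurrent processes update one shared database roughly like this; the concern above is the cost of that transaction machinery on a multi-megabyte keyring, not the API itself:

/* Hypothetical sketch only.  Build with: cc sketch.c -lsqlite3 */
#include <sqlite3.h>
#include <stdio.h>

int main (void)
{
  sqlite3 *db;
  char *err = NULL;

  if (sqlite3_open ("keydb.sqlite", &db) != SQLITE_OK)
    return 1;
  /* Wait up to 5s for a competing writer instead of failing right away. */
  sqlite3_busy_timeout (db, 5000);

  sqlite3_exec (db,
      "CREATE TABLE IF NOT EXISTS keys (fpr TEXT PRIMARY KEY, keyblock BLOB)",
      NULL, NULL, &err);

  /* BEGIN IMMEDIATE claims the write lock up front, so the update below is
     atomic even with dozens of concurrent processes on the same file. */
  if (sqlite3_exec (db, "BEGIN IMMEDIATE", NULL, NULL, &err) == SQLITE_OK)
    {
      sqlite3_exec (db,
          "INSERT OR REPLACE INTO keys VALUES ('EXAMPLE_FPR', x'00')",
          NULL, NULL, &err);
      sqlite3_exec (db, "COMMIT", NULL, NULL, &err);
    }
  else
    {
      fprintf (stderr, "could not start transaction: %s\n", err ? err : "?");
      sqlite3_free (err);
    }

  sqlite3_close (db);
  return 0;
}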

I didn't mean to suggest that switching to sqlite was the only way to fix this, but if it is a promising way to fix it, that would be great. I'm sure there are other ways.

Please help me understand why this is not an option for large keyrings. Is it a performance concern? Something else?

The current behavior appears to be pathologically bad with arbitrary certificates for anyone, even people who only open the keyring for reading and do so without contention. The above read-only --list-keys test was done without any contention for the keyring, and it still took more than 30s to produce < 1KiB of output from a 17MiB file.

Verifying a git tag against the "clean" version of this certificate takes ~225ms of CPU time. Verifying the same git tag against a keyring that contains the flooded version of the certificate takes ~145s. That is a factor of more than 600×. Any automated git tag verification system can probably be DoSed by this behavior.

Just importing a ~666KiB certificate when this monster certificate is in the keyring consumes over 10 minutes of CPU time:

0 dkg@alice:/tmp/cdtemp.7QJ3xD/monkeysphere$ time GNUPGHOME= gpg --export  CCD2ED94D21739E9  | gpg --import
gpg: key CCD2ED94D21739E9: 1391 signatures not checked due to missing keys
gpg: key CCD2ED94D21739E9: public key "Daniel Kahn Gillmor <dkg@fifthhorseman.net>" imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg: no ultimately trusted keys found

real	10m28.703s
user	7m20.467s
sys	3m8.147s
0 dkg@alice:/tmp/cdtemp.7QJ3xD/monkeysphere$

@gniibe: I doubt that your fix really makes a difference. The majority of the time is spent on searching the keyring for keys. This is why I have the gpgk thing in the works.

@gniibe -- thank you very much for tracking down these O(N^2) operations and cleaning them up. I will profile the effect of those changes and report my findings.

(i think that rG33c17a8008c3ba3bb740069f9f97c7467f156b54 is also relevant, though it was not tagged with this ticket)

@werner: Yes, the way to go is having something like a server for keys; it can remove all the unnecessary search/lookup altogether.

My change is nitpicking maintenance (not a new feature). It is minor, but it can speed up the final function after lookup (a bit).

For the particular problem of --list-key with pubring.gpg, I think we can say it's fixed.

FWIW, I tried gpg --import on the ASCII-armored version of my C4BC2DDB38CCE96485EBE9C2F20691179038E5C6 OpenPGP certificate (22895014 octets, 54614 certifications), followed by gpg --list-keys and gpg --export | wc. I was comparing 2.2.17-1 (from the Debian package in unstable) against the exact same source with @gniibe's two patches rG33c17a8008c3 and rGa7a043e82555 applied as well. I did this with GNUPGHOME set to an otherwise empty directory, where I had done touch pubring.gpg to avoid the keybox format. (The two runs did not share a GNUPGHOME.)

Without @gniibe's patches:

$ time gpg --import < ../flooded.key
[…]
real	11m4.620s
user	8m4.156s
sys	2m58.155s
$ time gpg --list-keys
[…]
real	0m34.855s
user	0m34.781s
sys	0m0.069s
$ time gpg --export | wc
 116411  424952 16934775

real	0m37.866s
user	0m37.864s
sys	0m0.097s
$

but with @gniibe's patches:

$ time gpg --import < flooded.key
[…]
real	8m53.788s
user	5m50.769s
sys	2m58.636s
$ time gpg --list-keys
[…]
real	0m0.178s
user	0m0.151s
sys	0m0.028s
$ time gpg --export | wc
 116411  424952 16934775

real	0m1.801s
user	0m1.834s
sys	0m0.076s
$

So there is still some significant delay in --import, but @gniibe's patches shave off ~25% of the CPU time even in that case. And they remove significantly more of the CPU usage for --list-keys and --export (apparently > 90% of the CPU time).

Is there any reason that we shouldn't apply these patches to the STABLE-BRANCH-2-2 as well? The tests all appear to pass.

I also checked the CPU time for git tag -v, with and without @gniibe's patches applied.

without the patches:

real	2m16.258s
user	2m16.114s
sys	0m0.143s

with the patches:

real	0m0.680s
user	0m0.598s
sys	0m0.083s

So this is a huge win.

Okay, for 100000 signatures this is clearly a win if no key lookup is needed.

With @gniibe's patches applied, I profiled the --import, since that is where the largest CPU cost remains. I profiled two different runs:

  • importing into an empty pubring.gpg, which cost ~10 minutes of CPU time:
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 23.63     74.27    74.27 1671939985     0.00     0.00  parse
 22.16    143.93    69.66   191186     0.00     0.00  check_signature_over_key_or_uid
 21.19    210.55    66.62 2983453616     0.00     0.00  walk_kbnode
 21.15    277.03    66.48    54621     0.00     0.00  find_prev_kbnode
  4.60    291.49    14.46 1672049220     0.00     0.00  iobuf_skip_rest
  2.74    300.10     8.61   273083     0.00     0.00  dbg_search_packet
  0.79    302.59     2.49        1     2.49     2.49  iobuf_get_fd

In this case, the 2.9G invocations of walk_kbnode are almost exactly 54614^2, so I suspect that another O(N^2) operation exists there (see the sketch after the second profile below).

  • importing into the same keyring a second time (with the certificate already present), which still costs about 30 seconds:
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 83.01     15.73    15.73        1    15.73    18.86  import_one_real
 12.98     18.19     2.46 745754172     0.00     0.00  cmp_signatures
  0.74     18.33     0.14       13     0.01     0.01  cmp_public_keys
  0.69     18.46     0.13     2069     0.00     0.00  radix64_read
  0.69     18.59     0.13   163900     0.00     0.00  read_rest
  0.42     18.67     0.08   109244     0.00     0.00  sig_comparison
  0.26     18.72     0.05   313643     0.00     0.00  iobuf_read_line

The speed here is much better, but 30s to import something that we already have is still expensive. And 745M signature comparisons seem like far more than there would be if the signatures were well indexed.
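
Both profiles have the shape of an all-pairs scan: ~2.9G calls to walk_kbnode is roughly what you get if the whole keyblock is re-walked once per signature, and ~745M calls to cmp_signatures looks like every signature being compared against a large fraction of the others. For illustration only (invented struct and function names, not GnuPG's import code), here is the difference between a pairwise duplicate check and a sort-then-scan over the same data:

/* Hypothetical sketch: finding duplicate signatures pairwise vs. sorted. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct sig { uint8_t digest[20]; };   /* stand-in for a signature packet */

static int cmp_sig (const void *a, const void *b)
{
  return memcmp (((const struct sig *)a)->digest,
                 ((const struct sig *)b)->digest, 20);
}

/* O(N^2): check every signature against every earlier one. */
static size_t count_dupes_pairwise (const struct sig *s, size_t n)
{
  size_t dupes = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < i; j++)
      if (!memcmp (s[i].digest, s[j].digest, 20))
        { dupes++; break; }
  return dupes;
}

/* O(N log N): sort a copy, then duplicates are adjacent. */
static size_t count_dupes_sorted (const struct sig *s, size_t n)
{
  size_t dupes = 0;
  struct sig *copy = malloc (n * sizeof *copy);
  if (!copy)
    return 0;
  memcpy (copy, s, n * sizeof *copy);
  qsort (copy, n, sizeof *copy, cmp_sig);
  for (size_t i = 1; i < n; i++)
    if (!memcmp (copy[i].digest, copy[i - 1].digest, 20))
      dupes++;
  free (copy);
  return dupes;
}

int main (void)
{
  enum { N = 54614 };                  /* certifications in the flooded cert */
  struct sig *sigs = calloc (N, sizeof *sigs);
  if (!sigs)
    return 1;
  for (size_t i = 0; i < N; i++)       /* fabricated digests... */
    memcpy (sigs[i].digest, &i, sizeof i);
  for (size_t i = N * 3 / 4; i < N; i++)   /* ...with ~25% duplicates */
    memcpy (sigs[i].digest, &(size_t){ i - N * 3 / 4 }, sizeof (size_t));
  printf ("dupes (sorted scan): %zu\n", count_dupes_sorted (sigs, N));
  printf ("dupes (pairwise):    %zu\n", count_dupes_pairwise (sigs, N));
  free (sigs);
  return 0;
}

At N = 54614 the pairwise loop performs well over a billion 20-byte comparisons, while the sort plus linear scan needs roughly a million, which is the same kind of gap the two profiles show.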

A linked list of 100000 items is not a usable data structure. The problem, however, is not the linked list but the DoS due to the number of signatures being well beyond the design limit. 1000 key signatures is already a large number, and only a few people have that many. We need to put a limit on them.

About importing, there are two other pieces of work: key repair and the trustdb update. We can figure out the difference by using the --import-options values no-repair-keys and fast-import (to skip that work).
I think that both can be O(N^2) in the number of signatures.

@gniibe, the documentation (at least on the stable branch) says that --fast-import is just a synonym for --import. Is that incorrect?

@werner, I agree that putting some limits in place is not unreasonable, but I wonder how we manage those limits, especially over time and as different certifications come to light. If I encounter 500 certifications for a given certificate today, and a different set of 500 certifications tomorrow, and I'm only willing to retain, say, 250 certifications, how should I decide which ones to keep? Is there some ranking or prioritization, or just first-come, first-served? These are pretty complicated tradeoffs, and potentially ultimately user-interface questions, and it would be good to try to document how we think they should work, even if we haven't implemented them yet.

I am proposing to backport rG33c17a8008c3ba3bb740069f9f97c7467f156b54 and rGa7a043e82555a9da984c6fb01bfec4990d904690 to STABLE-BRANCH-2-2 as they represent a significant performance improvement in several specific use cases and appear to have no downsides.

Is there any reason to not do that?

I thought I would close this after the release of 2.2.18.
Anyway, it's done, so I'm closing.