--list-sigs on a keybox is extremely slow
Open, NormalPublic

Description

Using the /tmp/gnupg{1,2} from
http://lists.gnupg.org/pipermail/gnupg-devel/2015-February/029541.html:

$ strace -co list-key1 gpg --homedir /tmp/gnupg1 --list-key 0x39278DA8109E6244

/dev/null

$ strace -co list-sigs1 gpg --homedir /tmp/gnupg1 --list-sigs

0x39278DA8109E6244 >/dev/null

$ strace -co list-key2 gpg2 --homedir /tmp/gnupg2 --list-key

0x39278DA8109E6244 >/dev/null

$ strace -co list-sigs2 gpg2 --homedir /tmp/gnupg2 --list-sigs

0x39278DA8109E6244 >/dev/null

$ strace -co list-key2\' gpg2 --homedir /tmp/gnupg1 --list-key

0x39278DA8109E6244 >/dev/null

$ strace -co list-sigs2\' gpg2 --homedir /tmp/gnupg1 --list-sigs

0x39278DA8109E6244 >/dev/null

$ head -5 list-*

> list-key1 <

% time seconds usecs/call calls errors syscall


52.40 0.001047 0 9753 read
18.12 0.000362 0 3264 getrusage
11.66 0.000233 0 3264 clock_gettime

> list-key2 <

% time seconds usecs/call calls errors syscall


97.99 0.015240 8 1986 read

  1.76    0.000273           9        31           munmap
  0.25    0.000039          20         2           prlimit64

> list-key2' <

% time seconds usecs/call calls errors syscall


95.36 0.000863 0 9756 read

  4.64    0.000042           6         7           munmap
  0.00    0.000000           0         1           write

> list-sigs1 <

% time seconds usecs/call calls errors syscall


53.32 0.131142 0 1077208 read
12.80 0.031481 0 316730 gettimeofday
11.42 0.028100 0 316730 getrusage

> list-sigs2 <

% time seconds usecs/call calls errors syscall


99.35 3.614892 4 924097 read

  0.61    0.022344           6      3621           munmap
  0.02    0.000644           0      3651           mmap2

> list-sigs2' <

% time seconds usecs/call calls errors syscall


99.60 0.109905 0 1077211 read

  0.29    0.000316           0      6262           time
  0.07    0.000078          10         8           munmap

So oddly enough gpg doesn't do more reads on a keybox, but these reads tend to
be much slower.

guilhem set Version to 2.1.2.
guilhem set External Link to http://lists.gnupg.org/pipermail/gnupg-devel/2015-February/029541.html.
guilhem added a subscriber: guilhem.
dkg added a subscriber: dkg.Apr 1 2015, 9:40 PM

I'm seeing this behavior as well. For a test keyring with 13 keys, 49 User
IDs, and 7227 signatures, stored on a tmpfs on linux kernel 3.16, gpg2
--list-sigs with a pubring.kbx takes over 3 seconds (mostly kernelspace), but
with an old-style pubring.gpg takes 0.6 seconds (mostly userspace)

Maybe it makes more sense to mmap the keybox rather than trying to seek and read
inside it?

werner claimed this task.Apr 4 2015, 11:11 AM
werner added a subscriber: werner.

It is some kind of regression - I'll look into it soon.

werner added a comment.Apr 6 2015, 9:23 PM

By keeping track of not-found user ids I was able to to reduce the time for
--list-sigs on my test setting from more than 5 minutes to 50 seconds. To
properly implement this I need to write a more generic caching mechanism so that
it is only used during key listing but not using import. Using a hash table
instead of simple list will also reduce the lookup time a little bit.

It all depends on the order the keys are stored in the pubring.*, how many keys
are missing and on whether the cache must be flushed.

You might want to apply the attached patch and in another test configure gnupg with
--enable-key-cache=100000
or so and test whether this speeds it up for you.

Indeed, your patch speeds up the listing, but doesn't seem to fully solve the
regression and in particular doesn't make the keybox setting as fast as the
legacy pubring.gpg one.

Here is my test environement:

  • fresh identical keyrings /tmp/gnupg1 and /tmp/gnupg2, the former with a

pubring.gpg, the latter with a pubring.kbx.

  • 1246 public keys: `gpg2 --homedir /tmp/gnupg1 --with-colons --list-sigs |

grep -c '^pub:'`

  • 228176 sigs: `gpg2 --homedir /tmp/gnupg1 --with-colons --list-sigs | grep -Ec

'^(sig|rev):'`

  • 156625 sigs from an unknown signer: `gpg2 --homedir /tmp/gnupg1 --with-colons

--list-sigs | grep -Ec '^(sig|rev):([^:]*:){8}\[User ID not found\]:'`

  • 13754 unknown signers: `gpg2 --homedir /tmp/gnupg1 --with-colons --list-sigs
sed -nr 's/(sigrev):([^:]*:){3}([0-9A-F]{16}):([^:]*:){4}\[User ID not

found\]:.*/\3/p' | sort -u | wc -l`

  • g10/gpg2 is the current HEAD (6619ead) with your patch on top; gpg2 is

2.1.2; gpg is 1.4.18.

  • $keyID1: 216 sigs (98 unique signers), 178 from an unknown signer (87 unique

unknown signers)

`gpg --homedir /tmp/gnupg1 --list-sigs $keyID1`:      0:07.17 (5.24 user,

1.82 sys) 7956k maxres

`gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID1`:     0:07.62 (5.47 user,

1.53 sys) 8312k maxres

`gpg2 --homedir /tmp/gnupg2 --list-sigs $keyID1`:     0:58.27 (3.74 user,

51.69 sys) 11836k maxres

`g10/gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID1`: 0:06.71 (5.03 user,

1.58 sys) 8224k maxres

`g10/gpg2 --homedir /tmp/gnupg2 --list-sigs $keyID1`: 0:17.03 (1.50 user,

14.92 sys) 11812k maxres

  • $keyID2: 1026 sigs (250 unique signers), 716 from an unknown signer (181

unique unknown signers)

`gpg --homedir /tmp/gnupg1 --list-sigs $keyID2`:      0:07.29 (5.33 user,

1.86 sys) 7940k maxres

`gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID2`:     0:07.53 (5.68 user,

1.64 sys) 8256k maxres

`gpg2 --homedir /tmp/gnupg2 --list-sigs $keyID2`:     1:01.09 (3.59 user,

53.18 sys) 11784k maxres

`g10/gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID2`: 0:06.66 (4.98 user,

1.55 sys) 8052k maxres

`g10/gpg2 --homedir /tmp/gnupg2 --list-sigs $keyID2`: 0:17.11 (1.50 user,

15.16 sys) 11812k maxres

  • $keyID3: 1334 sigs (302 unique signers), 374 from an unknown signer (80 unique

unknown signers)

`gpg --homedir /tmp/gnupg1 --list-sigs $keyID3`:      0:18.31 (12.18 user,

5.71 sys) 8928k maxres

`gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID3`:     0:17.73 (12.80 user,

4.68 sys) 9380k maxres

`gpg2 --homedir /tmp/gnupg2 --list-sigs $keyID3`:     0:40.33 ( 3.70 user,

34.58 sys) 14576k maxres

`g10/gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID3`: 0:16.38 (11.03 user,

4.85 sys) 9280k maxres

`g10/gpg2 --homedir /tmp/gnupg2 --list-sigs $keyID3`: 0:17.83 ( 2.16 user,

13.79 sys) 14612k maxres

  • $keyID4: 1439 sigs (643 unique signers), 1168 from an unknown signer (555

unique unknown signers)

`gpg --homedir /tmp/gnupg1 --list-sigs $keyID4`:      0:08.94 (6.64 user,

2.22 sys) 7952k maxres

`gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID4`:     0:09.61 (7.26 user,

1.93 sys) 8444k maxres

`gpg2 --homedir /tmp/gnupg2 --list-sigs $keyID4`:     1:34.20 (5.98 user,

85.03 sys) 11724k maxres

`g10/gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID4`: 0:08.50 (6.37 user,

1.94 sys) 8092k maxres

`g10/gpg2 --homedir /tmp/gnupg2 --list-sigs $keyID4`: 0:50.71 (3.20 user,

44.63 sys) 11568k maxres

Furthermore, the patch brings down --list-sigs on the entire keybox to 17min
from 2h25 (vs 2min on a pubring.gpg).

So all in a all, while this patch significantly improves the slowness of
--list-sigs with a keybox, it doesn't generally (except on $keyID3) make it as
fast as when used with the legacy pubring.gpg format.

Also, I'd like to point out that since the nested lookups in get_user_id are
only meant to retrieve the signer's primary UID, and as it seems to be part of
the keybox's metadata (if I got T1939 right, that's why kbxutil is lightning
fast), it should be possible avoid the parsing (which I guess is causing most of
the overhead) to get the UID right? Of course it can't hurt to have a caching
mechanism on top of that, though ;-)

werner added a comment.Apr 9 2015, 1:56 PM

I have no immediate idea why 1.4 is faster than 2.1. To sort out whether it is
a problem of the gnupg version or of the keyring format you may replace
pubring.kbx in gnupg2 by pubring.gpg from gnupg1 and check whether there is a
substantial difference between the evrsions using the same pubring.gpg.

Given the number of keys the default cache sizes should work.

I am not testing with 2.1.2 but with master which might have some updates. With
-fast-list-mode --lits-sigs is really fast.

Taking the primary user id from the keybox meta data is not a good idea because
it violates the layering.

Aren't the second tests (gpg2 --homedir /tmp/gnupg1 --list-sigs) precisely
what you want? This suggests that the bug is triggered by get_user_id, but
only when used on a keybox.

Using master (6619ead) without your patch yields timings comparable to 2.1.2
with both pubring.kbx and pubring.gpg. As I hinted at in the ExtLink, I also
noticed that adding --fast-list-mode makes --list-sigs as fast as --list-keys
(which is not surprising, since it removes the quadratic behavior). Since it
also doesn't show any significant difference between pubring.gpg and
pubring.kbx, it confirms that the bug is triggered by get_user_id when used on
a keybox (for some reason read(2) seems to be much much more expensive on a
keybox than on the legacy format).

guilhem changed Version from 2.1.2 to 6619ead.Apr 9 2015, 8:48 PM

The attached callgraph for g10/gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID4
(without the patch) shows that a significant amount of time is spent
fread(3)'ing each blob of the keybox (for every single sig encountered).

A caching mechanism (ideally two hashtables, the first mapping 64-bits key IDs
to their primary UID, the second keeping track of unknown signer's 64-bits key
IDs) would certainly speed up things, but as shown in T1938 (guilhem on Apr 01 2015, 04:12 PM / Roundup) read(2) causes
overhead on a keybox for *both* --list-keys and --list-sigs, so it wouldn't
solve the core of the issue. I would therefore agree with dkg's T1938 (dkg on Apr 01 2015, 09:40 PM / Roundup) here:
it's probably worth trying to mmap(3) the keybox instead of fread(3)'ing it.

The attached callgraph for g10/gpg2 --homedir /tmp/gnupg1 --list-sigs $keyID4

Of course it should be g10/gpg2 --homedir /tmp/gnupg2 --list-sigs $keyID4.
(/tmp/gnupg1 doesn't contain a keybox.) Sorry for the confusion.

I now keep a list of not found keyids in the keydb modules, thus all lookups by
keyid will benefit from it. It also works for keybox and keyring. My tests
shows a speedup from 28 minutes to 3 minutes. That test with 14000 not
available keys requires about 220 KiB exra RAM, which should not be a problem.
See commit 6500f33.

werner closed this task as Resolved.Jun 20 2015, 3:13 PM
werner removed a project: In Progress.
werner added a project: gnupg.
guilhem reopened this task as Open.Aug 1 2015, 6:05 PM
guilhem added a project: In Progress.

Thanks for trying to fix that bug, Werner! However, while your patch does
indeed make 2.1.6 quite a bit faster than 2.1.5, it's still much much slower
than the venarable 1.4.19. And since with a keybox 2.1.x spends most of its
time doing syscalls, I believe there is something odd in the way it reads
keyboxes hence I'm reopening the issue.

With an x60s (i686, 2x Core Duo L2400 @ 1.66GHz, 3GiB DDR2 PC2-5300):

  $ time gpg --homedir /run/shm/gnupg1 --list-sigs >/dev/null
  2:16.41 (91.60 user, 43.06 sys)  9444k maxres
  $ time gpg2 --homedir /run/shm/gnupg1 --list-sigs >/dev/null
  2:14.45 (97.77 user, 35.41 sys)  10364k maxres
  $ time gpg2 --homedir /run/shm/gnupg2 --list-sigs >/dev/null
  19:40.61 (80.14 user, 1088.58 sys)  15016k maxres

With a desktop (x86_64, 4x Core i7-4770S @ 3.1GHz, 16GiB DDR3 PC3-12800):

  $ time gpg --homedir /run/shm/gnupg1 --list-sigs >/dev/null
  0:30.74 (24.32 user, 6.40 sys)  11700k maxres
  $ time gpg2 --homedir /run/shm/gnupg1 --list-sigs >/dev/null
  0:26.34 (20.24 user, 6.08 sys)  13132k maxres
  $ time gpg2 --homedir /run/shm/gnupg2 --list-sigs >/dev/null
  2:23.97 (14.39 user, 129.52 sys)  18800k maxres

(On both machines, '/run/shm' is mounted in memory. '/run/shm/gnupg1' contains a
keyring in legacy format, while '/run/shm/gnupg2' contains the same keyring in
keybox format.)

So on the fast machine, --list-sigs is still 5x slower on a keybox. Note the
proportion of the execution spent in kernel mode: 93% on the L2400, 90% on the
i7! (Vs. 26% and 23% with a legacy .gpg format.) I have also a hunch that this
might be the reason why updating the trustdb takes that long with 2.1.x and a
keybox.

Sure it does syscalls because it reads the keybox file for each signature. The
order of the keys in the keybox can make a big difference when checksing
signatures. Arge the keys store in the same order? "gpg -k" walks the
keyring/keybox sequentially.

Yes they are, the output of `$GNUPGBIN --homedir $GNUPGHOME --with-fingerprint
--with-colons --list-key | grep ^fpr` is identical for both (GNUPGBIN=gpg,
GNUPGHOME=/run/shm/gnupg1) and (GNUPGBIN=gpg2, GNUPGHOME=/run/shm/gnupg2).

Also, the keystores were created with

$ mkdir -m0700 /run/shm/gnupg{1,2}

$ echo 'trust-model always' | tee /run/shm/gnupg1/gpg.conf

/run/shm/gnupg2/gpg.conf

  $ echo no-autostart >>/run/shm/gnupg2/gpg.conf
  $ gpg2 --export | gpg  --homedir /run/shm/gnupg1 --import
  $ gpg2 --export | gpg2 --homedir /run/shm/gnupg2 --import

(Each keystore currently contains 1291 public keys and 235302 signatures, 161385
of which from unknown signers.)

Actually 2.1.6 on the keybox does about 4x more read(2) than 1.4.19 or 2.1.6 on
the legacy format.
Here are some figures using the i7:

$ strace -co /tmp/sigs_gpg_gnupg1 gpg --homedir /run/shm/gnupg1 --with-colons

--list-sigs >/dev/null

$ strace -co /tmp/sigs_gpg2_gnupg1 gpg2 --homedir /run/shm/gnupg1

--with-colons --list-sigs >/dev/null

$ strace -co /tmp/sigs_gpg2_gnupg2 gpg2 --homedir /run/shm/gnupg2

--with-colons --list-sigs >/dev/null

  $ head -5 /tmp/sigs_*
  ==> /tmp/sigs_gpg_gnupg1 <==
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   71.05    0.010748           0   7480899           read
   15.64    0.002366           0   2233564           getrusage
   12.94    0.001958           0   2233564           clock_gettime

  ==> /tmp/sigs_gpg2_gnupg1 <==
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   99.92    0.010204           0   7480899           read
    0.04    0.000004           0      4608           write
    0.04    0.000004           0     20438           lseek

  ==> /tmp/sigs_gpg2_gnupg2 <==
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   99.94    1.724001           0  29610764           read
    0.02    0.000386           0     15307         9 open
    0.02    0.000376           0     15290           munmap

Interestingly, with --list-keys 2.1.6 is actually faster on /run/shm/gnupg2:

  $ time gpg --homedir /run/shm/gnupg1 --with-colons --list-keys >/dev/null
  0:02.01 (1.99 user, 0.02 sys)  11004k maxres
  $ time gpg2 --homedir /run/shm/gnupg1 --with-colons --list-keys >/dev/null
  0:02.03 (2.01 user, 0.02 sys)  11456k maxres
  $ time gpg2 --homedir /run/shm/gnupg2 --with-colons --list-keys >/dev/null
  0:00.60 (0.58 user, 0.02 sys)  16568k maxres

It also does about 10x less read(2):

$ head -5 /tmp/keys*

==> /tmp/keys_gpg2_gnupg1 <==

% time     seconds  usecs/call     calls    errors syscall

------ ----------- ----------- --------- --------- ----------------

100.00    0.000004           0     21011           read

  0.00    0.000000           0       166           write

  0.00    0.000000           0        20         3 open



==> /tmp/keys_gpg2_gnupg2 <==

% time     seconds  usecs/call     calls    errors syscall

------ ----------- ----------- --------- --------- ----------------

  0.00    0.000000           0      2036           read

  0.00    0.000000           0       166           write

  0.00    0.000000           0        19         3 open
marcus added a subscriber: marcus.Jul 5 2017, 2:17 PM

It's unclear from the discussion if this issue has been resolved. @werner can you please comment on this?

In T1938#99890, @marcus wrote:

It's unclear from the discussion if this issue has been resolved.

With the 1663 keys in my keyring --list-sigs still takes over 5x longer with 2.1.18 on a keybox than on a pubring.gpg; 1.4.21 is slightly slower but comparable.

$ mkdir -m0700 $XDG_RUNTIME_DIR/{gnupg1,gnupg2}
$ echo 'trust-model always' | tee $XDG_RUNTIME_DIR/{gnupg1,gnupg2}/gpg.conf
$ echo no-autostart >>$XDG_RUNTIME_DIR/gnupg2/gpg.conf
$ gpg1 --no-options --version | head -1
gpg (GnuPG) 1.4.21
$ gpg --no-options --version | head -2
gpg (GnuPG) 2.1.18 
libgcrypt 1.7.8
$ gpg --export >"$XDG_RUNTIME_DIR/pubring.gpg"
$ stat --printf="%s\n" "$XDG_RUNTIME_DIR/pubring.gpg"
107489800
$ alias time="/usr/bin/time -f '%E (%U user, %S sys)  %Mk maxres'"
$ time gpg1 --homedir "$XDG_RUNTIME_DIR/gnupg1" --import <"$XDG_RUNTIME_DIR/pubring.gpg"
[…]
gpg: Total number processed: 1663
gpg:           w/o user IDs: 6
gpg:               imported: 1657  (RSA: 1375)
Command exited with non-zero status 2
2:12.50 (82.08 user, 50.38 sys)  19144k maxres
$ time gpg --homedir "$XDG_RUNTIME_DIR/gnupg2" --import <"$XDG_RUNTIME_DIR/pubring.gpg"
[…]
gpg: Total number processed: 1663
gpg:               imported: 1663
1:37.69 (18.67 user, 78.97 sys)  28632k maxres
$ time gpg1 --homedir "$XDG_RUNTIME_DIR/gnupg1" --with-colons --fixed-list-mode --list-sigs >/dev/null
0:53.75 (41.13 user, 12.62 sys)  11920k maxres
$ time gpg --homedir "$XDG_RUNTIME_DIR/gnupg2" --with-colons --fixed-list-mode --list-sigs >/dev/null
4:14.64 (25.96 user, 228.67 sys)  20020k maxres
$ time gpg --homedir "$XDG_RUNTIME_DIR/gnupg1" --with-colons --fixed-list-mode --list-sigs >/dev/null
0:46.18 (34.95 user, 11.20 sys)  14968k maxres

We have to check what happens here, because list-sigs should be fast.

marcus edited projects, added Stalled; removed In Progress.Jul 10 2017, 10:23 AM