100% CPU usage endles loop of gpg --list-keys
Open, LowPublic

Description

Following command, called by Thunderbird (probably Enigmail plugin) hangs with 100% CPU usage endless loop:

/usr/bin/gpg --charset utf-8 --display-charset utf-8 --no-auto-check-trustdb --batch --no-tty --status-fd 2 --with-fingerprint --fixed-list-mode --with-colons --list-secret-keys 4E2C6E8793298290

After I've noticed above mentioned problem, I've killed related process. Then whenever I'm trying a simple list keys command, it is again hangin with 100% CPU usage endless loop:

gpg -k

My OS is Arch GNU/Linux with following uname:

Linux 4.16.7-1-ARCH #1 SMP PREEMPT Wed May 2 21:12:36 UTC 2018 x86_64 GNU/Linux
gpg (GnuPG) 2.2.6
libgcrypt 1.8.2

gpg comes from official package of Arch GNU/Linux repository, version 2.2.6-1.

Dependencies (with optional installed):

npth                : 1.5-1
libgpg-error                : 1.29-1
libgcrypt                : 1.8.2-1
libksba                : 1.3.5-1
libassuan                : 2.5.1-1
pinentry                : 1.1.0-2
bzip2                : 1.0.6-7
readline                : 7.0.003-1
gnutls                : 3.5.18-1
sqlite                : 3.23.1-1
libldap                : 2.4.46-1
libusb-compat                : 0.1.5-1
pcsclite                : 1.8.23-3

Details

Version
2.2.6
ktalik created this task.May 10 2018, 3:50 PM

This looks reminiscent of a bug previously seen in GPA (T3748).

Are you using the TOFU trust model?

I'm not sure. How to check it? In man gpg I only see instructions on how to change the trust model. ~/.gnupg/gpg.conf does not have any trust model related entry. I have ~/.gnupg/tofu.db file however.

If you never explicitly changed the default trust model, then I would expect you are not using TOFU, but the presence of a tofu.db file strongly suggests that you are indeed using it.

To know for sure, you may use the command gpg --check-trustdb. Look in the output of that command for a line similar to the following:

gpg: marginals needed: 3  completes needed: 1  trust model: XXX

If you see that line, then XXX is the trust model you are using (it should be pgp by default, but it could be tofu+pgp). If you don't see such a line, then you are using the tofu model.

Above command freezes with 100% CPU, too.

Okay, so maybe this has nothing to do with T3748 then…

Well, I can’t reproduce. Assuming you’re using TOFU (that must be the case since you have a tofu.db database), I wonder if your database may be somehow corrupted.

Could you try moving the ~/.gnupg/tofu.db file away (do not delete it! just move it under a different name and/or a different folder), and then try either a key listing command (that you know trigger the freeze) or a --check-trustdb command?

Good idea, but I've already tried it. Tried once again and freeze still occurs.

werner added a subscriber: werner.Jun 6 2018, 2:32 PM

Please add

--verbose --debug ipc,trust

to the gpg invocation.

werner triaged this task as Normal priority.Jun 6 2018, 2:33 PM
werner added a project: gnupg (gpg22).
ktalik added a comment.EditedJun 6 2018, 2:48 PM
$ gpg -k --verbose --debug ipc,trust
gpg: reading options from '/home/konrad/.gnupg/gpg.conf'
gpg: enabled debug flags: trust ipc
gpg: using pgp trust model
gpg: checking the trustdb
gpg: removing stale lockfile (created by 14064)
[FREEZE]
$ gpg --charset utf-8 --display-charset utf-8 --no-auto-check-trustdb --batch --no-tty --status-fd 2 --with-fingerprint --fixed-list-mode --with-colons --list-secret-keys 4E2C6E8793298290 --verbose --debug ipc,trust
gpg: Note: '--verbose' is not considered an option
gpg: Note: '--debug' is not considered an option
gpg: please do a --check-trustdb
[FREEZE]
$ gpg --debug ipc,trust --check-trustdb
gpg: reading options from '/home/konrad/.gnupg/gpg.conf'
gpg: enabled debug flags: trust ipc
gpg: removing stale lockfile (created by 14417)
[FREEZE]

In your second run you added the options after the argument (4E2C6E8793298290) so they won't have an effect. Anyway, I can't see anything from the output. My way to debug that would be to run gpg under strace:

strace -o gpg.trc gpg -k --verbose --debug ipc,trust

The hit ctrl-c and post the last 20 lines, unless they are all the same, then please search back until you find the start of that. Or upload the entire file but take care that you ran it in a test environment, because strace shows all kind of things you may not want to publish.

werner lowered the priority of this task from Normal to Low.Mar 18 2019, 7:22 PM

We can't replicate that and got no more response for 9 months.

werner closed this task as Resolved.May 15 2019, 9:41 AM
werner removed a project: Bug Report.
werner claimed this task.
werner reopened this task as Open.Jun 7 2019, 7:40 PM

I received an strace for a similar case by PM.

gusnan added a subscriber: gusnan.Jun 7 2019, 8:39 PM
jackalope added a subscriber: jackalope.EditedJun 26 2019, 9:51 PM

I've encountered the same problems that the original poster has described; the problems started suddenly on June 24, not prompted by any related updates as far as I can tell. The problems occur with both gpg 2.1.18 installed from the official Debian Stretch package and 2.2.12 installed from stretch-backports.

Results of uname: Linux 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u3 (2019-06-16) x86_64 GNU/Linux

Ran strace -o gpg.trc gpg -k --verbose --debug ipc,trust -- here's the onscreen output:

gpg: reading options from '/home/jack/.gnupg/gpg.conf'
gpg: enabled debug flags: trust ipc
gpg: using pgp trust model
gpg: checking the trustdb
gpg: removing stale lockfile (created by 2333)

Here are lines 180 through 215 of the resultant output:

open("/usr/share/locale/en_US.utf8/LC_MESSAGES/gnupg2.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/gnupg2.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/gnupg2.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/gnupg2.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/gnupg2.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "gpg: reading options from '/home"..., 53) = 53
write(2, "'\n", 2)                      = 2
fstat(3, {st_mode=S_IFREG|0644, st_size=2997, ...}) = 0
read(3, "#\n# This is an implementation of"..., 4096) = 2997
read(3, "", 4096)                       = 0
close(3)                                = 0
write(2, "gpg: enabled debug flags: trust "..., 35) = 35
write(2, "\n", 1)                       = 1
access("/home/jack/.gnupg/random_seed", F_OK) = 0
open("/home/jack/.gnupg/pubring.gpg", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=58611382, ...}) = 0
read(3, "\231\1\242\4I+\34>\21\4\0\236wl\332\3640Sx\346\332\2523\f\30H\231\nG\245<\305"..., 4096) = 4096
close(3)                                = 0
open("/home/jack/.gnupg/pubring.kbx", O_RDONLY) = -1 ENOENT (No such file or directory)
access("/home/jack/.gnupg/pubring.gpg", F_OK) = 0
access("/home/jack/.gnupg/pubring.gpg", R_OK) = 0
stat("/home/jack/.gnupg/trustdb.gpg", {st_mode=S_IFREG|0600, st_size=13880, ...}) = 0
open("/home/jack/.gnupg/trustdb.gpg", O_RDWR) = 3
lseek(3, 0, SEEK_SET)                   = 0
read(3, "\1gpg\3\3\1\5\1\2\0\0]\fZ\202\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 40
lseek(3, 0, SEEK_SET)                   = 0
read(3, "\1gpg\3\3\1\5\1\2\0\0]\fZ\202\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 40
write(2, "gpg: using pgp", 14)          = 14
write(2, " trust model\n", 13)          = 13
lseek(3, 40, SEEK_SET)                  = 40
read(3, "\n\0\0\0\0\303\0\0\0\0\0\0\0\0\0\0\0\361\0\0\1\24\0\0\0\0\0\0\0\0\0\0"..., 40) = 40
lseek(3, 80, SEEK_SET)                  = 80
read(3, "\n\0\0\0\0\0\0\0\0\341\0\0\0\305\0\0\0\0\0\0\1R\0\0\0\"\0\0\0\256\0\0"..., 40) = 40
lseek(3, 120, SEEK_SET)                 = 120
read(3, "\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\311\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 40
lseek(3, 160, SEEK_SET)                 = 160

Those lseek and read lines go on until line 907 where this starts:

write(2, "gpg: ", 5)                    = 5
write(2, "checking the trustdb\n", 21)  = 21
access("/home/jack/.gnupg/pubring.gpg", W_OK) = 0
access("/home/jack/.gnupg/pubring.gpg", W_OK) = 0
getpid()                                = 8914
uname({sysname="Linux", nodename="machete", ...}) = 0
getpid()                                = 8914
open("/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914", O_WRONLY|O_CREAT|O_EXCL, 0644) = 4
write(4, "      8914\n", 11)            = 11
write(4, "machete", 7)                  = 7
write(4, "\n", 1)                       = 1
close(4)                                = 0
stat("/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914", {st_mode=S_IFREG|0644, st_size=19, ...}) = 0
link("/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914", "/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914x") = 0
stat("/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914", {st_mode=S_IFREG|0644, st_size=19, ...}) = 0
unlink("/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914x") = 0
access("/home/jack/.gnupg/pubring.gpg", W_OK) = 0
link("/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914", "/home/jack/.gnupg/pubring.gpg.lock") = -1 EEXIST (File exists)
stat("/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914", {st_mode=S_IFREG|0644, st_size=19, ...}) = 0
open("/home/jack/.gnupg/pubring.gpg.lock", O_RDONLY) = 4
read(4, "      2333\nmachete\n", 19)    = 19
close(4)                                = 0
getpid()                                = 8914
kill(2333, SIG_0)                       = -1 ESRCH (No such process)
write(2, "gpg: removing stale lockfile (cr"..., 45) = 45
write(2, ")\n", 2)                      = 2
unlink("/home/jack/.gnupg/pubring.gpg.lock") = 0
link("/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914", "/home/jack/.gnupg/pubring.gpg.lock") = 0
stat("/home/jack/.gnupg/.#lk0x0000555fb1120960.machete.8914", {st_mode=S_IFREG|0644, st_size=19, ...}) = 0
open("/home/jack/.gnupg/pubring.gpg", O_RDONLY) = 4
read(4, "\231\1\242\4I+\34>\21\4\0\236wl\332\3640Sx\346\332\2523\f\30H\231\nG\245<\305"..., 8192) = 8192
umask(077)                              = 022
open("/home/jack/.gnupg/pubring.gpg.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0600) = 5
umask(022)                              = 077
open("/home/jack/.gnupg/pubring.gpg", O_RDONLY) = 6
lseek(6, 0, SEEK_SET)                   = 0
read(6, "\231\1\242\4I+\34>\21\4\0\236wl\332\3640Sx\346\332\2523\f\30H\231\nG\245<\305"..., 8192) = 8192
read(6, "\21\0255\202~\257\17\377T\373t\26\317\211\203\247\22X]N\"\364!\5e\216\230\26(\310/k"..., 8192) = 8192
brk(0x555fb1159000)                     = 0x555fb1159000
read(6, "\260\2\0\3\210f\4\23\21\2\0&\2\33#\6\v\t\10\7\3\2\4\25\2\10\3\4\26\2\3\1"..., 8192) = 8192

After that it's thousands of mostly read lines until line 8206:

open("/home/jack/.gnupg/pubring.gpg", O_RDONLY) = 6
close(6)                                = 0
open("/home/jack/.gnupg/pubring.gpg", O_RDONLY) = 6
close(6)                                = 0
open("/home/jack/.gnupg/pubring.gpg", O_RDONLY) = 6
close(6)                                = 0

That continues until line 8790; from then on it looks like nothing but read lines until line 236946 when I interrupted with ctrl-C.

Running strace -o gpg.listkey.trc gpg --list-key --no-auto-check-trustdb --verbose --debug ipc,trust also hangs, yielding similar but not identical output; happy to share more of that output if it's helpful.

Oddly, both that last strace and a more simple gpg --list-key --no-auto-check-trustdb list keys then hang at the same point every time for each command; the two different command hang in the middle of two different keys, but always the same key for each command.

Finally I'll note that I can decrypt files just fine, but can't encrypt files; Enigmail just doesn't work at all, neither for decryption nor encryption, and if I leave Enigmail on (with my preferences to sign all outgoing emails) I can't send email at all in Thunderbird.

I've tried restoring backups of trustdb.gpg and pubring.gpg from when things were working fine, but nothing helps.

Happy to provide any additional info that might be helpful here!

dkg added a subscriber: dkg.Jun 27 2019, 11:03 PM

@jackalope, the place where the output is hanging is likely due to output buffering (i have been able to replicate the same problem, and the output hangs at intervals of 8192 octets). So while it is giving you a clue about where the hang is, it's not a very precise clue.

Sadly, my guess is that what you're seeing is breakage related to my own OpenPGP certificate, 0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6. It was spammed sometime in the past few weeks (i'm in the process of writing up a blog post about it), and anyone who has it in their keyring and has refreshed their keyring from any keyserver in the dominant SKS keyserver pool is likely to experience these delays.

In the meantime, i recommend that you just delete my OpenPGP certificate from your keyring, and see whether that makes things faster for you.

gpg --delete-key 0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6

Good to see you here @dkg, thanks for the response, and whoa, that's quite something!

Alas, I don't have certificate 0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6 in my keyring (though I have others of yours, including 0xCCD2ED94D21739E9 that expires on June 29, 2019.)

But could this be that there's another spammed key in my keyring? Any ideas on how I might figure out which other one it is?

dkg added a comment.Jun 28 2019, 1:19 AM

That's a great question, @jackalope. I found this in a different misbehaving keyring recently by basically deleting keys by hand until only one was left. surprise, it was mine (ugh)! But that process is pretty slow and manual and tedious. Let me see if i can do better.

Below is a gross hack that you could try doing to list the certificates that you know about based on their size -- it's possible that it will help you find the key ID of the largest certificate you have. (it is also dependent on the output of gpg --list-packets, which @werner has stated is not guaranteed to be stable, so i don't know whether it will work in your specific instance, sorry!)

You might need to let it run for a long time (maybe let it churn for 10 minutes before giving up?), since gpg --export is likely to be one of the commands that can cause a major delay:

gpg --export | gpg --list-packets | awk -F= -v oldoff=-1 -v keyid=unset '
/^# off=/{ off = $2 + 0 }
/^:public key/{ if (oldoff>-1) { print (off - oldoff) " " keyid }; oldoff = off; keyid = "unset"; }
/keyid:/ {if (keyid == "unset") { keyid = $1; } }
END { print (off - oldoff) " " keyid ; };' | sort -n

The lines that get printed will look like:

1479581 	keyid: 06EAA066E397832F
1582022 	keyid: 39278DA8109E6244
1735309 	keyid: F7F0E70F307D56ED
3368272 	keyid: 9710B89BCA57AD7C

The left-hand side is the size (in octets) taken up in the keyring by the specific certificate. the right-hand side is the 64-bit key ID of the primary key of that certificate.

If the problem you're seeing is a spammed certificate (or certificates), then are probably noticeably larger than the others.

Let me know if that works for you!

By golly, you were right, @dkg! I ran that "gross hack" (thanks so much for writing it) and after nearly 10 minutes got the output I needed. Last 5 lines:

614318 	keyid: 789D6F057FD863FE
696582 	keyid: CCD2ED94D21739E9
708855 	keyid: 33961588E1C21845
3564879 	keyid: 9FF2194CC09A61E8
46227507 	keyid: 1DCBDC01B44427C7

The second to largest amusingly belongs to of Edward the GPG Bot, but that largest certificate was the culprit. It took ~23 minutes for gpg --delete-key 0x1DCBDC01B44427C7 to finish, but after it did I can run all gpg commands and use Enigmail normally again!

Perhaps I should let Robert J. Hansen know that his certificate might have been spammed?

Thanks so much for the help here, friend!

dkg added a comment.Jun 28 2019, 8:30 AM

wow, 46MiB, that's even worse than mine. :( thanks for sharing the update, @jackalope. I'm glad you've worked around it for now, but sadly this kind of certificate flooding could happen at any time if you're using the SKS keyserver network :(

You might also want to try converting over to the keybox format (instead of keyring) because that at least will keep flooded certificates from being imported into your keyring, as documented in T4591.

If you're on debian, you should be able to do that conversion with migrate-pubring-from-classic-gpg. If you're not on debian, you can fetch the migrate-pubring-from-classic-gpg script and try it yourself.

dkg added a comment.EditedJun 28 2019, 9:02 AM

For folks who encounter this problem in the future, i recommend that you first check whether you have a pubring.gpg instead of (or in addition to) your pubring.kbx. If you do have pubring.gpg, you should be able to run the pipeline to the awk script described above with just the pubring directly, which omits the time-consuming gpg --export step above. so i think that would look like:

< ${GNUPGHOME:-~/.gnupg}/pubring.gpg gpg --list-packets | awk -F= -v oldoff=-1 -v keyid=unset '
/^# off=/{ off = $2 + 0 }
/^:public key/{
   if (oldoff>-1) { print (off - oldoff) " " keyid };
   oldoff = off; keyid = "unset";
}
/keyid:/ {if (keyid == "unset") { keyid = $1; } }
END { print (off - oldoff) " " keyid ; };' | sort -n

If you experience this problem with pubring.kbx and no pubring.gpg, please report that too!

dkg added a comment.Jun 28 2019, 9:09 AM

Please see T4592 where i've reported this particular performance concern in more detail, including profiling data.

ktalik removed a subscriber: ktalik.Jun 28 2019, 9:42 AM
georg added a subscriber: georg.Jun 28 2019, 3:35 PM
Angel added a subscriber: Angel.Jul 3 2019, 2:49 AM
werner moved this task from Backlog to Deferred on the gnupg (gpg22) board.Jul 3 2019, 6:14 PM
szpak added a subscriber: szpak.Jul 13 2019, 10:51 PM