gpg --verify has race conditions when used concurrently
Open, Low, Public

Description

There is a race condition in gpg --verify when multiple processes
run it at the same time.
Running it under strace makes the race easier to trigger (probably by slowing things down).
gpgv does not seem to have the issue.

Using '--lock-once' seems to reduce the problem, but does not entirely fix it.
The race results in transient verification failures with output like this:

gpg: Signature made Fri 10 Jan 2014 05:41:43 PM UTC using DSA key ID 437D05B5
gpg: 12: read expected rec type 10, got 0
gpg: lookup_hashtable failed: trust database error
gpg: trustdb: searching trust record failed: trust database error
gpg: Error: The trustdb is corrupted.
gpg: You may try to re-create the trustdb using the commands:
gpg: cd ~/.gnupg
gpg: gpg2 --export-ownertrust > otrust.tmp
gpg: rm trustdb.gpg
gpg: gpg2 --import-ownertrust < otrust.tmp
gpg: If that does not work, please consult the manual

See the attached 'show-race.sh' for a reproducer.

smoser added a subscriber: smoser.

werner added a subscriber: werner. Jul 30 2014, 2:40 PM

Which GnuPG version did you use for the test?

Sorry. The version was given in the original report at:
https://bugs.launchpad.net/ubuntu/+source/gnupg/+bug/1342807
but I didn't copy it here. I ran this on Ubuntu 14.10 with the apt-provided gpg.
The version is 1.4.16-1.2ubuntu1.

gniibe added a subscriber: gniibe. May 13 2015, 4:20 AM

Please tell us which filesystem your .gnupg/ directory is on.
Unfortunately, I couldn't reproduce the bug with your script on my machine.
If possible, could you share the output of the script on your system?

Attaching the output of two runs:
show-race-output-nostrace.tar.gz
and
show-race-output.tar.gz

Both were run on a fresh 15.04 instance (gpg (GnuPG) 1.4.18).
The nostrace run did not use strace; I had to run it in a loop in order to
catch a failure. With strace, it fails immediately.

strace version (failed on the first run):

./show-race.sh

no-strace:
while :; do ../show-race-output/show-race.sh || break; rm -Rf out*; done

There is more info in 'gpg-info.txt' inside the tarballs, but for completeness:
gpg --version: gpg (GnuPG) 1.4.18
lsb_release -sc: vivid
uname -a: Linux vivid-20150513-125547 3.19.0-16-generic #16-Ubuntu SMP Thu Apr 30 16:09:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
filesystem: ext4

gniibe claimed this task. May 13 2015, 4:13 PM

Thank you for your time and cooperation.
Your data helps me a lot.
I think the same bug exists in 2.0 and 2.1, too.
I'll try to make a minimal reproducible scenario to locate the bug.

On 05/13/2015 11:13 PM, NIIBE Yutaka via BTS wrote:

> I think that same bug is in 2.0 and 2.1, too.
> I'll try to make minimal reproducible scenario to locate the bug.

I think that there are multiple race conditions in access to the trustdb.

For one of them, in GnuPG 2.1, I managed to make a minimal reproducible
scenario.

(0) Preparation: Remove .gnupg/trustdb.gpg.
(1) Run gpg2 under GDB with a breakpoint at create_version_record.
(2) Issue the "run" command with --verify under GDB.
(3) It stops at the breakpoint after making an empty file of trustdb.gpg.
(4) In another terminal, invoke another gpg --verify. It fails with:

gpg: Fatal: /home/gniibe/.gnupg/trustdb.gpg: invalid trustdb
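Step (3) shows the window: creating trustdb.gpg and writing its version record are two separate system calls, so a concurrent reader can open a zero-length file. One general way to close that particular window, shown here only as an alternative to the lock-based approach discussed in this thread and not as GnuPG code, is to fill a private temporary file first and then publish it with link(2), which fails with EEXIST instead of replacing a file another process published first. The helper name `create_initialized` and the `.tmp.<pid>` naming are hypothetical; this is a POSIX sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not GnuPG code. Publishes `path` with contents
 * `init` so that other processes see either no file or the complete file,
 * never an empty one. Returns 0 if we created it, 1 if it already
 * existed, -1 on error. */
int create_initialized(const char *path, const void *init, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp.%ld", path, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* Fill a private temporary file first; nobody else uses this name. */
    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, init, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* link(2) makes the finished file visible under `path` in one step and,
     * unlike rename(2), fails with EEXIST instead of replacing a file that
     * another process published first. */
    int rc = 0;
    if (link(tmp, path) != 0)
        rc = (errno == EEXIST) ? 1 : -1;
    unlink(tmp);
    return rc;
}
```

This only removes the empty-file window at creation time; it does nothing for later updates to the file, which still need a lock.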

--

There is some weirdness in the code. For example, I can't remember why the order
for creating a log is different under RISC OS. Given that we have not tested
RISC OS support for ages, it might be better to remove it entirely and eliminate a
source of error.

Creating the trustdb file should only be allowed while holding a lock on it.
That is one of the reasons we use a separate lock file.
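A minimal sketch of that rule, assuming Linux/BSD flock(2) on a separate, persistent `<path>.lock` file. GnuPG has its own lock-file code, which is not reproduced here, and the function name `open_or_create_locked` is hypothetical. The existence check, the creation and the initial write all happen inside one exclusive critical section, so a process that follows the same protocol never sees the file half-created.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not GnuPG code. Opens `path` read/write, creating
 * and initializing it with `init` if it does not exist yet. The existence
 * check, the creation and the initial write all happen while holding an
 * exclusive flock(2) lock on the separate file `<path>.lock`.
 * Returns a descriptor positioned at offset 0, or -1 on error. */
int open_or_create_locked(const char *path, const void *init, size_t len)
{
    char lockpath[4096];
    int n = snprintf(lockpath, sizeof lockpath, "%s.lock", path);
    if (n < 0 || (size_t)n >= sizeof lockpath)
        return -1;

    /* The lock file is deliberately never deleted. */
    int lockfd = open(lockpath, O_RDWR | O_CREAT, 0600);
    if (lockfd < 0)
        return -1;
    if (flock(lockfd, LOCK_EX) != 0) {
        close(lockfd);
        return -1;
    }

    int fd = open(path, O_RDWR);
    if (fd < 0 && errno == ENOENT) {
        /* Still under the lock: no cooperating process can look at the
         * file between its creation and its initialization. */
        fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) {
            if (write(fd, init, len) != (ssize_t)len) {
                close(fd);
                unlink(path);
                fd = -1;
            } else {
                lseek(fd, 0, SEEK_SET);
            }
        }
    }

    flock(lockfd, LOCK_UN);
    close(lockfd);
    return fd;
}
```

This only helps if readers take the lock too (a shared lock is enough). The lock file is kept on disk because unlinking it while another process waits on it would let the next process create and lock a different inode, and two processes would then each believe they hold the lock.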

Here is a possible fix.
I wrote it for the current master branch and tested it there.
Then I ported it to 1.4, where it builds and seems to work well.
Please test it.

I was wrong in T1675 (gniibe on May 15 2015, 06:38 AM / Roundup) to say there are multiple races.
Provided write(2) is atomic, the only race is between creating trustdb.gpg and
checking whether it exists.

Fixed in master, which was released as 2.1.5.
Also fixed in the 1.4 and 2.0 repositories.

gniibe closed this task as Resolved.

I'm still able to make this fail, though much less often.
Here is an example:

$ wget https://bugs.gnupg.org/gnupg/file443/show-race.sh -O show-race.sh
$ chmod 755 show-race.sh
$ dpkg-query --show gnupg
$ gpg --version
gpg (GnuPG) 1.4.20
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: ~/.gnupg
Supported algorithms:
Pubkey: RSA, RSA-E, RSA-S, ELG-E, DSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
        CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: MD5, SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2

$ sed -i.dist -e 's,precise-updates,precise,' \
    -e 's,20101020ubuntu136.15,current,' show-race.sh
$ diff -u show-race.sh.dist show-race.sh

--- show-race.sh.dist 2016-06-06 16:37:25.845783450 -0400
+++ show-race.sh 2016-06-06 16:37:26.645771713 -0400
@@ -37,7 +37,7 @@
     mkdir "$GNUPGHOME" && chmod 700 "$GNUPGHOME"
 fi
 
-url="http://archive.ubuntu.com/ubuntu/dists/precise-updates/main/installer-amd64/20101020ubuntu136.15/images"
+url="http://archive.ubuntu.com/ubuntu/dists/precise/main/installer-amd64/current/images"
 kr=/usr/share/keyrings/ubuntu-archive-keyring.gpg
 for f in SHA256SUMS SHA256SUMS.gpg; do
     [ -f "$f" ] && continue

$ i=0; while i=$(($i+1)); do rm -Rf out*; echo -n "$i "; ./show-race.sh || break; done
1 max=100 cmd=gpg --verify args:
2 max=100 cmd=gpg --verify args:
3 max=100 cmd=gpg --verify args:
4 max=100 cmd=gpg --verify args:
...
67 max=100 cmd=gpg --verify args:
68 max=100 cmd=gpg --verify args:
69 max=100 cmd=gpg --verify args:
3 failed: out.3 [2]

$ cat out.3
gpg: Signature made Mon 23 Apr 2012 03:52:09 PM EDT using DSA key ID 437D05B5
gpg: error opening lockfile `/tmp/xt/out.gnupghome/trustdb.gpg.lock': No such file or directory
gpg: lockfile disappeared
gpg: 12: read expected rec type 10, got 0
gpg: lookup_hashtable failed: trust database error
gpg: trustdb: searching trust record failed: trust database error
gpg: Error: The trustdb is corrupted.
gpg: You may try to re-create the trustdb using the commands:
gpg: cd /tmp/xt/out.gnupghome
gpg: gpg2 --export-ownertrust > otrust.tmp
gpg: rm trustdb.gpg
gpg: gpg2 --import-ownertrust < otrust.tmp
gpg: If that does not work, please consult the manual

smoser reopened this task as Open. Jun 6 2016, 10:52 PM
gniibe added a comment. Jun 9 2016, 4:21 AM

Thank you for the update.
msg8431 seems to be a different race condition. I only fixed one race in 2015.

My statement in T1675 (gniibe on May 25 2015, 07:38 AM / Roundup) now looks wrong to me.
For example, create_hashtable does an lseek to SEEK_END.
If another process is adding a new entry at the same time (say, also by calling
create_hashtable), there is a real race condition here.
That is:

(1) Process A calls lseek with SEEK_END and gets the current end-of-file offset.
    Then a context switch occurs.
(2) Process B calls lseek with SEEK_END and gets the same offset as A.
(3) Process B writes its data at that offset. Context switch back to A.
(4) Process A wrongly overwrites B's data at the same offset.
    The result is inconsistent data.
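Steps (1) to (4) are a check-then-act race between lseek and write. A sketch of the racy pattern and of the general remedy (making "find the end" and "write there" one critical section) follows, assuming flock(2) on a lock file descriptor. `append_record_racy` and `append_record_locked` are hypothetical names; this is not the create_hashtable code, which the patch described next handles by moving hash table creation under the existing lock.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Racy pattern from steps (1)-(4): between our lseek and our write,
 * another process can obtain the same end-of-file offset, and then one
 * write overwrites the other. Returns the record's offset or -1. */
off_t append_record_racy(int fd, const void *rec, size_t len)
{
    off_t off = lseek(fd, 0, SEEK_END);
    if (off < 0 || pwrite(fd, rec, len, off) != (ssize_t)len)
        return -1;
    return off;
}

/* Same steps, but finding the end and writing there form one critical
 * section guarded by an exclusive lock on `lockfd`, so two cooperating
 * processes can never be handed the same offset. */
off_t append_record_locked(int lockfd, int fd, const void *rec, size_t len)
{
    if (flock(lockfd, LOCK_EX) != 0)
        return -1;
    off_t off = append_record_racy(fd, rec, len);
    flock(lockfd, LOCK_UN);
    return off;
}
```

Both steps must run under the lock: locking only the write would still let two processes compute the same offset.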

I think that this patch improves the situation.
It moves the creation of the hash table to the place where the version
record is created (while holding the lock).

Finally, I managed to reproduce what I believe is the same situation.
Please see: https://lists.gnupg.org/pipermail/gnupg-devel/2016-June/031211.html
It is a READ vs. WRITE race condition.
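For a READ vs. WRITE race, the usual remedy is a readers-writer lock: readers take a shared lock and writers an exclusive one, so a reader can never observe a record in the middle of an update. The sketch below shows that with flock(2) LOCK_SH/LOCK_EX. It is a general technique, not necessarily what the GnuPG patch in the linked message does, and the function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Reader: any number of processes may hold LOCK_SH at the same time,
 * but not while a writer holds LOCK_EX. Returns 0 on success, -1 on
 * error or short read. */
int read_record_shared(int lockfd, int fd, void *buf, size_t len, off_t off)
{
    if (flock(lockfd, LOCK_SH) != 0)
        return -1;
    ssize_t n = pread(fd, buf, len, off);
    flock(lockfd, LOCK_UN);
    return n == (ssize_t)len ? 0 : -1;
}

/* Writer: LOCK_EX waits until all readers have released LOCK_SH, so no
 * reader can see the record half-written. */
int write_record_exclusive(int lockfd, int fd, const void *buf, size_t len,
                           off_t off)
{
    if (flock(lockfd, LOCK_EX) != 0)
        return -1;
    ssize_t n = pwrite(fd, buf, len, off);
    flock(lockfd, LOCK_UN);
    return n == (ssize_t)len ? 0 : -1;
}
```

A lookup that follows a chain of records (for example, from a hash table entry to the record it points to) would need to hold the shared lock across the whole chain rather than per record, and every process touching the file must follow the same protocol.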

This particular hash table race condition is
fixed in master, which will be released as 2.1.13.
It is also fixed in the 1.4 and 2.0 repositories.

I think that this particular use case, gnupg with an external keyring, doesn't
need the trustdb at all. In such a case, we can use
--trust-model always (which is effectively what gpgv does), or we can use gpgv directly.
Then the original problem is gone, since the trustdb is never touched.
Anyway, fixing a race condition is a good thing.
Note that some race conditions remain, but they can only be triggered
when multiple processes access the trustdb while one of them is writing to it.

werner lowered the priority of this task from Normal to Low. Sep 21 2017, 3:34 PM