Consider the case where we have two gpg processes. One is reading the keyring
and the other updates it by adding a new key. If the reading process has
already filled its key present cache (in keyring.c), then it will not see the
new entry if it looks it up by key id.
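To make the failure mode concrete, here is a minimal self-contained sketch; the structure and names are assumptions for illustration, not the actual keyring.c code:
```
/* Sketch of the key-present-cache bug (illustrative only). */
#include <stdbool.h>
#include <stdio.h>

#define MAX_KEYS 16

struct key_present_cache
{
  bool filled;                    /* set once the whole keyring was scanned */
  unsigned long keyids[MAX_KEYS];
  int nkeys;
};

/* Lookup by key id.  Once the cache is marked as filled, a miss is
 * taken as authoritative and the keyring file is never re-read, so a
 * key added by a second gpg process is not found. */
static bool
key_present (struct key_present_cache *c, unsigned long keyid)
{
  for (int i = 0; i < c->nkeys; i++)
    if (c->keyids[i] == keyid)
      return true;
  if (c->filled)
    return false;   /* BUG: misses keys written by another process. */
  /* ... otherwise scan the keyring file and update the cache ... */
  return false;
}

int
main (void)
{
  struct key_present_cache c = { .filled = true,
                                 .keyids = { 0x12345678 }, .nkeys = 1 };
  /* Meanwhile another process appended a new key to the keyring. */
  printf ("%d\n", key_present (&c, 0xCAFEBABEul));  /* 0: not found */
  return 0;
}
```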
Related Objects
- Mentioned Here
- T2135: Keyring locking on Windows broken
Event Timeline
I think that the current lock/unlock mechanism is only for mutual exclusion
between multiple writers; that is, lock/unlock is done to avoid the
inconsistency that multiple writers could cause.
It seems that we forgot to implement mutual exclusion between writers and
readers, as Neal described.
Before 2.1.10, write access was limited to specific interactive usage
patterns, so it didn't cause major problems (and rarely caused any at all).
Now, I think that we should implement mutual exclusion between readers and writers.
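One hedged sketch of what such reader/writer exclusion could look like, using POSIX fcntl() record locks (purely illustrative; this is not what GnuPG does today, and it would need a different mechanism on Windows). Readers take a shared lock, writers an exclusive one, so a reader never observes a half-written keyring:
```
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int
lock_keyring (int fd, short type)       /* F_RDLCK or F_WRLCK */
{
  struct flock fl;
  memset (&fl, 0, sizeof fl);
  fl.l_type = type;                     /* shared vs. exclusive     */
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;                         /* 0 = the whole file       */
  return fcntl (fd, F_SETLKW, &fl);     /* wait until granted       */
}

static int
unlock_keyring (int fd)
{
  struct flock fl;
  memset (&fl, 0, sizeof fl);
  fl.l_type = F_UNLCK;
  fl.l_whence = SEEK_SET;
  return fcntl (fd, F_SETLKW, &fl);
}

int
main (void)
{
  int fd = open ("pubring.gpg", O_RDONLY);
  if (fd < 0)
    return 1;
  lock_keyring (fd, F_RDLCK);           /* many readers may hold this */
  /* ... read the keyring ... */
  unlock_keyring (fd);
  close (fd);
  return 0;
}
```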
To do writes, we use a copy-update-move scheme. Thus, all updates are atomic.
A read fopen()s the keyring or keybox, seeks, and reads. If an update occurs
between the seek and the read, the reader will still see the old version,
because the open stream is bound to the inode, not the filename:
```
reader                  writer
------                  ------
fopen("keyring.pub")
seek(fp)
                        cp("keyring.pub", "keyring.pub~")
                        update("keyring.pub~")
                        mv("keyring.pub~", "keyring.pub")
read(fp)
```
Thus, writers don't interfere with readers.
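A minimal POSIX demonstration of that point (not from the GnuPG sources): the stream stays bound to the old inode even after the file is atomically replaced.
```
#include <stdio.h>

int
main (void)
{
  char buf[16] = "";
  FILE *fp, *tmp;

  /* Initial keyring content. */
  fp = fopen ("keyring.pub", "w");
  if (!fp) return 1;
  fputs ("old", fp);
  fclose (fp);

  /* Reader opens the keyring; fp is now bound to the inode. */
  fp = fopen ("keyring.pub", "r");
  if (!fp) return 1;

  /* Writer updates a copy and atomically moves it into place. */
  tmp = fopen ("keyring.pub~", "w");
  if (!tmp) return 1;
  fputs ("new", tmp);
  fclose (tmp);
  rename ("keyring.pub~", "keyring.pub");

  /* The name now points at a new inode, but fp still reads the old one. */
  fgets (buf, sizeof buf, fp);
  printf ("reader sees: %s\n", buf);    /* prints "old" */
  fclose (fp);
  return 0;
}
```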
We need to lock the underlying file for updates to avoid the case in which two
updates occur nearly simultaneously, but only one is saved. (Also, since the
updates occur in keyring.pub~, we need to ensure exclusive access to that file.)
```
writer1                              writer2
-------                              -------
cp("keyring.pub", "keyring.pub~")
update("keyring.pub~")
                                     cp("keyring.pub", "keyring.pub~")
                                     update("keyring.pub~")
mv("keyring.pub~", "keyring.pub")
                                     mv("keyring.pub~", "keyring.pub")
```
In the above case, writer1's update is lost. (Note that it could be even
worse: if both writers update keyring.pub~ simultaneously, the file could be
corrupted.)
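A hedged sketch of one way to serialize writers with a dotlock-style lock file; the helper names are invented for illustration, and this is not the actual GnuPG dotlock code:
```
#include <fcntl.h>
#include <unistd.h>

/* Returns an fd representing the held lock, or -1 if another writer
 * already holds it.  O_EXCL makes creation fail if the lock file
 * exists, so at most one writer gets past this call. */
static int
take_update_lock (const char *lockname)
{
  return open (lockname, O_CREAT | O_EXCL | O_WRONLY, 0600);
}

static void
release_update_lock (const char *lockname, int fd)
{
  close (fd);
  unlink (lockname);
}

int
main (void)
{
  int fd = take_update_lock ("keyring.pub.lock");
  if (fd < 0)
    return 1;          /* another writer is active; retry later */
  /* cp("keyring.pub", "keyring.pub~"); update; mv -- now race-free */
  release_update_lock ("keyring.pub.lock", fd);
  return 0;
}
```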
The bug that I'm describing only has to do with the key present cache, which
becomes inconsistent because we don't track external writes.
Note that under Windows we don't have inodes :-(
See also T2135.
As I remarked several times in the past, the only solid way to handle these
problems is to allow access to the keyrings only via a dedicated process.
FWIW, we do not copy to create the backup but use atomic renames:
```
lock(pubring.gpg.lock)
read(pubring.gpg) -> process -> write(pubring.tmp)
rename(pubring.gpg, pubring.gpg~)
rename(pubring.tmp, pubring.gpg)
unlock(pubring.gpg.lock)
```
Thus the worst case is that another non-updating process sees no pubring.gpg
during the time between the two renames.
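The same sequence as a POSIX sketch, with error handling trimmed; process() is a hypothetical stand-in for the actual keyring edit. Both rename() calls are atomic, so a concurrent reader sees either the old pubring.gpg, the new one, or, only between the two renames, none at all:
```
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static void
update_pubring (void)
{
  /* lock(pubring.gpg.lock) -- dotlock-style, as sketched earlier. */
  int lockfd = open ("pubring.gpg.lock", O_CREAT | O_EXCL | O_WRONLY, 0600);
  if (lockfd < 0)
    return;                                /* another updater is active */

  /* read(pubring.gpg) -> process -> write(pubring.tmp) */
  /* process ("pubring.gpg", "pubring.tmp");  hypothetical step */

  rename ("pubring.gpg", "pubring.gpg~");  /* keep the backup     */
  rename ("pubring.tmp", "pubring.gpg");   /* publish atomically  */

  close (lockfd);
  unlink ("pubring.gpg.lock");             /* unlock */
}

int
main (void)
{
  update_pubring ();
  return 0;
}
```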
As long as the reader's cache is short-lived, I don't see a problem. The
operation started before the writer, so it can use the old data to finish. Any
other policy could lead to other problems (for example, a long sequence of
writers could starve a reader that tries to refresh due to cache staleness).
So, IMO, only if you keep long-running gpg/gpgsm processes around (maybe in
--server mode?) could you have a problem.