
Kleopatra: Serialize listing of OpenPGP and S/MIME certificates
Closed, Invalid (Public)

Description

Especially on Windows we seem to have a problem if multiple processes race to start gpg-agent (e.g. T7434: Kleopatra: Initial keylisting hangs for ~60 seconds (gpg-agent: Socket ...S.gpg-agent cannot be bound)).

In particular, during the initial keylisting gpg and gpgsm race to start gpg-agent if it isn't running already. This race can be avoided if we serialize the listing of OpenPGP and S/MIME certificates. The downside is that this will make the initial keylisting (actually all keylistings, but only the first one is really noticeable by the users) a bit slower.
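The difference between the two orderings can be sketched on the command line. This is only an illustration, not what Kleopatra does internally: Kleopatra performs its listings through GPGME, and the exact invocations are not shown in this ticket. The `GPG` and `GPGSM` variables and both function names are hypothetical, and `--list-secret-keys` is used here only as an assumed stand-in for a listing that needs gpg-agent.

```shell
#!/bin/sh
# GPG and GPGSM are hypothetical overrides for this sketch; by default the
# gpg and gpgsm binaries found in PATH are used.
GPG="${GPG:-gpg}"
GPGSM="${GPGSM:-gpgsm}"

# Parallel listing: both tools start at the same time, so both may try to
# launch gpg-agent if it is not running yet.
list_parallel() {
    "$GPG" --batch --list-secret-keys >/dev/null 2>&1 &
    pid_gpg=$!
    "$GPGSM" --batch --list-secret-keys >/dev/null 2>&1 &
    pid_gpgsm=$!
    status=0
    wait "$pid_gpg" || status=1
    wait "$pid_gpgsm" || status=1
    return "$status"
}

# Serialized listing: gpgsm is only started after gpg has finished, so
# these two calls cannot race each other to start gpg-agent.
list_serialized() {
    status=0
    "$GPG" --batch --list-secret-keys >/dev/null 2>&1 || status=1
    "$GPGSM" --batch --list-secret-keys >/dev/null 2>&1 || status=1
    return "$status"
}
```

The serialized variant takes roughly the sum of both run times instead of the longer of the two, which is the slowdown mentioned above. It only removes the race between these two calls; any other process that starts gpg-agent at the same moment can still collide with them.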

Event Timeline

To explain why I have not changed this even though we have observed these hangs for years: I have never been able to reproduce a hang or any other issue without Kleopatra, using only GPGME and only keylistings. I just looked, and I still have the scripts lying around that I used for testing to mimic the calling pattern of Kleopatra, since this code is also run each time the security approval dialog is shown in Outlook.

Basically, I started:

./run-keylist --offline --validate --with-secret --openpgp &
./run-keylist --offline --validate --with-secret --cms &
wait

I ran this a thousand times to find a problem, but for me these just worked, even on Windows. S/MIME keylistings recalculate the validity of the chain, so they do real work and take time. Taking a performance hit without any evidence that it avoids a problem seemed wrong, so I have not changed this.
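A repeat loop of that kind could look roughly like the following sketch. It is not the original test script: `RUN_KEYLIST`, `parallel_keylist` and `stress_keylist` are hypothetical names, and only the `run-keylist` flags are taken from the commands quoted above.

```shell
#!/bin/sh
# RUN_KEYLIST is a hypothetical override; the default is the GPGME test
# tool used in the commands above.
RUN_KEYLIST="${RUN_KEYLIST:-./run-keylist}"

# Start one OpenPGP and one S/MIME keylisting at the same time and wait
# for both. Returns non-zero if either listing failed.
parallel_keylist() {
    "$RUN_KEYLIST" --offline --validate --with-secret --openpgp >/dev/null 2>&1 &
    pid_openpgp=$!
    "$RUN_KEYLIST" --offline --validate --with-secret --cms >/dev/null 2>&1 &
    pid_cms=$!
    status=0
    wait "$pid_openpgp" || status=1
    wait "$pid_cms" || status=1
    return "$status"
}

# Repeat the parallel listing (default: 1000 rounds) and print the number
# of rounds in which at least one listing failed.
stress_keylist() {
    iterations="${1:-1000}"
    failures=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        parallel_keylist || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

Calling `stress_keylist 1000` from the directory that contains `run-keylist` prints the failure count at the end. A hang is not counted as a failure in this sketch; it would show up as the loop stalling in one round.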

My suspicion was that some other command running in parallel might be the cause. For example, if you start Kleopatra by double-clicking an encrypted file, a decrypt/verify is running while the keycache is populated, or, with T6323: Kleopatra: Import multiple certificate files one after the other, an import was running. Or S/MIME certificates were added between the last start of Kleopatra and this one, so that the "allow-mark-trusted" pinentry would be needed multiple times. Or a trustdb check was required. But now I see that T7434 seems to indicate that only the keylisting is running, and I think the trustdb check is also disabled nowadays.

In GpgOL, where the internal keycache is populated in the background, I serialized the calls anyway. But I believe this has not avoided the issue that it is sometimes very slow or hangs for a while or indefinitely. If you have enabled "automatically encrypt if possible", it sometimes still takes a long time before the button is enabled, as T7434: Kleopatra: Initial keylisting hangs for ~60 seconds (gpg-agent: Socket ...S.gpg-agent cannot be bound) indicates. Alternatively, if the keycache is not populated and you have manually selected encrypt, it will always bring up the key approval dialog. I have often gotten reports about this, especially from the organisation which uses S/MIME a lot and where many users have large keyrings accumulated over many years, but on my test systems this happened super rarely, maybe 5 times a year.

To underline the importance of fixing the locking issue in GnuPG, I can tell you of the following interaction I observed last summer with 3.1.26 between two power users of GpgOL with S/MIME:
User R: Please send me an encrypted mail.
User Z: Ok
User Z: (A minute later) Doesn't work, hanging again.
User R: Yeah,.. you know the drill, kill the gpgsm processes.
User Z: *sigh* okay.

With luck, Z then killed the hanging gpgsm process before the one GpgOL had started for encryption, so that once the right one was killed the lock was released and the mail went through. I could not determine whether the hanging gpgsm process was one of the processes that did the keylisting when starting Outlook or one of the processes started by the keyresolver. But even if this ticket is implemented, the bug will still exist, since we can never rule out that two processes access the keyring in parallel. GpgOL might still run keylistings while Kleopatra also does keylistings. And I am not even sure that the process which held the lock in this case was not doing something completely different, like a verification or decryption.

ebo added a subscriber: ebo.

There is consensus that the issue T7434 must be resolved in the backend, where it originates.