SM, W32: GPGSM hangs up the GnuPG System
Closed, Resolved (Public)

Description

Sometimes on Windows the GnuPG System "hangs". By GnuPG System I mean both OpenPGP and GPGSM, so it's likely to be related to the gpg-agent or to shared files like the pubring.
All further operations spawn a new gpgsm or gpg process, but these do not finish. This can happen as soon as GpgOL / Kleopatra starts, leaving the whole system blocked until the processes are killed.

I've seen it multiple times in the past; T4248 is related and probably the same issue. The best way to reproduce it seems to be a bulk import of certificates. They do not have to be secret as T4248 describes. This needs to be fixed as it happens multiple times per day / week in larger deployments.

Highest priority, as this seems to be a deployment blocker. I will work on it with the highest priority.

Details

Version
STABLE-BRANCH-2-2

Event Timeline

To reproduce this issue I started Kleopatra with an empty GNUPGHOME and imported 10 S/MIME certs at once (which spawns a gpgsm process each) with enabled logging.
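The same load pattern can probably be produced without Kleopatra. The following is only a rough sketch, assuming that one "gpgsm --import" call per certificate file is close enough to what Kleopatra's importer does; the helper names build_import_commands and spawn_all are made up for this example, and logging options are left out.

```python
import subprocess


def build_import_commands(homedir, cert_files, gpgsm="gpgsm"):
    """Return one gpgsm argv list per certificate file."""
    return [
        [gpgsm, "--homedir", str(homedir), "--import", str(path)]
        for path in cert_files
    ]


def spawn_all(commands):
    """Start every command without waiting, then collect the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # On an affected system some of these waits may never return.
    return [proc.wait() for proc in procs]
```

Calling spawn_all(build_import_commands(empty_homedir, cert_files)) with an empty home directory and ten S/MIME certificate files would start all ten imports at once.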

Not all actions are blocked then, only those that need to write to the pubring. So this might not be related to keylisting blocking in Kleopatra.

The processes hang while waiting for the pubring.kbx.lock as they write in the debug output.

One process is looping endlessly in gnupg_rename_file trying to rename the pubring. This fails with a sharing violation. I assume that this process is the one that currently holds the lock.

But according to procexp the file "pubring.kbx" is open in a different gpgsm process, PID 12400, and that process is also writing "waiting for lock". So it looks like there is at least one place where the pubring is open without the lock being held.
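To make the suspected deadlock explicit, here is a small model of the observed state. It only restates what the logs and procexp show and is not taken from the gpgsm code; the process name "renamer" is a placeholder for the process looping in gnupg_rename_file, and that it holds the lock is the assumption stated above.

```python
def find_wait_cycle(waits_for):
    """Follow 'process -> process it waits for' edges; return a cycle or None."""
    for start in waits_for:
        seen = []
        current = start
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            return seen[seen.index(current):]
    return None


# Who holds what (assumed for the lock, observed via procexp for the handle).
holders = {"pubring.kbx.lock": "renamer", "pubring.kbx handle": "gpgsm[12400]"}
# What each process is blocked on.
wants = {"renamer": "pubring.kbx handle", "gpgsm[12400]": "pubring.kbx.lock"}

waits_for = {proc: holders[res] for proc, res in wants.items()}
print(find_wait_cycle(waits_for))
```

The cycle contains both processes: each waits for something the other will only release after it has made progress itself.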

The last lines that the process currently holding the file open (PID 12400) wrote to the log:

2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 <- # Home: C:\Users\aheinecke\AppData\Roaming\gnupg
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 <- # Config: [none]
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 <- OK Dirmngr 2.2.15 at your service
2019-05-14 10:37:59 gpgsm[12400] DBG: connection to the dirmngr established
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 -> GETINFO version
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 <- D 2.2.15
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 <- OK
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 -> OPTION audit-events=1
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 <- OK
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 -> LOOKUP --cache-only /CN=GlobalSign,O=GlobalSign,OU=GlobalSign%20Root%20CA%20-%20R3
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_00000114 <- [ 44 20 30 82 03 5f 30 82 02 47 a0 03 02 01 02 02 ...(895 byte(s) skipped) ]
2019-05-14 10:37:59 gpgsm[12400] DBG: chan_0x00000114 <- END

After killing PID 12400 the process that was looping in gnupg_rename_file continues and finishes. Then I'm asked by PID 2412 to mark a root CA as trusted. After confirmation that process waits on the lock while 2656 hangs in gnupg_rename_file. But according to procexp, 2412 again already holds the pubring.kbx open.

2019-05-14 11:23:31 gpgsm[2412] DBG: chan_0x000000f4 <- OK
2019-05-14 11:23:31 gpgsm[2412] Das Wurzelzertifikat wurde nun als vertrauenswürdig markiert
2019-05-14 11:23:31 gpgsm[2656] Warte bis auf die Datei 'C:\Users\aheinecke\AppData\Roaming\gnupg\pubring.kbx' zugegriffen werden kann ...
2019-05-14 11:23:32 gpgsm[2412] waiting for lock C:\Users\aheinecke\AppData\Roaming\gnupg\pubring.kbx.lock...

I imported 39 certificate files with about 700 certificates at once with Kleopatra and it worked. It took a long time though, so it would be nice if Kleopatra showed a progress indicator or some other indication that the import is running. But this is a different issue.

aheinecke reassigned this task from aheinecke to werner.
aheinecke lowered the priority of this task from Unbreak Now! to High.

When doing a "gpgsm --with-validation -k foo" (assuming you have a cert foo) gpgsm now goes into a loop and prints the certificates that match "foo" over and over again. I have not tested whether it was caused by this change but I think it is likely.

Reopening this as I have seen such hangs multiple times during testing. When importing multiple keys with Kleopatra at once this can be reproduced sometimes.

I noticed this now when importing the keys for edward tester pub, the two berta boss priv keys and the gnupg.com test ca keys into Kleopatra. I also had GpgOL open and tried to encrypt there, but Kleopatra would also do keylists.

The setup had CRL checks enabled and was in VS-NfD compliant mode.

@rjh reported a problem with keyboxd from the current 2.3 beta on the ML. This is also a locking problem and _might_ be related to this bug.

Well, this is a pure Windows bug. It easily shows up when running dozens of gpgsm processes each importing a different certificate (e.g. using Kleopatra's current importer, which spawns one process per cert). The only possible fix is to close all files before starting a long running operation *and* before locking the files.
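The ordering described above can be sketched roughly as follows. This is not the gpgsm code, just an illustration under simplifying assumptions: the lock is a plain lock file created exclusively, and the names acquire_lock and update_store are invented for the example.

```python
import os
import time


def acquire_lock(lock_path, timeout=5.0, poll=0.05):
    """Create the lock file exclusively, polling until the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {lock_path}")
            time.sleep(poll)


def update_store(store_path, transform):
    """Rewrite store_path under a lock, never holding it open while waiting."""
    lock_path = store_path + ".lock"
    acquire_lock(lock_path)  # no handle on store_path is open at this point
    try:
        try:
            with open(store_path, "rb") as fh:  # opened only under the lock
                data = fh.read()
        except FileNotFoundError:
            data = b""
        # The read handle is closed again before the rename below.
        tmp_path = store_path + ".tmp"
        with open(tmp_path, "wb") as fh:
            fh.write(transform(data))
        os.replace(tmp_path, store_path)
    finally:
        os.remove(lock_path)
```

The point is only the order of operations: a process that is waiting for the lock has no handle on the store, so it cannot cause a sharing violation for the process that is renaming.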

werner changed the task status from Open to Testing. Mar 2 2021, 7:33 PM
aheinecke reassigned this task from werner to ikloecker.
aheinecke lowered the priority of this task from High to Normal.

Reopening this as there still seem to be ways to run into a deadlock, as was reported in RT#13361. While I still think this points to some issue in gpgsm, when testing this I found the behavior of Kleopatra to be wrong.

So if you import 100 certificates, each in a single file, Kleopatra starts 100 import jobs. Wouldn't it be better to wait for one import job to finish before starting the next? This would also resolve some issues with pinentry dialogs appearing in a weird order. @ikloecker what do you think?

Sure, we could do this. Shouldn't make the ImportCertificatesCommand much more complex than it already is.

With 100 concurrently running gpgsm processes they all try to get the lock for the keyring. And they need to do this several times and often also for the same certificate (fetched from an external resource to complete the chain). Not good. It might be easier to bypass the gpgsm and run gpgsm directly instead of adding a feature to gpgsm to directly import from many files.

@werner Do I understand correctly that by "It might be easier to bypass the gpgsm and run gpgsm directly" you mean using gpgsm in server mode? Or what do you mean by "bypass gpgsm and run gpgsm" (which seems contradictory)?

I meant bypass the gpgme engine and call gpgsm directly. Maybe using gpgme's spawn engine. But I am not sure whether this is really a good idea. If we can find a way to pass multiple filenames to gpgsm --server that would be better. But requires updates to gpgsm.

I really don't want to bypass gpgme and then parse the import results and all other status output of gpgsm ourselves. I'll go for Andre's suggestion and serialize imports of multiple files.

I have an idea. Can't we read all data into memory in Kleopatra (for certificates this should be ok) and then give this to GPGME as a single data object, so that only one process imports multiple files?

We would lose the chance to inform the user about problematic files or about which file they entered the wrong password for. Moreover, we'd lose any chance to give the user a hint which file pinentry wants the password for. (Currently, pinentry also doesn't display a reference to a filename, so it wouldn't become worse than it is already. And it's pretty bad.) Together, it would become pretty much impossible to import multiple files that are protected with different passwords.

I think processing one file after the other is the better approach even if starting 1000 gpgsm processes (one after the other) is slower than starting a single gpgsm process. But let's be realistic. How often does somebody import more than 1 file with certificates, let alone 1000 files with certificates?

@ikloecker You are right, I only thought of public key import. Then let's serialize this. It might even make for a nicer progress bar if we count the outstanding files.
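A minimal sketch of what serializing could look like, with per-file results and a progress count. This is not Kleopatra's ImportCertificatesCommand; run_import is a hypothetical callable standing in for whatever starts one import job and returns its result, and on_progress is an equally hypothetical progress callback.

```python
def import_serially(files, run_import, on_progress=None):
    """Run one import job per file, strictly one after the other."""
    results = {}
    total = len(files)
    for done, path in enumerate(files, start=1):
        try:
            results[path] = ("ok", run_import(path))
        except Exception as err:  # keep going so the other files still import
            results[path] = ("failed", str(err))
        if on_progress is not None:
            on_progress(done, total)
    return results
```

Because each file gets its own job, a failure or a wrong password can still be attributed to a specific file, and the done/total pair is enough to drive a progress bar.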

ikloecker removed a project: kleopatra.

Putting up for grabs and removing Kleopatra tag since for Kleopatra users this has been fixed (unless they manage to trigger multiple separate concurrent imports in Kleopatra).

werner claimed this task.

That is an old bug report with a couple of fixes introduced over the years. As of now we sometimes see hangs on Windows on our test VMs. The common cause here seems to be USB card reader issues. Let's close this bug and wait for another bug report with current software versions.