Page MenuHome GnuPG

Kleopatra: loading X.509 certificate from Signature card hangs
Testing, HighPublic

Description

Setup was inserting a yubikey and a "signature card v2.0" and then the loading bar continued for more than an hour (over lunch):

There where no gpg or gpgsm processes running when I noticed the situation.

It stopped after executing:

gpgconf --kill scdaemon


This seems to be not easily reproducible, but maybe you can see something from the log.

After removing the signature card and inserting it again, everything worked as expected and the certificates where loaded (but meanwhile I had activated debugging for gpgsm, if this is relevant):

Details

Version
Gpg4win-Beta-70

Event Timeline

It looks as if a keylisting interfered with a gpg-agent/scdaemon command.

48458.670390	2024/11/06 12:43:04.238	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread[GUI]::ping()

^ update of the smart cards is requested

48458.670695	2024/11/06 12:43:04.238	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread[2nd]: new iteration command= "__update__"  ; nullSlot= true

^ background thread starts update of the smart cards

48459.147743	2024/11/06 12:43:04.728	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread[GUI]::ping()

^ another update of the smart cards is requested (the request is queued)

48464.804883	2024/11/06 12:43:10.393	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread: Card "89490171500022806460" with app "nks" was added
48464.805095	2024/11/06 12:43:10.393	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread: Card "D2760001240100000006154932910000" with app "openpgp" was added
48464.807483	2024/11/06 12:43:10.393	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread: Card "D2760001240100000006154932910000" with app "piv" was added

^ the background thread completed the update of the smart cards and found three card apps

48464.811286	2024/11/06 12:43:10.393	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread[2nd]: new iteration command= "__update__"  ; nullSlot= true

^ background thread starts another update of the smart cards

48464.924701	2024/11/06 12:43:10.492	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread[GUI]::learnCardsCMS()

^ learn cards is requested (and queued) -> Kleopatra shows the progress overlay

48465.796319	2024/11/06 12:43:11.291	5772	kleopatra.exe	org.kde.pim.libkleo: KeyCache::RefreshKeysJob start

^ a keylisting is started (OpenPGP and S/MIME)

48467.549251	2024/11/06 12:43:12.874	5772	kleopatra.exe	org.kde.pim.libkleo: sendStatusLinesCommand "SCD LEARN --force" : got ( status( "READER" ) = "SCM Microsystems Inc. SPRx32 USB Smart Card Reader 0"
[...]
48467.550423	2024/11/06 12:43:12.875	5772	kleopatra.exe	org.kde.pim.libkleo: sendCommand "READKEY --card --no-data -- NKS-NKS3.4531"
48467.895485	2024/11/06 12:43:13.187	5772	kleopatra.exe	org.kde.pim.libkleo: sendStatusLinesCommand "READKEY --card --no-data -- NKS-NKS3.4531" : got ( )
48467.896400	2024/11/06 12:43:13.188	5772	kleopatra.exe	org.kde.pim.libkleo: sendCommand "READKEY --card --no-data -- NKS-NKS3.45B1"
48468.209551	2024/11/06 12:43:13.471	5772	kleopatra.exe	org.kde.pim.libkleo: sendStatusLinesCommand "READKEY --card --no-data -- NKS-NKS3.45B1" : got ( )
48468.209660	2024/11/06 12:43:13.471	5772	kleopatra.exe	org.kde.pim.libkleo: sendCommand "READKEY --card --no-data -- NKS-NKS3.4571"

^ the background thread sends multiple commands to gpg-agent to gather information about the smart cards
^ the last READKEY command seems to hang

48468.598283	2024/11/06 12:43:13.822	5772	kleopatra.exe	org.kde.pim.libkleo: Kleo::KeyCache::RefreshKeysJob(0x63ccba8) RefreshKeysJob::done

^ the keylisting is done

Lunch time! After lunch the hanging certificate loading is discovered. After trying a few other things gpgconf --kill scdaemon was run.

54198.829064	2024/11/06 14:14:50.554	5772	kleopatra.exe	org.kde.pim.libkleo: sendCommand "READKEY --card --no-data -- NKS-NKS3.4571" failed: "Dateiende" (code: 16383, source: GPG Agent)

^ the hanging READKEY command fails; gpg-agent/gpgme reports "Dateiende" (== EOF)

54198.829115	2024/11/06 14:14:50.554	5772	kleopatra.exe	org.kde.pim.libkleo: sendStatusLinesCommand "READKEY --card --no-data -- NKS-NKS3.4571" : t == NULL
54198.829218	2024/11/06 14:14:50.556	5772	kleopatra.exe	org.kde.pim.kleopatra: Running "READKEY --card --no-data -- NKS-NKS3.4571" failed: "Dateiende" (code: 16383, source: GPG Agent)
54198.829329	2024/11/06 14:14:50.556	5772	kleopatra.exe	org.kde.pim.libkleo: sendCommand "READKEY --card --no-data -- NKS-ESIGN.4531"
54198.829559	2024/11/06 14:14:50.556	5772	kleopatra.exe	org.kde.pim.libkleo: sendCommand "READKEY --card --no-data -- NKS-ESIGN.4531" failed: "Broken pipe" (code: 32877, source: GPG Agent)
[...]
54198.830595	2024/11/06 14:14:50.557	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread: Card "" with app "" was added
54198.830656	2024/11/06 14:14:50.557	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread: Card "" with app "" was added
54198.830753	2024/11/06 14:14:50.557	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread: Card "D2760001240100000006154932910000" with app "openpgp" was removed
54198.831885	2024/11/06 14:14:50.559	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread: Card "D2760001240100000006154932910000" with app "piv" was removed

^ the background thread sends further commands to gpg-agent which fail with EOF and Broken pipe
^ then the background thread reports bogus results (Card "" with app "" was added) and remove yubikey apps

54198.832384	2024/11/06 14:14:50.560	5772	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread[2nd]: new iteration command= "__learn__"  ; nullSlot= true
54198.832998	2024/11/06 14:14:50.560	5772	kleopatra.exe	org.kde.pim.kleopatra: onUpdateFinished
54198.834801	2024/11/06 14:14:50.564	5772	kleopatra.exe	org.kde.pim.kleopatra: learnCMSCards Running "C:\\Program Files (x86)\\GnuPG\\bin\\gpgsm.exe" "--learn-card -v"

^ now the queued "learn cards" is started and completes successfully

@gniibe It seems that a keylisting (with gpg and gpgsm) interferes with a READKEY --card --no-data -- NKS-NKS3.4571 gpg-agent command and makes it hang until scdaemon is killed.

@ikloecker Thank you sharing the problem. I don't know much aboug NKS card.

Reading the code, it supports multiple "applications" on the card, and it switches on demand. Confusingly, it's not the APP of scdaemon (like openpgp and piv), but the internal things behind the implementation of app-nks.c.

IIUC, the problem might be a serialization issue (or side effect) between SCD LEARN and keylisting (using SCD KEYINFO) for app-nks.c. I guess that something fatal occurs by SCD LEARN + SCD KEYINFO and following SCD READKEY results crazy.

In app-nks.c, it has switch_application and it is used by iterate_over_filelist function, which is used by do_with_keygrip. switch_application is also used by do_learn_status function.
I think that when SCD LEARN and SCD KEYINFO are processed at the same time, switch_application might be done unexpectedly.

Support for NKS (Telesec NetKey card and Signature card v2, both based on TCOS) is very old and predates the support for multiple applications. They are also not as well separated as with Yubikey applications. Thus the auto switching between the NKS app and the SigG app.

ebo added a project: Restricted Project.Thu, Nov 7, 8:44 AM

This shell script running gpg-connect-agent should run successfully:

This one probably fails:

I wonder if these are reproducible.

@ebo @ikloecker Let me explain my thoughts. If you have time, please help me doing some tests in your environment.

There are two threads in the activities of Kleo at startup.
(1) getting information form cards
(2) listing keys by gpg and gpgsm

I would like to identify the race condition in question for these two threads, so that I can fix it.

For (1), the minimum version can be emulated by the shell script of test00000.sh. For this explanation I put the content here:

gpg-connect-agent <<EOF
SCD SERIALNO --all
SCD SWITCHCARD 89490171500022806460
SCD LEARN --force
READKEY --card --no-data -- NKS-NKS3.4531
READKEY --card --no-data -- NKS-NKS3.45B1
READKEY --card --no-data -- NKS-NKS3.4571
READKEY --card --no-data -- NKS-ESIGN.4531
EOF

89490171500022806460 is the serial number of NKS card. If it's different, please modify this serialno string.
The script asks gpg-agent to connect all readers/tokens by SCD SERIALNO --all. Then select the NKS card, get the card information by LEARN command, then check each key is available.

This script should work with no problem.

For (2), gpg/gpgsm asks gpg-agent listing private keys (including keys on cards). minimum version is SCD KEYINFO --list.
So, in the shell script of test11111.sh, I put the lines of SCD KEYINFO --list between lines of the (1) script.
That is:

gpg-connect-agent <<EOF
SCD SERIALNO --all
SCD SWITCHCARD 89490171500022806460
SCD KEYINFO --list
SCD LEARN --force
SCD KEYINFO --list
READKEY --card --no-data -- NKS-NKS3.4531
SCD KEYINFO --list
READKEY --card --no-data -- NKS-NKS3.45B1
SCD KEYINFO --list
READKEY --card --no-data -- NKS-NKS3.4571
SCD KEYINFO --list
READKEY --card --no-data -- NKS-ESIGN.4531
EOF

If this test11111.sh can replicate the hanging up, we will be able to catch the bug soonish. If not, it's a heisenbug (out of my hypothesis so far).

For the record, I add the info here too (was: just in xmpp).

The test11111.sh did not cause the hanging up.
This means that it's not the race condition (which I imagined).

So, I chased the bug into other areas. That's nPth.

gniibe changed the task status from Open to Testing.EditedThu, Nov 14, 3:33 AM

This symptom can be explained by the nPth bug of T7386.

Suppose that there are two or more requests to scdaemon, and scdaemon internally runs scd_update_reader_status_file.
scd_update_reader_status_file gets the lock (W-lock) for all cards, and others are waiting R-lock to access one of cards.
Finishing the things in scd_update_reader_status_file, lock is released, but only one waiting thread wakes up (and others keep waiting forever).
New requests are handled normally, so, gpgconf --kill scdaemon correctly kills the scdaemon.

Possibly, it was:
(1) internal W-lock
(2) request by Kleo for SCD READKEY
(3) request by gpg/gpgsm for SCD KEYINFO --list
(4) W-lock released, but only (3) wakes up, not (2)
(5) gpgconf --kill scdaemon works