Page MenuHome GnuPG

Kleopatra: broken in Testversion beta-41
Closed, ResolvedPublic

Description

The behavior described here is from current Windows testversion Beta-41, but I've seen the following message in Debugview repeatedly in Beta-35, too, while testing:

kleopatra.exe   <process started at some_timestring has terminated with 0xc0000005 (EXCEPTION_ACCESS_VIOLATION)>

It is the last message in Debugview when quitting Kleopatra.

The issue I describe in the following might be related or not.

How to reproduce:
Start gpg4win-beta-41, typically the running processes after completed start will look like this:


with one of gpgsm.exe and gpg.exe each.
It might happen that only gpgsm.exe is running, like here:

In this case the OpenPGP certificates are not listed. But they are of course there and gpg -k on the cli will show them.

Both is not the usual behavior. gpg.exe and gpgsm.exe might both be seen shortly during loading of the certificate cache, after that they should be terminated.

If I now initialize actions which require gpg or gpgsm, each time another of these processes is started but can not proceed. So they accumulate. e.g.:


where I repeatedly tried to delete a certificate.
Then Kleopatra complains when trying to quit that background processes are running.
The processes can usually not be killed by "gpgcong --kill all" and have to be killed in the process explorer. Killing gpg-agent usually kills all.

For the attached gpgme log I have done the following:
Started Kleopatra (after making sure that no other gpg processes are running).
(Try to) delete a certificate by Right-click-> delete.
Quit Kleopatra.


Debugview:

333	3409.429136	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 24 ms: Application created
334	3409.430095	1484	kleopatra.exe	org.kde.pim.kleopatra: Responder handle: 0x0
335	3409.431535	1484	kleopatra.exe	org.kde.pim.kleopatra: Created responder:  "kleopatraResponder"  with handle:  0x70586
336	3409.431569	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 27 ms: Service created
337	3409.431600	1484	kleopatra.exe	org.kde.pim.kleopatra: Application created
338	3409.434820	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 30 ms: GPGME Initialized
339	3409.436595	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 32 ms: Starting version info check
340	3409.436696	1484	kleopatra.exe	org.kde.pim.libkleo: Verifying "c:/Program Files (x86)/Gpg4win/bin/../VERSION"
341	3409.436839	1484	kleopatra.exe	org.kde.pim.libkleo: No signature found at "C:/Program Files (x86)/Gpg4win/VERSION.sig"
342	3409.436873	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 32 ms: Version info checked
343	3409.437007	1484	kleopatra.exe	org.kde.pim.kleopatra: Got ElevationToken  0
344	3409.437181	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 32 ms: Checking backend versions
345	3409.741389	1484	kleopatra.exe	org.kde.pim.libkleo: Parsed 2.4.6-beta58 as:  2 . 4 . 6 .
346	3409.742433	1484	kleopatra.exe	org.kde.pim.libkleo: Running gpgconf --show-versions ...
347	3409.815320	1484	kleopatra.exe	org.kde.pim.libkleo: gpgconf stdout: "* GnuPG 2.4.6-beta58 (5d3995b16)\nMingW32\nWindows 10.0 build 19045\n\n* Libgcrypt 1.11.0 (9d94d784)\nversion:1.11.0:10b00:1.50:13200:\ncc:100000:gcc:10-win32 20210110:\nciphers:arcfour:blowfish:cast5:des:aes:twofish:serpent:rfc2268:seed:camellia:idea:salsa20:gost28147:chacha20:sm4:aria:\npubkeys:dsa:elgamal:rsa:ecc:\ndigests:crc:gostr3411-94::md4:md5:rmd160:sha1:sha256:sha512:sha3:tiger:whirlpool:stribog:blake2:sm3:\nrnd-mod:w32:\ncpu-arch:x86:i386:\nmpi-asm:i386/mpih-add1.S:i386/mpih-sub1.S:i386/mpih-mul1.S:i386/mpih-mul2.S:i386/mpih-mul3.S:i386/mpih-lshift.S:i386/mpih-rshift.S:\nhwflist:intel-cpu:intel-bmi2:intel-ssse3:intel-sse4.1:intel-pclmul:intel-aesni:intel-rdrand:intel-avx:intel-avx2:intel-rdtsc:\nfips-mode:n:::\nrng-type:standard:1:3030000:1:\ncompliance:::\n\n* GpgRT 1.50 (bb73261)\n\n* Libassuan 3.0.1 (c9e9027)\n\n* KSBA 1.6.7 (b14e68b)\n\n* NTBTLS 0.3.2 (2c38007)\n\n"
348	3409.815399	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 410 ms: backend versions checked
349	3411.751910	1484	kleopatra.exe	org.kde.pim.libkleo: agentIsRunning : Connecting to the agent failed.
350	3411.754019	1484	kleopatra.exe	org.kde.pim.libkleo: Starting gpgconf (QProcess(0x3e8fe48)) with arguments "--launch gpg-agent" ...
351	3411.758854	1484	kleopatra.exe	org.kde.pim.libkleo: gpgconf (QProcess(0x3e8fe48)) was started successfully
352	3411.760086	1484	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatus::Private: Using deprecated FileSystemWatcher
353	3411.760172	1484	kleopatra.exe	org.kde.pim.libkleo: adding
354	3411.760172	1484	kleopatra.exe	  "C:\\Users\\g10code.WIN-TEST3\\AppData\\Roaming\\gnupg\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/reader_0.status" 
355	3411.760172	1484	kleopatra.exe	/end
356	3411.776369	1484	kleopatra.exe	org.kde.pim.libkleo: adding
357	3411.776369	1484	kleopatra.exe	  "C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/private-keys-v1.d\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/reader_0.status\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/trustdb.gpg\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/trustlist.txt\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/private-keys-v1.d/22F88F077242CD267D5945172B68E1A15C82D76E.key\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/private-keys-v1.d/4FE905D9AF633D134BB74109526CC762EC9BC8DC.key\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/private-keys-v1.d/811564E41DF97BB799FF058E407CC88569BB7EEC.key\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/private-keys-v1.d/9807C934B87DCAB103D5E8A4D69C3CFB600F75DA.key\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/private-keys-v1.d/F4094C3768BF8C5874F015D6AD439E54DE44D54F.key\n C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/private-keys-v1.d/FFEFC219B3698D26B46A376533EB91ADD4609C8C.key" 
358	3411.776369	1484	kleopatra.exe	/end
359	3411.799446	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 2393 ms: Application initialized
360	3411.799474	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 2394 ms: UiServer created
361	3411.812510	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 2407 ms: UiServer started
362	3411.812676	1484	kleopatra.exe	org.kde.pim.kleopatra: Kleo::Command(0xaefe94) ~Command
363	3411.812743	1484	kleopatra.exe	org.kde.pim.kleopatra: Kleo::Command(0xaefe94) ~Private
364	3411.812762	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 2408 ms: SelfCheck completed
365	3411.812833	1484	kleopatra.exe	org.kde.pim.kleopatra: startMonitoring
366	3411.812888	1484	kleopatra.exe	org.kde.pim.kleopatra: startMonitoring waiting for key cache ...
367	3411.812906	1484	kleopatra.exe	org.kde.pim.kleopatra: openOrRaiseMainWindow
368	3412.496427	1484	kleopatra.exe	org.kde.pim.libkleo: KeyFilterManager:: reload final filter count is 17
369	3412.528381	1484	kleopatra.exe	kf.config.core: Use of KConfigWatcher without DBus support. You will not receive updates
370	3412.537955	1484	kleopatra.exe	org.kde.pim.kleopatra: Hook into the help menu to show the About dialog ourselves
371	3412.555888	1484	kleopatra.exe	org.kde.pim.kleopatra: open_or_raise showing window
372	3412.611467	1484	kleopatra.exe	org.kde.pim.kleopatra: you need to register view Kleo::TreeView(0x5d542a0, name="m_view") before trying to set it as the current view!
373	3412.695323	1484	kleopatra.exe	org.kde.pim.kleopatra: No update for: "4.3.2-beta41"
374	3412.695516	1484	kleopatra.exe	org.kde.pim.kleopatra: Startup timing: 3290 ms: new instance created
375	3412.695657	1484	kleopatra.exe	org.kde.pim.libkleo: Kleo::KeyCache(0x55e1990) reload option: 0
376	3412.696028	1484	kleopatra.exe	org.kde.pim.libkleo: KeyCache::RefreshKeysJob start
377	3412.697424	1484	kleopatra.exe	org.kde.pim.libkleo: Kleo::KeyCache(0x55e1990) reload option: 0
378	3412.697452	1484	kleopatra.exe	org.kde.pim.libkleo: Kleo::KeyCache(0x55e1990) reload - refresh already running
379	3412.697492	1484	kleopatra.exe	org.kde.pim.libkleo: Waiting for keycache.
380	3420.076936	1484	kleopatra.exe	org.kde.pim.libkleo: gpgconf (QProcess(0x3e8fe48)) exited (exit code: 0)
381	3420.916028	1484	kleopatra.exe	org.kde.pim.libkleo: Kleo::KeyCache::RefreshKeysJob(0x57a3b40) RefreshKeysJob::done
382	3420.917885	1484	kleopatra.exe	org.kde.pim.libkleo: readGroups Reading groups
383	3420.919229	1484	kleopatra.exe	org.kde.pim.libkleo: findByFingerprint Ignoring unknown key with fingerprint: 5D73251BA326024602696FD87CA8B3C25843D5D3
384	3420.919337	1484	kleopatra.exe	org.kde.pim.libkleo: findByFingerprint Ignoring unknown key with fingerprint: FADC4675146CFAF3D86F137E1D3C5E6E3DB3C71D
385	3420.919969	1484	kleopatra.exe	org.kde.pim.kleopatra: startMonitoring
386	3420.920000	1484	kleopatra.exe	org.kde.pim.kleopatra: startMonitoring key cache is ready
387	3420.920359	1484	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread[2nd]: new iteration command= "__update__"  ; nullSlot= true
388	3420.920416	1484	kleopatra.exe	org.kde.pim.kleopatra: update_cardinfo()
389	3420.920507	1484	kleopatra.exe	org.kde.pim.libkleo: sendCommand "SCD SERIALNO --all"
390	3420.922580	1484	kleopatra.exe	org.kde.pim.libkleo: Keycache available.
391	3420.922663	1484	kleopatra.exe	org.kde.pim.libkleo: Reloading keycache with remarks enabled
392	3420.922685	1484	kleopatra.exe	org.kde.pim.libkleo: Kleo::KeyCache(0x55e1990) reload option: 0
393	3420.923140	1484	kleopatra.exe	org.kde.pim.libkleo: KeyCache::RefreshKeysJob start
394	3420.923456	1484	kleopatra.exe	org.kde.pim.kleopatra: Kleo::Command(0x5693ae8) ~Command
395	3420.923574	1484	kleopatra.exe	org.kde.pim.kleopatra: Kleo::Command(0x5693ae8) ~Private
396	3422.040947	1484	kleopatra.exe	org.kde.pim.libkleo: "C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/reader_0.status"
397	3422.041841	1484	kleopatra.exe	org.kde.pim.libkleo: "C:/Users/g10code.WIN-TEST3/AppData/Roaming/gnupg/reader_0.status"
398	3422.138706	1484	kleopatra.exe	org.kde.pim.kleopatra: ReaderStatusThread[GUI]::ping()
399	3453.820791	1484	kleopatra.exe	org.kde.pim.kleopatra: 0x5f67ec8
400	3467.587374	1484	kleopatra.exe	org.kde.pim.kleopatra: 
401	3470.477533	1484	kleopatra.exe	org.kde.pim.kleopatra: Kleo::DeleteCertificatesCommand
402	3470.582221	1484	kleopatra.exe	org.kde.pim.kleopatra: controller still has commands running, this may crash now...
403	3470.611300	1484	kleopatra.exe	org.kde.pim.kleopatra: Kleo::Command(0x5f67ec8) ~Command
404	3470.613321	1484	kleopatra.exe	org.kde.pim.kleopatra: Kleo::Command(0x5f67ec8) ~Private
405	3470.622428	1484	kleopatra.exe	org.kde.pim.libkleo: Kleo::KeyCache::RefreshKeysJob(0x5cef750) RefreshKeysJob::canceled
406	3470.734865	1484	kleopatra.exe	Please call QDesktopServices::unsetUrlHandler() before destroying a registered URL handler object.
407	3470.734865	1484	kleopatra.exe	Support for destroying a registered URL handler object is deprecated, and will be removed in Qt 6.6.
408	3471.961283	1484	kleopatra.exe	<process started at 16:10:54.181 has terminated with 0xc0000005 (EXCEPTION_ACCESS_VIOLATION)>

Details

Version
Version 3.2.2.2405000+git~ (Gpg4win-4.3.2-beta41)

Event Timeline

werner added a project: Bug Report.
werner added a subscriber: werner.

Fixing this is important for getting the next release out.

I think the crash in the end is the same we have in T6688: Kleopatra GPGME: Reported assert on exit where quitting kleopatra with running jobs tries to cancel all open contexts and then crashes where the assert would be triggered in debug builds. In T6688 we just hid this issue again by not keeping the deviceinfowatcher running.

So here it seems that there are hanging jobs, which then cause some lockup of the GnuPG system and that leaves more and more jobs blocked. And when closing Kleopatra we run into that problem.

/* Enter the context CTX into the done list with status STATUS.  */
static void
ctx_done (gpgme_ctx_t ctx, gpgme_error_t status, gpgme_error_t op_err)
{
  struct ctx_list_item *li;

  LOCK (ctx_list_lock);
  li = ctx_active_list;
  while (li && li->ctx != ctx)
    li = li->next;
  assert (li);
  ... code does some more and dereferences li.

As I understood this code I thought the assumption that li is not null is just wrong here because the while loop terminates when li is null and so it is "normal" that the assert would be triggered. I would have expected this code to be the intention of the author:

while (li->next && li->ctx != ctx)
  li = li->next;
assert (li);

But I think Ingos analysis was that there was a deeper problem and I am not confident in that part of the code.

Regarding the underlying "gpgsm hangs" we still have to find the real cause. I have seen issues like this where the system locks up for years, and we found most of the problems but the last time I was able to reproduce it was by importing multiple certificates in parralel. And there we did not find and fix the underlying problem in GnuPG but just the code in Kleopatra to not use GnuPG in this way. T4505: SM, W32: GPGSM hangs up the GnuPG System

Btw. Here is a nice backtrace, which I think is similar to the crash part of this Task:

-- Backtrace (Reduced):
#9  0x00007f6829aaeb3e in ctx_done (op_err=<optimized out>, status=<optimized out>, ctx=0x7f67f8004df0) at /usr/src/debug/gpgme-1.15.1-6.fc36.x86_64/src/wait-global.c:125
#10 _gpgme_wait_global_event_cb (data=0x7f67f8004df0, type=<optimized out>, type_data=<optimized out>) at /usr/src/debug/gpgme-1.15.1-6.fc36.x86_64/src/wait-global.c:220
#11 0x00007f6829adf9e3 in _gpgme_engine_io_event (type_data=0x7f6807ffe920, type=GPGME_EVENT_DONE, engine=<optimized out>) at /usr/src/debug/gpgme-1.15.1-6.fc36.x86_64/src/engine.c:1064
#12 _gpgme_cancel_with_err (ctx=ctx@entry=0x7f67f8004df0, ctx_err=ctx_err@entry=117440611, op_err=op_err@entry=0) at /usr/src/debug/gpgme-1.15.1-6.fc36.x86_64/src/gpgme.c:193
#13 0x00007f6829adfba4 in gpgme_cancel (ctx=0x7f67f8004df0) at /usr/src/debug/gpgme-1.15.1-6.fc36.x86_64/src/gpgme.c:210

Ingo, werner, can you maybe look at this again, to me it just looks like gpgme_cancel should crash every time when ctx_done is called with more then one context existing, which is obviously not the case, or is it?

@ebo Can you please keep gpgsm logging enabled and log to a file, trying to reproduce the issue again. And then when the lock happens we can check with Process Explorer where the processes hang, or on which lock they are waiting and then try to find out which process id has that lock and check from the log what it is doing last. I am a bit confused that keyboxd is used here since my last observations regarding the hang were happenining of one gpgsm process got stuck that kept the pubring.lock and the system was blocked for the whole time.

I can try to reporoduce this, too with keyboxd enabled. In the past I tried to write test code to simulate the parallel import behavior but so far I have not found something. My last suspicion was that it is related to the pinentry "mark trusted" dialogs which come up when importing S/MIME cwertificates and checking their validity for the first time.

While searching for a different issue I found T6512: keyboxd with data pipe in which as I understand it a keyboxd hang is fixed but the fix in that task is not part of the stable branch and only in master. I might be misunderstanding it but just from reading the comments in T6512 this might be related.

on gpg4win-Beta-50 things look much, much better.

But after a while I managed to get a similar hanging of a gpg process.

Again, after the start of Kleopatra there are sometimes one gpg and one gpgsm process lingering (for listings).

But when I start a decryption process in that case, this hangs. It does not resume after killing the previous gpg process, but it does after killing the scdaemon process. Tried this 3 times, it was always scdaemon.
In one of the trials I am quite sure that the additional gpg/gpgsm process were missing. I believe they are only signs that scdaemon is blocking other processes, which would fit with Andre's comment https://dev.gnupg.org/T7224#189041

There were 2 smart cards connected when the issue occurred, but the encrypred data was not to any of the certificates on them.

ebo moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.

Closing this ticket, as the test version is now obsolete and the new one much improved. I'll open a new one for the remaining issue with scdaemon when I have more information