gpg-agent.exe hangs after being left idle for a while
Closed, Resolved (Public)

Description

I've been experiencing this problem for quite some time, but I didn't want to report it because I have no adequate explanation for it, and I find it somewhat inappropriate to burden developers with yet more random, cryptic reports that point to nothing, especially when nobody else appears to be experiencing the same thing (or so it seems from reading what's been posted here).

It turns out that gpg-agent.exe ceases to function if you leave it unused for a while. How long? I'm not sure exactly, but it doesn't have to be days. The last time, it happened within an hour and a half, I believe. Whenever gpg.exe starts it (assuming it's not already running, of course, though I have occasionally been left with multiple gpg-agent processes lingering in the background when something goes wrong), it works right away. But once it has been used and left idle, it cannot be "brought back to life", so to speak: when I try to sign, encrypt, edit a key, or do anything else, gpg-agent.exe doesn't respond and all I get is the copyright message from gpg.exe. During this time, gpg-agent.exe uses no CPU at all and its memory consumption remains perfectly stable at about 4MB.

One time I was actually able to "resurrect" it by enabling a specific combination of debug options along with the usual encryption/signing commands, but I could not reproduce that behavior ever again. So all I can do to use gpg in a more or less conventional fashion (which works fine apart from the password having to be retyped every time) is to manually terminate the agent so that gpg starts everything from scratch.

This may not sound like a big deal, but it has become yet another layer of complexity when automating certain tasks.

Tested on 32-bit WinXP SP3 with 2.2.0.

Thanks for your time and sorry for the inconvenience.

werner triaged this task as High priority.
HB1000 added a subscriber: HB1000.Sep 22 2017, 12:40 PM

Just to let you know that this is not an isolated problem.
I have seen exactly the same behaviour.
After terminating the gpg-agent task, everything works as expected (until the next period of inactivity).
64-bit Windows 7 Enterprise, Outlook 2010, GPG4Win Version 3.0.0-gpg4win-3.0.0.

inc75 added a subscriber: inc75.Oct 10 2017, 12:18 AM

It works correctly when installed and executed.
After a period of inactivity and with Thunderbird still open, it stops working.
With Gpg4Win 2.3.4 it works correctly.

Tested on Windows 7 Ultimate 64bit, Thunderbird 52.4.0 + Enigmail 1.9.8.3, Gpg4Win 3.0.0

Thanks for your time.

werner added a subscriber: werner.Oct 10 2017, 7:56 AM

Do any of you have a gpg-agent.conf and, if so, what options are set?

gpgconf --list-dirs homedir

shows the directory where the *.conf files are expected. To track down the problem at least these options might be useful:

log-file /temp/my-gpg-agent.log
verbose

gpg-agent.conf actual content:

###+++--- GPGConf ---+++###
default-cache-ttl 1800
disable-scdaemon
###+++--- GPGConf ---+++### 10/02/15 11:48:49 Mitteleuropäische Sommerzeit
# GPGConf edited this configuration file.
# It will disable options before this marked block, but it will
# never change anything below these lines.

Now added your suggested options ...

inc75 added a comment.Oct 10 2017, 3:29 PM

I think it might be a cleanup problem.
If you uninstall Gpg4Win 2.3.4, restart the computer, and then install Gpg4Win 3.0.0, everything works correctly.

Gpg4Win 2.3.4

C:\>gpgconf --list-dirs homedir
sysconfdir:C%3a\ProgramData\GNU\etc\gnupg
bindir:C%3a\Program Files (x86)\GNU\GnuPG
libexecdir:C%3a\Program Files (x86)\GNU\GnuPG
libdir:C%3a\Program Files (x86)\GNU\GnuPG\lib\gnupg
datadir:C%3a\Program Files (x86)\GNU\GnuPG\share\gnupg
localedir:C%3a\Program Files (x86)\GNU\GnuPG\share\locale
dirmngr-socket:C%3a\Windows\S.dirmngr
agent-socket:C%3a\Users\INC\AppData\Roaming\gnupg\S.gpg-agent
homedir:C%3a\Users\INC\AppData\Roaming\gnupg

There is no "gpg-agent.conf" file or "gpg-agent.log" file
There is no "gpgconf.conf" file, only the "gpgconf-conf.skel" file

Gpg4Win 3.0.0

C:\>gpgconf --list-dirs homedir
C:\Users\INC\AppData\Roaming\gnupg

There is no "gpg-agent.conf" file or "gpg-agent.log" file
There is no "gpgconf.conf" file, only the "gpgconf-conf.skel" file

inc75 added a comment.Oct 10 2017, 7:24 PM

Sorry, I hadn't waited long enough.
It's happened again. After leaving Thunderbird open for a while, when I open another encrypted email, the password prompt does not appear and nothing happens.
I need Gpg4Win, so I'm going back to Gpg4Win 2.3.4.

Thank you for your time and patience.


log file uploaded.

... gpg-agent hangs. After cancelling the process it works again ...

Here is a part of the log inline:

2017-10-10 10:10:19 gpg-agent[7436] Handhabungsroutine 0x3 für den fd 324 beendet
2017-10-10 10:11:20 gpg-agent[7436] Handhabungsroutine 0x3 für fd 408 gestartet
2017-10-10 10:11:20 gpg-agent[7436] Handhabungsroutine 0x3 für den fd 408 beendet

< gpg-agent hangs. process cancelled by task manager >

2017-10-10 10:35:54 gpg-agent[7260] Es wird auf Socket `C:/Users/<windows-user>/AppData/Roaming/gnupg/S.gpg-agent' gehört
2017-10-10 10:35:54 gpg-agent[7260] Es wird auf Socket `C:/Users/<windows-user>/AppData/Roaming/gnupg/S.gpg-agent.extra' gehört
2017-10-10 10:35:54 gpg-agent[7260] Es wird auf Socket `C:/Users/<windows-user>/AppData/Roaming/gnupg/S.gpg-agent.browser' gehört
2017-10-10 10:35:54 gpg-agent[7260] Es wird auf Socket `C:/Users/<windows-user>/AppData/Roaming/gnupg/S.gpg-agent.ssh' gehört
2017-10-10 10:35:54 gpg-agent[7260] gpg-agent (GnuPG) 2.2.1 started
2017-10-10 10:35:56 gpg-agent[7260] Handhabungsroutine 0x2 für fd 32 gestartet
2017-10-10 10:35:56 gpg-agent[7260] Assuan accept problem: Input/output error
2017-10-10 10:35:56 gpg-agent[7260] Handhabungsroutine 0x2 für den fd 32 beendet
2017-10-10 10:35:57 gpg-agent[7260] Handhabungsroutine 0x2 für fd 252 gestartet
2017-10-10 10:35:57 gpg-agent[7260] Assuan accept problem: Input/output error

Unfortunately not verbose enough. The assuan accept problem after the first process has been killed and a new one started looks strange.
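
For anyone collecting more of these logs, here is a rough Python sketch that pairs the "handler started" and "handler terminated" lines per process and lists the fds whose handler never terminated. It is only a helper for reading logs like the excerpt above, not part of GnuPG. The function name open_handlers is made up, and the English keywords "started"/"terminated" are my assumption for what an untranslated log would say; the German ones are taken from the excerpt.

```python
import re
from collections import defaultdict

# Matches e.g. "... gpg-agent[7260] Handhabungsroutine 0x2 für fd 252 gestartet".
# Only the pid, the fd and the final keyword are used, so the pattern does not
# depend on how the umlauts were encoded in the log file.
HANDLER_RE = re.compile(
    r"gpg-agent\[(?P<pid>\d+)\].*?\bfd (?P<fd>\d+) "
    r"(?P<event>gestartet|beendet|started|terminated)\s*$"
)


def open_handlers(log_lines):
    """Return {pid: [fds]} for handlers that started but never terminated."""
    open_fds = defaultdict(set)
    for line in log_lines:
        match = HANDLER_RE.search(line)
        if match is None:
            continue  # not a handler start/stop line
        pid, fd = int(match["pid"]), int(match["fd"])
        if match["event"] in ("gestartet", "started"):
            open_fds[pid].add(fd)
        else:
            open_fds[pid].discard(fd)
    return {pid: sorted(fds) for pid, fds in open_fds.items() if fds}


sample = [
    "2017-10-10 10:11:20 gpg-agent[7436] Handhabungsroutine 0x3 für fd 408 gestartet",
    "2017-10-10 10:11:20 gpg-agent[7436] Handhabungsroutine 0x3 für den fd 408 beendet",
    "2017-10-10 10:35:56 gpg-agent[7260] Handhabungsroutine 0x2 für fd 32 gestartet",
    "2017-10-10 10:35:56 gpg-agent[7260] Handhabungsroutine 0x2 für den fd 32 beendet",
    "2017-10-10 10:35:57 gpg-agent[7260] Handhabungsroutine 0x2 für fd 252 gestartet",
]
print(open_handlers(sample))  # {7260: [252]}
```

A handler that is still open at the end of an excerpt is not proof of a hang by itself (the log may simply have been cut there), but it narrows down where to look.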

werner claimed this task.Oct 20 2017, 8:23 AM

I can replicate this now. Unfortunately without logging enabled.

I tested for several days with logging enabled but was not able to replicate it again. Then I tried again w/o logging and couldn't replicate it either.

T3480 reports the same bug and gives a hint that this is due to suspend to RAM.

werner added a comment.Nov 6 2017, 3:12 PM

Also failed to replicate on Windows-7 using a dedicated laptop.

HB1000 added a comment.Nov 8 2017, 3:32 PM

Is there more detailed logging that I can switch on? Perhaps I can help you get diagnostic files. I notice this bug nearly every day. The log (with "verbose" in gpg-agent.conf) contains the same entries I already posted.

werner added a comment.Nov 8 2017, 4:57 PM

The thing is that I don't see this bug with verbose logging enabled. So we need to do more code staring or instrument the code.

jbtule added a subscriber: jbtule.Nov 9 2017, 6:38 PM
jbtule added a comment.Nov 9 2017, 6:47 PM

Both my coworker and I have the same issue. We just started using gpg for git commit signing. It works the first time. Then, sometime later, no window pops up and git hangs indefinitely because it's waiting on the agent. Killing the agent and gpg processes lets git error out. On the next try, the gpg-agent window prompting for the password shows up and it works.

Windows 7 Pro 64bit, Git 2.14.1.windows.1, Gpg4win 3.0

aheinecke added a subscriber: aheinecke.

This might be a reason why we have received multiple reports since 3.0 was released that Kleopatra hangs on key listing: https://bugs.kde.org/show_bug.cgi?id=381910

During all the testing leading up to gpg4win 3.0 I saw this sometimes in my VM but it was never reproducible and I could not have sworn that it happened with a recent (> 2.1.15) version. But it appears that it still can occur.

gniibe added a subscriber: gniibe.Nov 20 2017, 11:50 AM

I have not yet located or identified the bug, but here is some information.

I found a possible cause by analyzing my-gpg-agent.log.

In the log, the fds are: 32, 248, 252, 260, 320, 324, 332, 336, 408, 416 and 420. <--- these look big

It seems that FD_SETSIZE is 64 on MinGW.

I guess that the listening fds are OK.

In libassuan/src/system-w32.c, we use select in __assuan_read and __assuan_write, but I think that is for non-blocking I/O, so it does not apply to gpg-agent.

Sorry, I have not caught the bug yet.

I introduced GnuPG to a friend yesterday and saw this problem. It was on Windows 7 with gpg4win 3.0.1 and Enigmail.
Looking through this report, Windows 7 is a common factor.

Can someone please add

disable-check-own-socket

to gpg-agent.conf to test whether this is the cause of the problem. (Note that I also asked for this in T3401.)

I assume it goes in %APPDATA%\gnupg\gpg-agent.conf. What should I do about the existing gpg-agent processes (I have 3 running right now), kill them or not?

> I assume it goes in %APPDATA%\gnupg\gpg-agent.conf.

Yes

> What should I do about the existing gpg-agent processes (I have 3 running right now), kill them or not?

Kill them. They will be started on demand if necessary and need to be restarted for the config change to take effect.

hs added a subscriber: hs.Nov 29 2017, 4:50 PM

I can confirm similar behavior with Windows 7 and Outlook 2010 using GPG4Win 3.0.1.
The ability to decrypt is lost after about one hour or more.
Setting disable-check-own-socket in gpg.conf (I didn't find a gpg-agent.conf) resulted in a "no data" error on all encrypted e-mails.

I added "disable-check-own-socket" to gpg-agent.conf.
No "hanging" for 8 hours now.
I will keep watching it...

It's working for me now with that config file as well so far. I'll keep watching too.

inc75 added a comment.Nov 29 2017, 6:38 PM

I have created the file "gpg-agent.conf" in the path "C:\Users\<my user>\AppData\Roaming\gnupg\" with the following content:

debug-level guru
log-file gpg-agent.log
disable-check-own-socket

The file "gpg-agent.log" does not appear, why?

For now I'm still waiting to see whether Gpg4Win hangs.

Windows 7, Gpg4win 3.0.1, Thunderbird 52.5.0, Enigmail 1.9.8.3

If disable-check-own-socket can stop the hanging, D454: assuan_close with nPth could be related.

Suppose that, on Windows, a client connects to a server whose task is stopped. In this situation, if the client blocks in closesocket, that is, if closing the client's socket requires some user-space work on the server side, this bug can be explained.

hs added a comment.Nov 30 2017, 2:10 PM

Update: It was my mistake (a typical beginner's error): I had to create gpg-agent.conf instead of using gpg.conf.
Adding disable-check-own-socket has resulted in the right behavior so far:
After some time-out, GpgOL asks for the password again and decrypts the content as expected.

Hence, it seems to work with that setting on Windows 7 / Outlook 2010.

Thanks everyone. I think that the problem is identified and fixed in libassuan.

My theory: Two threads having a race condition.

Bad case (rare):

Client thread:                         Server thread:
write (sock, "GETINFO pid", ...);
                                       read (sock, ...); // "GETINFO pid"
                                       write (sock, "D <PID>", ...);
read (sock, ...);
closesocket (sock); // hang
                                       read (sock, ...);

// The client side stops at closesocket, waiting for the server to ack the closing request for the socket.
// It blocks, and the task (all threads) stops.

Lucky usual case:

Client thread:                         Server thread:
write (sock, "GETINFO pid", ...);
                                       read (sock, ...); // "GETINFO pid"
                                       write (sock, "D <PID>", ...);
read (sock, ...);
                                       read (sock, ...); // will get closing request
closesocket (sock); // no hang: the closing request wakes up the blocked read on the server, and the client gets the ack from the server.
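
To make the two cases above easier to play with, here is a toy model in Python. It is not the libassuan or gpg-agent code: the single lock stands in for a userland threading layer in which only one thread runs at a time, the two events stand in for the closing request and its ack, and all names (self_check_completes, big_lock and so on) are invented for the illustration. The point it tries to show is that a call which blocks while the lock is still held starves the very thread that would unblock it.

```python
import threading


def self_check_completes(release_lock_while_blocked, timeout=1.0):
    """Toy model of a process that is both client and server on one socket.

    Returns True if the simulated close gets its ack, False if it gets stuck
    (a real hang is replaced by a timeout so that the demo terminates).
    """
    big_lock = threading.Lock()          # only the holder is "running"
    close_requested = threading.Event()  # client -> server: closing request
    close_acked = threading.Event()      # server -> client: ack
    result = []

    def server():
        close_requested.wait()
        # The server thread can only react once it is allowed to run.
        if big_lock.acquire(timeout=timeout):
            try:
                close_acked.set()
            finally:
                big_lock.release()

    def client():
        with big_lock:
            close_requested.set()
            if release_lock_while_blocked:
                big_lock.release()  # let other threads run during the blocking call
                try:
                    acked = close_acked.wait(timeout)
                finally:
                    big_lock.acquire()
            else:
                # Blocking call made while still holding the lock: the server
                # thread never gets to run, so the ack never arrives.
                acked = close_acked.wait(timeout)
            result.append(acked)

    threads = [threading.Thread(target=server), threading.Thread(target=client)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result[0]


print(self_check_completes(release_lock_while_blocked=False, timeout=0.2))  # False
print(self_check_completes(release_lock_while_blocked=True, timeout=0.2))   # True
```

In this model the fix is simply to release the lock around every call that may block; whether that maps one-to-one onto the real code paths is an assumption on my side.
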
werner changed the task status from Open to Testing.Dec 1 2017, 1:30 PM
werner edited projects, added Unreleased, libassuan; removed gpg4win, Windows.

Yeah, that looks correct. Good catch. The bug exhibits itself when gpg-agent checks its own socket. In this case gpg-agent is both client and server, and due to our userland multi-threading we get blocked. The suspend/resume thing makes the deadlock more likely. Note that we have the same problem on Unix.

Until we have released a new version of gnupg or gpg4win, please use the workaround of adding

disable-check-own-socket

to the file "gpg-agent.conf" and restart gpg-agent ("gpgconf --kill all").
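
For those who want to script the workaround (several people here put the option into the wrong file at first), a minimal Python sketch follows. It only uses the two gpgconf invocations already mentioned in this thread, "gpgconf --list-dirs homedir" and "gpgconf --kill all". The helper names are made up, and it assumes a gpgconf that prints just the directory for "--list-dirs homedir", as the Gpg4win 3.0.0 output above does; the 2.3.4 output shown earlier has a different shape and is not handled.

```python
import subprocess
from pathlib import Path


def ensure_options(conf_path, options):
    """Append each option to conf_path unless it is already an active line.

    Existing content is never rewritten; new lines go to the end of the file.
    Returns the list of options that were actually added.
    """
    path = Path(conf_path)
    raw = path.read_bytes() if path.exists() else b""
    lines = raw.decode("utf-8", errors="replace").splitlines()
    active = {
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }
    missing = [opt for opt in options if opt not in active]
    if missing:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("ab") as handle:
            if raw and not raw.endswith(b"\n"):
                handle.write(b"\n")
            for opt in missing:
                handle.write(opt.encode("ascii") + b"\n")
    return missing


def agent_conf_path():
    """Ask gpgconf for the home directory (assumes it prints only the path)."""
    homedir = subprocess.run(
        ["gpgconf", "--list-dirs", "homedir"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return Path(homedir) / "gpg-agent.conf"


def restart_agent():
    """Stop the running daemons; they are started again on demand."""
    subprocess.run(["gpgconf", "--kill", "all"], check=True)
```

Usage would be something like ensure_options(agent_conf_path(), ["disable-check-own-socket"]) followed by restart_agent(). Remember to take the option out again once a fixed version is installed.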

Adding Windows again because on Unix it is unlikely that our close will block. A documented blocking behavior is only defined for STREAMS.

inc75 added a comment.Dec 1 2017, 4:17 PM

The error is fixed with "disable-check-own-socket".
In case anyone is interested in the future, the log file "gpg-agent.log" is in the path "C:\Users\<my user>\AppData\Local\VirtualStore\Program Files (x86)\Mozilla Thunderbird\".
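
That location suggests the bare name in "log-file gpg-agent.log" was resolved against the working directory of whatever started the agent (Thunderbird here), with Windows then redirecting the write into VirtualStore. This is an inference from the path, not something confirmed in this thread. Under that assumption, a small Python check like the one below can flag a log-file entry that is not an absolute Windows path; the function name is made up, and values containing "://" are skipped on the assumption that they are not plain file names.

```python
from pathlib import PureWindowsPath


def relative_log_file(conf_text):
    """Return the log-file value if it is not an absolute Windows path, else None."""
    for line in conf_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == "log-file":
            value = parts[1].strip()
            if "://" in value:
                return None  # assumed not to be a plain file name
            if not PureWindowsPath(value).is_absolute():
                return value
    return None


conf = "debug-level guru\nlog-file gpg-agent.log\ndisable-check-own-socket\n"
print(relative_log_file(conf))  # gpg-agent.log
```

Writing the full path, drive letter included, should avoid having to hunt for the file.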

Thank you to all those who have taken the time to solve it.

A new installer with an updated libassuan is now available. To download gnupg-2.2.3_171201.exe please go to https://gnupg.org/download/ . If you had the disable-check-own-socket in your gpg-agent.conf, please remove it so that we can really see whether that version fixes the problem.

Superb! Testing gnupg-2.2.3_171201.exe as I type, and it's already working past the time it would normally cease to respond :)

Ok here's an update.

I downloaded https://gnupg.org/ftp/gcrypt/binary/gnupg-w32-2.2.3_20171201.exe, which I assume is the file that was meant by "gnupg-2.2.3_171201.exe", but unfortunately it doesn't seem to fix the problem on its own (which may suggest that the newly patched libassuan is not really doing what it was intended to do, at least on my system). Yet this very same version of GPG along with disable-check-own-socket works perfectly fine and has been up and running for hours now without hanging at all.

werner changed the task status from Testing to Open.Dec 2 2017, 12:04 PM

:-(

Alright, I need to weigh in with something that may be influencing the failure of the December 1, 2017 build to operate correctly over here. Since this issue is related to sockets, and I have set up a rather unusual security apparatus on my system ("unusual" as far as computers regularly running GPG are concerned, and only in my personal experience, meaning no reliable statistics or anything), I think it's worth mentioning that my firewall (Sygate Personal Firewall Pro) is configured to be very restrictive, that virtually anything that uses TCP or UDP is routed through SOCKS5 via ProxyCap, and that neither application currently allows GPG to access any address but localhost (there's a reason for this that has nothing to do with GPG itself, but that's part of a different discussion).

I could, given the green light, put all this on hold and test GPG on this very computer without any restrictions whatsoever, but I can only do so for a limited period of time, and only if Mr. Koch and his team see something remotely interesting in this.

Best regards.

gniibe added a comment.Dec 6 2017, 6:40 AM

To reproduce this problem of nonce write->read race on Windows, and forgotten wrapping of read/write, please apply this patch for testing:


And then, please confirm that rG1524ba9656f0: agent: Set assuan system hooks before call of assuan_sock_init. can fix this, even with the patch for testing.

If you can get the developers to make a try-build that is built securely then I'd guess most of us would be happy to try it. Not all of us have a build system for gpg.

I'm doing the test. I'm currently waiting on a hang with the test change applied.

gniibe added a comment.EditedDec 6 2017, 7:53 AM

For better reproducibility of the hang, this one is better:


It's a patch to libassuan. The patch to gpg-agent is not the exact one. libassuan patch is the exact one.

Sorry, I mistakenly posted the entire gpg-agent.c when I meant to post a patch.

gniibe added a comment.EditedDec 6 2017, 10:23 AM

D455: Allow change of system hooks for assuan_sock_... is a patch for libassuan.
D456: Change SOCK_CTX (internal one) system hooks is a patch for gpg-agent.

The libassuan-hang-test.diff patch above requires D455 and D456 to be applied.
I guess that without the testing patch, the current gpg-agent would possibly just work fine (no crash).

Looks good. With the libassuan-hang-test.diff and D455 D456 applied on current master branches it no longer hangs. It hung with only the libassuan-hang-test.diff.

gniibe added a comment.EditedDec 6 2017, 10:47 AM

Thanks for testing.
I created another patch which can be applied independently: D457: Avoid crash using nPth

  • libassuan-hang-test.diff + D457 with the current master of GnuPG is expected not to crash (hopefully)
  • D455 + D457 for libassuan, and D456 for GnuPG: this is also OK.
werner changed the task status from Open to Testing.Dec 7 2017, 6:07 PM

All committed. I created a new installer gnupg-w32-2.2.3_20171207.exe which comes with the new libassuan 2.5.1 and the two required patches for gnupg.

I've been running gnupg-w32-2.2.3_20171207.exe for about as long as it's been available and no hanging whatsoever. Thanks a lot!

(Yes, no gpg-agent config trick or anything; just a fresh install after removing every file related to the older version of gpg from my system, registry leftovers and all)

werner awarded a token.Dec 8 2017, 8:29 AM

There is now also Gpg4win-3.0.2 with that gnupg version available.

aheinecke closed this task as Resolved.Nov 9 2018, 1:42 PM

Marking this as resolved as it was forgotten in the testing state.