Concurrent auto-start of gpg-agent by multiple gpg instances.
Open, Normal, Public

Description

When the gpg-agent is down and multiple gpg decryption commands are executed in parallel, all gpg instances try to start the gpg-agent at the same time, and most of them hang.

If the gpg-agent is up, then all concurrent decryptions are performed without any problem.

Details

Version
2.2.23.57908
werner triaged this task as Normal priority. Tue, Sep 15, 9:35 PM
werner added projects: Windows, gnupg (gpg22).
werner added a subscriber: werner.

I assume this is the Windows version. gpg uses a locking mechanism to avoid creating several gpg-agent processes. In the worst case this may take quite some time until one of the processes can get the lock. There is an exponential backoff scheme in use and I have not yet found a way to replicate the full deadlock you describe. It would be helpful if you could describe in more detail how you run into this case.
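To make the retry behaviour concrete, here is a minimal sketch in C of a lock attempt with capped exponential backoff. It is not the actual GnuPG code: the function names, the callback types, and the example delay values are all hypothetical, and the sleep is passed in as a callback so that the sketch does not depend on a particular platform.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true once the lock has been taken. */
typedef bool (*try_lock_fn)(void *ctx);
/* Hypothetical callback: waits for the given number of milliseconds. */
typedef void (*sleep_fn)(unsigned int ms, void *ctx);

/* Delay before the next retry: base_ms doubled once per previous
   attempt, never exceeding cap_ms. */
unsigned int backoff_delay_ms(unsigned int attempt,
                              unsigned int base_ms,
                              unsigned int cap_ms)
{
    unsigned int delay = base_ms;

    while (attempt-- > 0 && delay < cap_ms) {
        if (delay > cap_ms / 2) {   /* doubling would pass the cap */
            delay = cap_ms;
            break;
        }
        delay *= 2;
    }
    return delay < cap_ms ? delay : cap_ms;
}

/* Try to take the lock up to max_attempts times, sleeping longer
   after each failure. Returns true if the lock was acquired. */
bool acquire_with_backoff(try_lock_fn try_lock, sleep_fn do_sleep,
                          void *ctx, unsigned int max_attempts,
                          unsigned int base_ms, unsigned int cap_ms)
{
    for (unsigned int attempt = 0; attempt < max_attempts; attempt++) {
        if (try_lock(ctx))
            return true;
        if (attempt + 1 < max_attempts)   /* no sleep after the last try */
            do_sleep(backoff_delay_ms(attempt, base_ms, cap_ms), ctx);
    }
    return false;
}
```

With an assumed base of 50 ms and a cap of 1000 ms, the waits would be 50, 100, 200, 400, 800, and then 1000 ms for every further attempt. With many processes contending at once, the accumulated waits are one plausible source of long delays.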

Yes, it is the Windows version. It occurs on both Windows 10 and Windows Server 2016.
What I notice is that a gpg-agent is started, then after some time another one is started and the previous ends (presumably because it has lost the socket), etc. At any point in time, I can see only one agent instance running in the task manager, but with different process ids.

When I start up a small number of gpg instances (let's say 5), sometimes it works fine, although with severe delays.
When I start a lot (let's say 50), a few instances complete, but the rest hang.

We have bypassed the problem by starting the gpg-agent ahead of time, so that when the gpg instances are executed, the agent is already up and running.

Please tell me if you need something more specific.

We need to figure out why the file locks do not seem to work. gpg-agent processes watch their own socket and terminate if that socket does not belong to them anymore.

Stock Windows versions without any remote file system?
Does everything run as the same user, that is, is the socket directory

gpgconf --list-dirs socketdir

used by only one user, or is it somehow shared with other users? But even that should not do any harm; we use

if (LockFileEx (h->lockhd, (LOCKFILE_EXCLUSIVE_LOCK  | LOCKFILE_FAIL_IMMEDIATELY), 0, 1, 0, &ovl))

on a dedicated file named *.lock in the GNUPGHOME dir.

CaveTheCave added a comment. Edited Wed, Sep 16, 5:03 PM

From the '&ovl' I assume that the lock file has been opened for overlapped IO.
Please see an extract from MSDN for the LockFileEx function:

<If an exclusive lock is requested for a range of a file that already has a shared or exclusive lock, the function returns the error ERROR_IO_PENDING. The system will signal the event specified in the OVERLAPPED structure after the lock is granted.>

If the gpg program is signalled that the lock has been taken, then it may try to steal the lock, although the gpg-agent may have already been started by some other instance. Maybe NULL must be passed instead of the &ovl pointer.

Please note that:

  • There is a single user accessing the socket dir (which is the same as the homedir).
  • The socketdir (homedir) is not in a local directory. It is in another file system accessed via the SMB protocol, with a command such as:
gpg --homedir "//192.168.32.211/c$/gpghomedir" ...

With all due respect, should I wait for a follow-up, or should I consider this case closed?