
dirmngr should try the configured keyservers anyway even if they are all dead
Open, Normal, Public

Description

If dirmngr is explicitly configured with a keyserver (or if it is using the default configuration) and its host table has *all* of the configured keyservers marked as dead, for whatever reason, then it refuses to make new requests, which causes gpg --search to fail. It then tells the user to "use option --keyserver", which is a separate weirdness; see T4512.

Marking keyservers as "known dead" is useful when selecting among a pool where some keyservers are dead but others are not. But if *all* keyservers are known dead, and the user is requesting a network connection anyway, dirmngr should go ahead and try one of the dead ones, rather than pre-emptively failing.

Details

External Link
2.2.15

Event Timeline

This is particularly bad for users who have manually specified a given keyserver in dirmngr.conf, because even a transient failure in that keyserver will prevent them from any future keyserver requests until dirmngr decides that the "death" has worn off.

Most troublingly, this fatal failure mode might encourage some keyserver operators to try to hide transient failures to avoid this bad use case, which would make error signalling even harder to interpret.

werner added a project: Feature Request.

There is a growing bit of popular lore in the GnuPG community that "when keyserver operations fail, you solve that problem with killall dirmngr." I believe this suggestion is potentially damaging (the long-running daemon may be in the middle of operations for a client that you don't know about), but i suspect it is circulating as advice because it resolves the situation outlined in this ticket. For whatever ephemeral reason, dirmngr gets stuck, and fails to notice that this situation has resolved itself.

If we don't want users to ritualistically invoke killall dirmngr then dirmngr needs to be better about actively trying again.

I can attest to the "growing bit of popular lore": Roughly half the support requests I get to support@keys.openpgp.org boil down to an exchange of "it just doesn't work with a 'general error' message" -> "try killall dirmngr" -> "that did it". I have heard similar stories from @patrick from Enigmail users, and more than once heard people applying poweruser trickery like "I just have killall dirmngr in my resume.d".

This has been a thing for years. Please prioritize this issue, or more generally, reliability in dirmngr.

@aheinecke does this not come up from kleopatra users as well?

The proper solution is of course to use pkill instead of killall. SCNR.

@Valodim probably not so much as dirmngr might behave differently and not mark hosts as dead.

For example, I have just tried on a system without an internet connection: dirmngr times out and runs into an unknown error. But searching on Keyservers is also in my opinion not a common use case for Kleopatra users. That is IMO more something for people who are "experts" and don't use Kleopatra ;-)

But in general I agree dirmngr could be more robust and "forgiving" ;).
Werner, could you maybe at least check for an internet connection? I don't know how to do it on Linux, but on Windows it's easy because Windows has an API for that.

https://docs.microsoft.com/en-us/windows/win32/api/wininet/nf-wininet-internetgetconnectedstate

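#include <windows.h>
#include <wininet.h>   /* link against wininet.lib */

DWORD flags;
/* InternetGetConnectedState() is the function documented at the link above;
   InternetCheckConnection() takes a URL and has a different signature. */
if (!InternetGetConnectedState(&flags, 0))
  return "Booo you are not connected.";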

The problem is not to check whether there is a connection but how to decide whether something is a pool or an explicitly added single keyserver, and how often we should try to connect to or read from it. Without marking hosts as dead the auto search features won't work well.

gpgconf --reload dirmngr

flushes the internal caches. We do have a connection timeout. In most cases the keyservers take too long to answer after a connection has been established and are then marked dead for some time.
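As a rough illustration of marking a host dead "for some time", here is a minimal sketch in C. The names `host_state`, `mark_dead` and `host_usable`, and the caller-supplied `ttl`, are assumptions made for this example; they are not taken from dirmngr's source, whose actual bookkeeping and timings may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-host state; not dirmngr's real data structure. */
struct host_state {
    int dead;          /* nonzero while the host is considered dead */
    time_t died_at;    /* when the dead mark was set */
};

/* Record that a request to this host failed at time `now`. */
void mark_dead(struct host_state *h, time_t now)
{
    assert(h != NULL);
    h->dead = 1;
    h->died_at = now;
}

/* Return 1 if the host may be contacted at time `now`, 0 otherwise.
   A dead mark older than `ttl` seconds is cleared, i.e. it has "worn off". */
int host_usable(struct host_state *h, time_t now, time_t ttl)
{
    assert(h != NULL);
    if (h->dead && now - h->died_at >= ttl)
        h->dead = 0;
    return !h->dead;
}
```

With a scheme like this, a host that failed once stays unusable until the time-to-live has passed, regardless of whether the network has come back in the meantime.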

I agree that this is a tricky problem, but it should really be improved.

Frankly speaking: I had so many support requests, that I decided to implement keyserver interaction directly in Enigmail using XmlHttpRequest(), rather than needing to help people fix dirmngr issues.

And by doing that you bypass all the key source tracking done by gpg. In any case, searching by name or mail address on a keyserver should not be done, at least not by a GUI tool used by inexperienced users.

But searching on Keyservers is also in my opinion not a common use case for Kleopatra users.

Thanks for engaging constructively. Does Kleopatra not use keyservers to refresh keys, for updates like subkeys or revocations? It seemed to me (from e.g. T4163) that keyservers are generally a supported use case. As mentioned, I do get support requests from Kleopatra/GpgOL users every now and then.

In T4513#132770, @aheinecke wrote:

Werner, could you maybe at least check for an internet connection? I don't know how to do it on Linux, but on Windows it's easy because Windows has an API for that.

For projects willing to adopt a modern framework (not reliant on strict posix), there are certainly APIs for that on the modern F/LOSS desktop as well.

@werner wrote:

The problem is not to check whether there is a connection

however, lack of a connection does contribute to the view that some servers are "dead". For example, if i take my computer offline, but keep working on it, and some process asks dirmngr to talk to the keyservers, the keyservers will be marked as dead.

When i bring my machine back online, those keyservers will still all be marked as dead. Subsequent queries will report:

gpg: error searching keyserver: No keyserver available
gpg: keyserver search failed: No keyserver available

despite the fact that the keyserver is easily available to the rest of the operating system. Dirmngr is just refusing to connect to it because it believes it is unreachable.

This is a pretty easy test to run, and i encourage you to do it.

But searching on Keyservers is also in my opinion not a common use case for Kleopatra users.

Thanks for engaging constructively.

Is this meant ironically? I think we are engaging constructively; we see that it's an issue.

Does Kleopatra not use keyservers to refresh keys, for updates like subkeys or revocations?

No such feature in Kleopatra. Yes, it should do this, but no one has implemented it yet :-)

It seemed to me (from e.g. T4163) that keyservers are generally a supported use case. As mentioned, I do get support requests from Kleopatra/GpgOL users every now and then.

Sure, but as I said, on Windows this is different again: according to my tests, dirmngr does not mark keyservers as dead on Windows because the error path is different. That is a bug in itself, but unrelated.

@werner To fix this issue in itself: "If all keyservers are marked as dead remove all the dead marks and try again" ? I think that this is reasonable.
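A minimal sketch of that suggestion in C, assuming a much simplified host table. The struct `host_entry` and the functions `first_alive` and `select_host` are hypothetical names for this example and are not taken from dirmngr's source:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified host table entry; not dirmngr's real structure. */
struct host_entry {
    const char *name;
    int dead;            /* nonzero if the host is currently marked dead */
};

/* Return the index of the first host not marked dead, or -1 if there is none. */
int first_alive(const struct host_entry *hosts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!hosts[i].dead)
            return (int)i;
    return -1;
}

/* Pick a host to contact. If every host is marked dead, clear all the
   dead marks and pick again instead of failing up front. Returns -1
   only when the table is empty. */
int select_host(struct host_entry *hosts, size_t n)
{
    assert(hosts != NULL || n == 0);
    int idx = first_alive(hosts, n);
    if (idx < 0 && n > 0) {
        for (size_t i = 0; i < n; i++)
            hosts[i].dead = 0;
        idx = first_alive(hosts, n);
    }
    return idx;
}
```

In this sketch a pool with at least one live host behaves as before, and only the "everything is dead" case changes: the request is attempted and the host is simply marked dead again if it still fails.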

Any news here? Is this issue going to be fixed or not? It's really annoying.

We are using a keyserver in our company. If someone hasn't connected to the VPN but tries to send an encrypted mail (and gpg tries to pull public keys), the dirmngr process is broken. Even if they recognize their mistake and connect to the VPN (where the keyserver is reachable) right after the first attempt, they cannot use it for 1.5 hours?! Killing the service or running gpgconf --reload dirmngr is not an option for a "normal" sales guy. Could you please provide a fix for that? If I explicitly try to pull keys again, then it should work.

For the record, the typical response to "it doesn't work" support requests for keys.o.o still comes down to killall dirmngr.

I think we are engaging constructively; we see that it's an issue.

Scrolling over this issue and its timeline, I don't see much evidence to support this assessment.

To fix this issue in itself: "If all keyservers are marked as dead remove all the dead marks and try again" ? I think that this is reasonable.

This sounds like a good suggestion for a straightforward fix.

Good luck on Solaris with this suggestion ;-)

For the record: To restart dirmngr, use "gpgconf --kill dirmngr" or if you really need to kill the process use "pkill dirmngr"

Good luck on Solaris with this suggestion ;-)

I have no experience with Solaris, so please do elaborate. Do you see any issues on Solaris specifically for unmarking keyservers as dead once they are all dead?