[DIRMNGR] Key server should be tried if passed with --keyserver, regardless of the "dead" mark
Open, Normal, Public

Description

My organization has its own HKP server, that we use to encrypt mails.

Our problem is that dirmngr can mark it "dead" for a number of reasons (user connected to the wrong network, not connected to their VPN at home, not connected at all, etc.). When that happens, even if our users realize their mistake and fix their network setup, they can't search for or download any key, because dirmngr won't even try to connect while the mark is set.

As I understand it, the dead feature is an optimization to reduce search time when multiple servers are in use; that makes sense, but it should NOT make GPG quit without doing anything.

How to reproduce:

Disconnect the host from wifi/eth, use gpg --keyserver myserver, reconnect, use gpg --keyserver myserver again. The second call still fails, because the server is still marked dead.

Ways that it could be fixed:

  • Always try to connect when --keyserver is used (if I give an option to gpg, I expect it to be used, for real);
  • Ignore the dead mark when only one keyserver is configured (through options or config file);
  • Retry servers if all servers in the list are marked dead;
  • etc.

In the meantime, I'm also looking for a way to get around the problem so that my (non IT) users are not bothered by this anymore; so, is it possible to ...

  • Disable the whole "dead" feature somewhere in the config?
  • Configure the time to wait before a server comes "back from the dead"?
  • Tell GPG to spawn a non-daemonized instance of dirmngr for each of its runs?
  • Tell GPG to use the legacy (pre-2.1) --keyserver handler?

Thank you for your time on this.

Details

Version
2.2.3

Event Timeline

werner triaged this task as Normal priority. Mar 27 2018, 6:18 PM
werner edited projects, added dirmngr, Feature Request; removed Bug Report.
werner added a subscriber: werner.

You can do a

gpgconf --reload dirmngr

to reset the liveness marks.

Thank you for your answer ! :)

Yep, I knew about --kill (a bit more violent, but same idea). Unfortunately, most of our users are not so IT friendly; they don't even know what a terminal/shell is. They're just using GPG without knowing it, through their mail client's plugin.
What would be perfect is something I can configure myself in their gpg.conf / dirmngr.conf once to get around the problem "forever".

To get back to the main subject of the ticket ... I don't want to fight over terminology here, but if I ask GPG to retrieve keys from a given server that is reachable and working properly, I expect to get my keys; if I don't, it IS a bug. The fact that the server was unreachable 2 hours ago should be irrelevant here; if it can work right now, it should.

The following post assumes that we want gpg --search to try to search; meaning that we don't want gpg to exit immediately because of the dead marks, without having sent a single network request to anyone.
The post is a bit long; sorry about that.

After a quick look at the code today, here is what happens when searching for a key on the servers:

  • ...
    • ks_action_search (dirmngr/ks-action.c:142) (here it iterates over the servers given with --keyserver)
      • ks_hkp_search (dirmngr/ks-engine-hkp.c:1396)
        • make_host_part (dirmngr/ks-engine-hkp.c:1002)
          • map_host (dirmngr/ks-engine-hkp.c:453)

In the last of these (map_host), here is the interesting part:

581   if (hi->pool)
582     {
          ...
590
591       /* If the currently selected host is now marked dead, force a
592          re-selection .  */
593       if (force_reselect)
594         hi->poolidx = -1;
595       else if (hi->poolidx >= 0 && hi->poolidx < hosttable_size
596                && hosttable[hi->poolidx] && hosttable[hi->poolidx]->dead)
597         hi->poolidx = -1;
598 
599       /* Select a host if needed.  */
600       if (hi->poolidx == -1)
601         {
602           hi->poolidx = select_random_host (hi);
603           if (hi->poolidx == -1)
604             {
605               log_error ("no alive host found in pool '%s'\n", name);
                  ...
611               return gpg_error (GPG_ERR_NO_KEYSERVER);
612             }
613         }
614 
615       assert (hi->poolidx >= 0 && hi->poolidx < hosttable_size);
616       hi = hosttable[hi->poolidx];
617       assert (hi);
618     }
619   else if (r_httphost && is_ip_address (hi->name))
620     {
        ... (DNS resolution stuff, irrelevant for this subject)
646     }

648   if (hi->dead)
649     {
650       log_error ("host '%s' marked as dead\n", hi->name);
          ...
656       return gpg_error (GPG_ERR_NO_KEYSERVER);
657     }

So basically, for each URI given with --keyserver, ks_action_search ends up in map_host (via ks_hkp_search and make_host_part), which decides which "real host" to use (in case of a pool) and whether we should try to query that host (depending on its dead mark). If the host is dead, or if all the hosts of a pool are, map_host returns GPG_ERR_NO_KEYSERVER, which is propagated back to ks_action_search.
The problem is:

  • if we only gave one URI to --keyserver, it will just exit right away (without having actually searched anywhere);
  • even if we gave a list of URIs, the current implementation of ks_action_search will exit once it gets GPG_ERR_NO_KEYSERVER; so if the first server of the list is marked dead, it won't even try the other ones (for details, see the FIXME at ks-action.c:154).
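
To make the second point concrete, here is a small stand-alone model of a search loop that skips a dead server and moves on to the next URI instead of returning at once. It is only a sketch, not the dirmngr code: keyserver_t, search_one, search_all and the ERR_* constants are hypothetical names standing in for the URI list, ks_hkp_search and GPG_ERR_NO_KEYSERVER.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real gpg-error codes.  */
enum { ERR_OK = 0, ERR_NO_KEYSERVER = 1 };

/* Hypothetical model of one --keyserver entry.  */
typedef struct
{
  const char *uri;
  int dead;      /* Non-zero if the host carries the dead mark.  */
  int queried;   /* Set once a request would have been sent.  */
} keyserver_t;

/* Model of a single search: refuses to talk to a dead host.  */
int
search_one (keyserver_t *ks)
{
  if (ks->dead)
    return ERR_NO_KEYSERVER;
  ks->queried = 1;
  return ERR_OK;
}

/* Try every server in turn.  A dead server is skipped instead of
   ending the whole search; ERR_NO_KEYSERVER is only returned if no
   server in the list could be queried.  */
int
search_all (keyserver_t *list, size_t n)
{
  int err = ERR_NO_KEYSERVER;
  size_t i;

  assert (list != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      err = search_one (&list[i]);
      if (err != ERR_NO_KEYSERVER)
        break;   /* Success or a different error: stop here.  */
    }
  return err;
}
```

With a list whose first entry is dead and whose second is alive, search_all queries the second entry and reports success; with only dead entries it still returns the "no keyserver" error, which is the case the map_host change is meant to address.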

IMHO, the map_host block should be patched as:

@602
           hi->poolidx = select_random_host (hi);
           if (hi->poolidx == -1)
             {
               log_error ("no alive host found in pool '%s'\n", name);
-                  ...
-              return gpg_error (GPG_ERR_NO_KEYSERVER);
+              /* Try our luck with the first host of the pool.  */
+              hi->poolidx = hi->pool[0];
             }
         }

@648
-    if (hi->dead)
-      {
-        log_error ("host '%s' marked as dead\n", hi->name);
-           ...
-        return gpg_error (GPG_ERR_NO_KEYSERVER);
-      }

The dead mark will still be used to prefer alive hosts over dead ones in pools, but if all the hosts of a pool are dead, or if the "non-pool host" is dead, dirmngr will try to query it again, to avoid exiting without doing anything.
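
The intended policy can be sketched as a stand-alone function. This is a simplified model under my own assumptions, not dirmngr's code: host_t and select_host_with_fallback are hypothetical names, and the function picks the first alive host deterministically, whereas the real code uses select_random_host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified host entry; not dirmngr's hostinfo_t.  */
typedef struct
{
  const char *name;
  int dead;   /* Non-zero if the host carries the dead mark.  */
} host_t;

/* Return the index of the host to query.  Alive hosts are preferred;
   if every host is marked dead, fall back to the first one so that
   at least one request is attempted.  Returns -1 only for an empty
   pool.  */
int
select_host_with_fallback (const host_t *pool, size_t n)
{
  size_t i;

  assert (pool != NULL || n == 0);
  if (n == 0)
    return -1;
  for (i = 0; i < n; i++)
    if (!pool[i].dead)
      return (int) i;
  return 0;   /* All dead: retry the first host anyway.  */
}
```

A single "non-pool host" is just the n == 1 case here: it is always returned, dead or not, so the caller gets a chance to find out whether the server is reachable again.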