
dirmngr fails to terminate on SIGTERM if an existing connection is open
Closed, Wontfix, Public

Description

If a dirmngr process has an existing connection open, it fails to terminate
properly.

It does print out:

     dirmngr[28267.0]: SIGTERM received - shutting down ...

but then no termination happens.

You can replicate this from the source tree with two terminals. In terminal A:

    TEST=$(mktemp -d -m 0700)
    ./dirmngr/dirmngr --daemon --homedir $TEST
    GNUPGHOME=$TEST ./tools/gpg-connect-agent --dirmngr 'getinfo pid' /bye

and in terminal B:

    GNUPGHOME=$TEST ./tools/gpg-connect-agent --dirmngr

(leaving that connection open)

and then back in terminal A, sending a SIGTERM to the process ID reported by 'getinfo pid' above (here called DIRMNGR_PID):

    kill $DIRMNGR_PID

subsequent attempts to connect from other terminals with:

    GNUPGHOME=$TEST ./tools/gpg-connect-agent --dirmngr

will hang, and the only way to get dirmngr to shut down is to send it another
SIGTERM once no connections remain open.
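The steps above use $DIRMNGR_PID without showing how it is set. A minimal sketch of one way to capture it follows, assuming the reply to 'getinfo pid' consists of a data line of the form "D <pid>" followed by "OK". The helper name dirmngr_pid_from_reply is hypothetical and not part of GnuPG.

```shell
# Hypothetical helper: read a 'getinfo pid' reply on stdin and print the PID.
# Assumption: the reply contains a data line "D <pid>" followed by "OK".
dirmngr_pid_from_reply() {
    awk '$1 == "D" { print $2; exit }'
}

# Intended use in terminal A (not executed here, needs a running dirmngr):
#   DIRMNGR_PID=$(GNUPGHOME=$TEST ./tools/gpg-connect-agent --dirmngr \
#                 'getinfo pid' /bye | dirmngr_pid_from_reply)
#   kill "$DIRMNGR_PID"
```

If the reply format differs on your build, adjust the awk pattern accordingly.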

Details

Version
2.1.16

Event Timeline

dkg set Version to 2.1.16.
dkg added a subscriber: dkg.

The man page notes:

SIGTERM

   Shuts down the process but waits until all current requests are
   fulfilled.  If the process has received 3  of these signals
   and requests are still pending, a shutdown is forced.  You may
   also use
         gpgconf --kill dirmngr
   instead of this signal

Thus this is by design and identical to what gpg-agent does. IIRC, there was a
regression for some time, fixed in 2.1.16. So this fixed regression is what you
see as a bug.
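As a reading aid, the rule quoted from the man page can be written as a small decision function. This is only a sketch of the documented behaviour, not dirmngr's actual code: the function name sigterm_action and its two arguments are hypothetical, and it treats "pending requests" as a simple count.

```shell
# Hypothetical model of the documented SIGTERM rule.
# usage: sigterm_action <sigterms_received_so_far> <pending_requests>
sigterm_action() {
    count=$1
    pending=$2
    if [ "$pending" -eq 0 ]; then
        echo shutdown      # nothing pending: terminate
    elif [ "$count" -ge 3 ]; then
        echo force         # third signal with requests pending: forced shutdown
    else
        echo wait          # keep running until pending requests are fulfilled
    fi
}
```

In the reported scenario there is one open connection and a single SIGTERM, so this model yields "wait", which matches the observed log line followed by no termination.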

However, the process should no longer listen for new connections once a
shutdown is pending. That needs to be fixed.

While looking at the problem I found a corner case related to a shutdown and
fixed that.

I also tried to close the listening socket after the first shutdown event. I
reverted that because the effect is that a client trying to connect immediately
gets a failure and then starts a new dirmngr - which is not the idea of a shutdown.

marcus added a subscriber: marcus.

It takes a couple of seconds for dirmngr to terminate after closing the last connection, maybe due to the timeout in the pselect call. Apart from that, it works as expected.

why should it wait for the timeout in the pselect call? shouldn't it be able to respond immediately to the final connection closing?

the timeout is still happening as of the current git head (6502bb0d2af5784918ebb74242fff6f0a72844bf).

I'm also confused about Werner's description of the shutdown. there seem to be several states in the current implementation:

  • A) running, listening, with no connections
  • B) running, listening, with 1 or more connections
  • C) running, listening but not accepting, with 1 or more connections
  • D) running, listening but not accepting, with 0 connections
  • E) not running at all

So a SIGTERM will cause these state transitions:

A→E
B→C
C→C

and a "last connection closed" will cause this state transition:

C→D

and a "once more round the pselect" timeout will cause:

D→E
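The transitions listed above can be collected into a small lookup, which makes it easier to see which state/event pairs are not covered. This is a sketch of the model described in this comment only, not of dirmngr's implementation. The function name next_state and the event names sigterm, last_close and timeout are labels chosen here for illustration.

```shell
# Hypothetical model of the state transitions listed above.
# usage: next_state <state A..E> <event: sigterm|last_close|timeout>
next_state() {
    case "$1:$2" in
        A:sigterm)    echo E ;;
        B:sigterm)    echo C ;;
        C:sigterm)    echo C ;;
        C:last_close) echo D ;;
        D:timeout)    echo E ;;
        *)            echo '?'; return 1 ;;  # pair not described in the list
    esac
}
```

For example, following B through sigterm, last_close and timeout ends in E, while a pair such as D with sigterm is reported as '?' because the list does not say what happens there.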

From E, any request to dirmngr will spawn a new dirmngr, right? so if that's not "the idea of a shutdown" then we should get rid of the A→E transition. But that sounds absurd. As it is, states C and D are just states that introduce new error conditions that wouldn't otherwise exist.

The cleanest mental models for the user for a daemon's response to SIGTERM are that either:

  • X) the daemon will abort existing connections and shut down ASAP, or
  • Y) the daemon should shut down once no connections are active (which means it could actually be kept alive if there are a series of overlapping connections)

I can imagine that X would be considered problematic because clients now have to deal with a state where the daemon suddenly disappears. But this is true anyway (e.g. OOM-killer takes out the daemon). And if we keep states C and D that means that clients have to consider the case where the daemon appears to be there but really isn't.

Wouldn't it be better to be simpler? I don't think the fact that some corners of this complexity are documented makes it much more palatable. :(