dirmngr and gpg-agent should work automatically even when GNUPGHOME is larger than sun_path
Open, NormalPublic

Description

when GNUPGHOME points to a directory whose path is larger than
sockaddr_un.sun_path, daemons like gpg-agent and dirmngr cannot create their
sockets.

Currently we require the manual use of gpgconf --create-socketdir to switch over
to a shortened+digested path like /run/user/$(id
-u)/gnupg/d.ejnoxi4bi8ngqjaxw8jku8wz

This causes breakage in weird corner cases that people shouldn't have to know
about. On platforms where it can work, it should Just Work.

These shortened socketdirs should also get cleaned up automatically when the
associated daemons go away.

We also should support this common workflow:

  • create a temporary GNUPGHOME for experimentation
  • when done, do: rm -rf "$GNUGHOME"
  • associated daemons all terminate

So my current proposal is:

  • daemons create the ephemeral socketdir automatically if possible.
  • clients try the ephemeral socketdir first, then fall back to in-$GNUPGHOME sockets (i think this is already the case).
  • daemons watch the $GNUPGHOME with inotify, and auto-terminate if the $GNUPGHOME itself is destroyed.
  • daemons try to rmdir() on the ephemeral socketdir on termination, failing quietly on ENOTEMPTY.

Please see discussion starting at:

https://lists.gnupg.org/pipermail/gnupg-users/2017-February/057692.html

dkg added a subscriber: dkg.
justus added a subscriber: justus.Feb 28 2017, 4:39 PM

Notmuch deemed --create-socketdir to be insufficient for their test suite:

https://notmuchmail.org/pipermail/notmuch/2017/024148.html

Now they create GNUPGHOMEs in /tmp. That is exactly what our test suite does.

(We also use --create-socketdir, but we don't rely on it, and indeed, on my
system it fails b/c the per-user directory is not created. Likewise on the
OpenBSD build server, and the macOS one.)

Nitpick: You wrote:

when GNUPGHOME points to a directory whose path is larger than
sockaddr_un.sun_path, daemons like gpg-agent and dirmngr cannot create their
sockets.

I don't think this is correct. I have not seen any evidence that creating the
socket is problematic.

dkg added a comment.Mar 1 2017, 2:02 AM

Yes, notmuch decided that they needed to workaround the situation anyway,
because they're in an environment that doesn't create the standard per-user
rundir. That doesn't seem like a great argument that gpg should also fail in
environments where the standard per-user rundir is available. I can demonstrate
a number of environments where gpg or its daemons will fail, but i don't think
any of them justify forcing gpg or its daemons to *also* fail when those
environments aren't present.

In answer to your nitpick, here is evidence that gpg's daemons cannot create
their sockets when the GNUPGHOME is too long:

1 dkg@alice:~$ mkdir -m 0700
/home/dkg/tmp/very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-long
0 dkg@alice:~$
GNUPGHOME=/home/dkg/tmp/very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-long
gpgconf --launch dirmngr
gpgconf: error running '/usr/bin/gpg-connect-agent': exit status 1
gpgconf: error running '/usr/bin/gpg-connect-agent --dirmngr NOP': General error
1 dkg@alice:~$

werner added a subscriber: werner.Mar 1 2017, 7:31 AM

Can we test whether /run is mounted on a tmpfs ?
should we assume that /run is always on a tmpfs but /var/run is a classical Unix
w/o a tmpfs? Or is it better to have a configure option.

I can imagine to agree to auto-create the directory on a tmpfs.

justus added a comment.Mar 1 2017, 3:10 PM

dkg, I understand that GnuPG does not work with such a homedir, however, it is
not the act of creating the socket that is problematic. In fact, both
bind(2)ing and connect(2)ing is ok if one uses relative paths, as demonstrated
by the test program I have attached here.

Here is the program binding and connecting to a socket with an absolute path
length of ~10 * sizeof sockaddr_un.sun_path:

System: OpenBSD:6.0:GENERIC.MP#1992
sizeof addr.sun_path: 104
Running test with strlen (cwd): 22, name: '/tmp/test-unix-sockets/socket'

getsockname returned '/tmp/test-unix-sockets/socket', addrlen: 106

Running test with strlen (cwd): 22, name: 'socket'

getsockname returned 'socket', addrlen: 106

Running test with strlen (cwd): 126, name: 'socket'

getsockname returned 'socket', addrlen: 106

Running test with strlen (cwd): 1062, name: 'socket'

  getsockname returned 'socket', addrlen: 106

This works on all Unices that I have access to. I've asked on gnupg-devel@ for
people to run it elsewhere.

I understand that '--create-socketdir' solves problems besides this one. But I
disagree with the statement that our handling of socket paths is unproblematic
because --create-socketdir solves this problem.

justus added a comment.Mar 1 2017, 3:10 PM

dkg added a comment.Mar 1 2017, 7:24 PM

Justus, thanks for this work, it's great!. If we can solve the problem by doing
more clever socket(7) manipulation, that would be a big win.

How do you propose dealing with the getsockname() variations? or should we just
forbid the use of getsockname() entirely in the gnupg codebase?

From what I've seen there is no variation in getsockname, it just returns
whatever path is passed to bind. I don't understand the need for getsockname
tbh, because we are the ones that bind the socket in the first place.

(The only variation seems to be that the function is broken on Hurd...).

getsockname is only used to recover the paths of sockets bound by a supervisor
like systemd. So unless systemd starts doing the same trick that I propose,
there is no problem.

Werner does not think that this is a problem and does not want me to spend time
on this.

My main reasons why I don't want to consider this now are:

  • That code is not written and thus will not be matured.
  • It does not solve the major problem why we moved to /var/run, namely remote file systems and avoidance of possible re-mounted file systems
  • The claim that /var/run/user does not exists is not valid, because that is a simple dependency for building the software or using it with non-common setups (remot, long $HOME). Thus an admin will anyway be on duty and adding a few lines to /etc/rc.local is not a bug deal.

FWIW, we may try this in 2.3 see T2987.

dkg added a comment.Mar 14 2017, 4:39 AM

This bug report simply asks to solve the generic problem of GNUPGHOME being
larger than sun_path. Justus's proposed mechanism is only one way of solving
that problem.

Another proposed mechanism is what i originally proposed in T2964 (dkg on Feb 17 2017, 01:52 AM / Roundup), which
*does* address remote filesystems and re-mounted filesystems.

I don't undertstand the critique about the code not yet being mature. Code
doesn't become mature by not being written, it needs to be written first and
then tested in order to become mature.

Lastly, i think if we expect that /run/user/$(id -u)/ is a "simple dependency"
for building other software, we need to make that expectation explicit someplace
reasonable (e.g. doc/HACKING or something similar)

I agreed in T2964 (wk on Mar 01 2017, 07:31 AM / Roundup) to auto create socket directories. I would like to do that
only for a tmpfs but we can also try to do this always. Adding a inotify watch
to remove the directory is more complex and I am not sure whether this is really
needed. The other thing is simple and we could do that for 2.1.20.

The whole IPC thing is pretty complex and adding a non-standard hack as proposed
by Justus will for sure cause breakage on some platforms.

Yes, we should document /var/run recommendations in the README. I will do that
for the next release.

The whole IPC thing is pretty complex and adding a non-standard hack as proposed
by Justus will for sure cause breakage on some platforms.

I'm not sure why you call it a hack. I've been looking at POSIX, [0] introduces
pathname resolution, and the terms 'relative path' and 'absolute path'.

0: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13

Neither the page for connect [1], nor the one for bind [2] state that the path
used to connect/bind unix sockets must be an absolute path.

1: http: / / pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html#
2: http: / / pubs.opengroup.org/onlinepubs/9699919799/functions/bind.html#

Furthermore, my test across a wide range of UNIX implementations did not show
any issues with using relative paths.

Justus: I told you several times that we are not going to change working code
for no good reason. Even if your hack (I call it a hack because it does not
work with getsockname) would make it, it does not solve the major problem: The
inability of creating sockets on certain file systems. THAT is the major reason
why we moved to /var/run.

Justus: I told you several times that we are not going to change working code
for no good reason.

Except that it is not working. If it was working, then
06f1f163e96f1039304fd3cf565cf9de1ca45849
would not be necessary.

Even if your hack (I call it a hack because it does not
work with getsockname)

1/ Yes it does. It returns precisely the path that was used in bind.

2/ We only use getsockname on sockets that were given us by a service manager
like systemd, and thus those sockets would be unaffected by "the hack".

would make it, it does not solve the major problem: The
inability of creating sockets on certain file systems. THAT is the major reason
why we moved to /var/run.

Please stop conflating these things. This bug is about "dirmngr and gpg-agent
should work automatically even when GNUPGHOME is larger than sun_path". It is
not about NFS or FAT or something.

We also have a discussion of the mailing list. It does currently not make sense
to continue here.

The problem of NFS mounted home directories is _real_ and we have a solution for
this which is better than the old redirection hack.

The problem with too long socket names is not severe and has been around for
decades (for other software and 14 years for GnuPG). There are workaround and
/run/user also solves this.

I proposed a change which does not even require --create-socketdir. There was
no comment on this and thus I will push that now so that we can do a real life test.

Automatic creation of socket directories creates cleanup trouble for projects previously relying on the agent-shutdown if $GNUPGHOME is removed: https://notmuchmail.org/pipermail/notmuch/2017/024550.html

justus moved this task from Backlog to Deferred on the gnupg (gpg22) board.May 24 2017, 1:29 PM
gniibe claimed this task.Sep 7 2017, 12:35 AM
jegrp added a subscriber: jegrp.Jan 21 2019, 4:16 PM