I don't have one of these systems handy to test with, but if the fix in dee026d7 does what it says it does, this sounds like it's probably OK to close in my book. if there are more problems, i'm sure we can re-open it.
- Queries
- All Stories
- Search
- Advanced Search
- Transactions
- Transaction Logs
Advanced Search
Apr 25 2017
Apr 21 2017
Apr 7 2017
Apr 4 2017
Apr 3 2017
dkg: Can we close this now that 2.1.20 is out?
Mar 31 2017
Mar 30 2017
Mar 24 2017
We also have a discussion of the mailing list. It does currently not make sense
to continue here.
The problem of NFS mounted home directories is _real_ and we have a solution for
this which is better than the old redirection hack.
The problem with too long socket names is not severe and has been around for
decades (for other software and 14 years for GnuPG). There are workaround and
/run/user also solves this.
I proposed a change which does not even require --create-socketdir. There was
no comment on this and thus I will push that now so that we can do a real life test.
Justus: I told you several times that we are not going to change working code
for no good reason.
Except that it is not working. If it was working, then
06f1f163e96f1039304fd3cf565cf9de1ca45849
would not be necessary.
Even if your hack (I call it a hack because it does not
work with getsockname)
1/ Yes it does. It returns precisely the path that was used in bind.
2/ We only use getsockname on sockets that were given us by a service manager
like systemd, and thus those sockets would be unaffected by "the hack".
would make it, it does not solve the major problem: The
inability of creating sockets on certain file systems. THAT is the major reason
why we moved to /var/run.
Please stop conflating these things. This bug is about "dirmngr and gpg-agent
should work automatically even when GNUPGHOME is larger than sun_path". It is
not about NFS or FAT or something.
Mar 21 2017
Justus: I told you several times that we are not going to change working code
for no good reason. Even if your hack (I call it a hack because it does not
work with getsockname) would make it, it does not solve the major problem: The
inability of creating sockets on certain file systems. THAT is the major reason
why we moved to /var/run.
The whole IPC thing is pretty complex and adding a non-standard hack as proposed
by Justus will for sure cause breakage on some platforms.
I'm not sure why you call it a hack. I've been looking at POSIX, [0] introduces
pathname resolution, and the terms 'relative path' and 'absolute path'.
0: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13
Neither the page for connect [1], nor the one for bind [2] state that the path
used to connect/bind unix sockets must be an absolute path.
1: http: / / pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html#
2: http: / / pubs.opengroup.org/onlinepubs/9699919799/functions/bind.html#
Furthermore, my test across a wide range of UNIX implementations did not show
any issues with using relative paths.
Fixed in 88f1505f0613894d5544290a170119eb538921e5.
Mar 20 2017
- Werner Koch via BTS <gnupg@bugs.g10code.com> [20170317 12:57]:
Fixed with commit 69c521d.
You can reconfigure your server. Thanks.
Mar 17 2017
Fixed with commit 69c521d.
You can reconfigure your server. Thanks.
- Werner Koch via BTS <gnupg@bugs.g10code.com> [20170316 21:12]:
What is this Apache thing ;-). Frankly, I don't have one running and it would
be easier if you can remove it from testkolab.
Mar 16 2017
What is this Apache thing ;-). Frankly, I don't have one running and it would
be easier if you can remove it from testkolab. The current Windows versions
should not have the problem anyway because warning alerts are skipped in ntbtls.
For gnutls I have a fix ready.
- Werner Koch via BTS <gnupg@bugs.g10code.com> [20170316 14:37]:
Thomas: Is there any way how I can reproduce this now that you changed the
configuration of testkolab?
I was able to reproduce it again. Maybe this bug depends on which keyserver in
the pool answers. The error is the same for Tor and non-Tor connections.
Thomas: Is there any way how I can reproduce this now that you changed the
configuration of testkolab?
I don't know why, it is not repdroducible anymore.
Mar 14 2017
I agreed in T2964 (wk on Mar 01 2017, 07:31 AM / Roundup) to auto create socket directories. I would like to do that
only for a tmpfs but we can also try to do this always. Adding a inotify watch
to remove the directory is more complex and I am not sure whether this is really
needed. The other thing is simple and we could do that for 2.1.20.
The whole IPC thing is pretty complex and adding a non-standard hack as proposed
by Justus will for sure cause breakage on some platforms.
Yes, we should document /var/run recommendations in the README. I will do that
for the next release.
This seems to be a bug in our new resolver library. I have contacted the author
for assistance.
This bug report simply asks to solve the generic problem of GNUPGHOME being
larger than sun_path. Justus's proposed mechanism is only one way of solving
that problem.
Another proposed mechanism is what i originally proposed in T2964 (dkg on Feb 17 2017, 01:52 AM / Roundup), which
*does* address remote filesystems and re-mounted filesystems.
I don't undertstand the critique about the code not yet being mature. Code
doesn't become mature by not being written, it needs to be written first and
then tested in order to become mature.
Lastly, i think if we expect that /run/user/$(id -u)/ is a "simple dependency"
for building other software, we need to make that expectation explicit someplace
reasonable (e.g. doc/HACKING or something similar)
Mar 13 2017
#2991 is a duplicate of this issue.
This is a duplicate of #2990.
Hey :-)
Glad to see I'm not the only one ;-)
Indeed, I can reproduce this.
PS: Hi flokli :)
Mar 10 2017
And failing with IPv6 nameserver.
Here's running normally (not in a container) using IPv4 nameserver.
Arch Linux. The PID was due to running in a container.
What OS are you using? It looks like A Linux distro but the process id 10 is a
little bit unlikely.
Please add
verbose
debug ipc,dns
log-file /foo/bar/dirmngr.log
to dirmngr.conf, kill dirmngr (gpgconf --kill dirmngr), and retry. Show us the
log then.
Mar 9 2017
Error output:
dirmngr[9.5]: handler for fd 5 started dirmngr[9.5]: connection from process 10 (1000:1000) dirmngr[9.5]: command 'KS_GET' failed: Server indicated a failure <Unspecified source> gpg: keyserver receive failed: Server indicated a failure dirmngr[9.5]: handler for fd 5 terminated
Mar 6 2017
My main reasons why I don't want to consider this now are:
- That code is not written and thus will not be matured.
- It does not solve the major problem why we moved to /var/run, namely remote file systems and avoidance of possible re-mounted file systems
- The claim that /var/run/user does not exists is not valid, because that is a simple dependency for building the software or using it with non-common setups (remot, long $HOME). Thus an admin will anyway be on duty and adding a few lines to /etc/rc.local is not a bug deal.
FWIW, we may try this in 2.3 see T2987.
Werner does not think that this is a problem and does not want me to spend time
on this.
getsockname is only used to recover the paths of sockets bound by a supervisor
like systemd. So unless systemd starts doing the same trick that I propose,
there is no problem.
Mar 5 2017
2.1.19 behaves (almost) the same:
- dirmngr does ignore /etc/hosts
- dirmngr is only resolving trough dns
SRV? _pgpkey-https._tcp.keyserver.example.com. (59)
SRV? _pgpkey-https._tcp.keyserver.example.com.localdomain. (71)
A? keyserver.example.com. (40)
A? keyserver.example.com.localdomain. (52)
AAAA? keyserver.example.com. (40)
AAAA? keyserver.example.com.localdomain. (52)
A? keyserver.example.com. (40)
A? keyserver.example.com.localdomain. (52)
AAAA? keyserver.example.com. (40)
AAAA? keyserver.example.com.localdomain. (52)
The command output changed slightly:
gpg2 --debug-level guru --search-keys example.com
gpg: enabled debug flags: packet mpi crypto filter iobuf memory cache memstat trust
hashing ipc clock lookup extprog
gpg: DBG: [not enabled in the source] start
gpg: DBG: chan_3 <- # Home: /tmp/gnupg-test
gpg: DBG: chan_3 <- # Config: /tmp/gnupg-test/dirmngr.conf
gpg: DBG: chan_3 <- OK Dirmngr 2.1.19 at your service
gpg: DBG: connection to the dirmngr established
gpg: DBG: chan_3 -> GETINFO version
gpg: DBG: chan_3 <- D 2.1.19
gpg: DBG: chan_3 <- OK
gpg: DBG: chan_3 -> KS_SEARCH -- example.com
gpg: DBG: chan_3 <- ERR 167772380 No name <Dirmngr>
gpg: error searching keyserver: No name
gpg: keyserver search failed: No name
gpg: DBG: chan_3 -> BYE
gpg: DBG: [not enabled in the source] stop
gpg: random usage: poolsize=600 mixed=0 polls=0/0 added=0/0
outmix=0 getlvl1=0/0 getlvl2=0/0
gpg: secmem usage: 0/32768 bytes in 0 blocks
Mar 3 2017
Thomas confirmed this, with our workaround for the SNI problem removed the
problem still occurs. We have activated our workaround again to keep wks working
on testkolab.
I think gniibe may have posted a related patch to gnupg-devel some time ago not
to abort on non fatal GNUTLS alerts but I don't think it was applied.
This issue does not have high priority for me so I downgraded to minor bug but
it's still an issue.
Mar 2 2017
From T2833 (wk on Mar 02 2017, 07:49 PM / Roundup) I don't think the problem is resolved. Yes it works now with
gnutls and ntbtls because we fixed / changed it on our side. There were no
changes to the GnuTLS code regarding alerts afaik.
Thomas: I've assigned this now to "no-selection" if possible I would have
assigned it to you. Can you come up with a test / demo that shows that this
problem still exists. Something werner could test against?
Tried with ntbtls and gnutls - both work fine now. Given the work we did with
recent release I will close this bug now.
From what I've seen there is no variation in getsockname, it just returns
whatever path is passed to bind. I don't understand the need for getsockname
tbh, because we are the ones that bind the socket in the first place.
(The only variation seems to be that the function is broken on Hurd...).
Mar 1 2017
Justus, thanks for this work, it's great!. If we can solve the problem by doing
more clever socket(7) manipulation, that would be a big win.
How do you propose dealing with the getsockname() variations? or should we just
forbid the use of getsockname() entirely in the gnupg codebase?
dkg, I understand that GnuPG does not work with such a homedir, however, it is
not the act of creating the socket that is problematic. In fact, both
bind(2)ing and connect(2)ing is ok if one uses relative paths, as demonstrated
by the test program I have attached here.
Here is the program binding and connecting to a socket with an absolute path
length of ~10 * sizeof sockaddr_un.sun_path:
System: OpenBSD:6.0:GENERIC.MP#1992
sizeof addr.sun_path: 104
Running test with strlen (cwd): 22, name: '/tmp/test-unix-sockets/socket'
getsockname returned '/tmp/test-unix-sockets/socket', addrlen: 106
Running test with strlen (cwd): 22, name: 'socket'
getsockname returned 'socket', addrlen: 106
Running test with strlen (cwd): 126, name: 'socket'
getsockname returned 'socket', addrlen: 106
Running test with strlen (cwd): 1062, name: 'socket'
getsockname returned 'socket', addrlen: 106
This works on all Unices that I have access to. I've asked on gnupg-devel@ for
people to run it elsewhere.
I understand that '--create-socketdir' solves problems besides this one. But I
disagree with the statement that our handling of socket paths is unproblematic
because --create-socketdir solves this problem.
The --hostable option is a debugging aid and only used manually.
The nsswitch items "mymachine", "resolve", and "myhostname" are not known to
libdns but should have been skipped. "files" is the first entry and should have
delivered the result.
Fixed in cd32ebd152a522e362469ab969d91f8d49f28a60.
Seems that libdns does not pick it up /etc/hosts
Can we test whether /run is mounted on a tmpfs ?
should we assume that /run is always on a tmpfs but /var/run is a classical Unix
w/o a tmpfs? Or is it better to have a configure option.
I can imagine to agree to auto-create the directory on a tmpfs.
Yes, notmuch decided that they needed to workaround the situation anyway,
because they're in an environment that doesn't create the standard per-user
rundir. That doesn't seem like a great argument that gpg should also fail in
environments where the standard per-user rundir is available. I can demonstrate
a number of environments where gpg or its daemons will fail, but i don't think
any of them justify forcing gpg or its daemons to *also* fail when those
environments aren't present.
In answer to your nitpick, here is evidence that gpg's daemons cannot create
their sockets when the GNUPGHOME is too long:
1 dkg@alice:~$ mkdir -m 0700
/home/dkg/tmp/very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-long
0 dkg@alice:~$
GNUPGHOME=/home/dkg/tmp/very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-very-long
gpgconf --launch dirmngr
gpgconf: error running '/usr/bin/gpg-connect-agent': exit status 1
gpgconf: error running '/usr/bin/gpg-connect-agent --dirmngr NOP': General error
1 dkg@alice:~$
Feb 28 2017
Notmuch deemed --create-socketdir to be insufficient for their test suite:
https://notmuchmail.org/pipermail/notmuch/2017/024148.html
Now they create GNUPGHOMEs in /tmp. That is exactly what our test suite does.
(We also use --create-socketdir, but we don't rely on it, and indeed, on my
system it fails b/c the per-user directory is not created. Likewise on the
OpenBSD build server, and the macOS one.)
Nitpick: You wrote:
when GNUPGHOME points to a directory whose path is larger than
sockaddr_un.sun_path, daemons like gpg-agent and dirmngr cannot create their
sockets.
I don't think this is correct. I have not seen any evidence that creating the
socket is problematic.
Feb 23 2017
ntbtls support is now available in master and we will release a TLS enabled
2.1.19 installer for Windows.
Right now it is somewhat limited and does not work with some sites, notably
those which allow only ECC ciphersuites. An example for such a site is
posteo.de. Note that posteo.net sends a a bogus certifcate with rediretion to
posteo.de.
Most other sites work.
Feb 21 2017
Are you using tor? if so, is your tor daemon up and running, and actively
connecting to the outside world?
Feb 19 2017
Feb 17 2017
Thanks for these fixes! I'm not sure i understand why ptr lookups are needed
for keyserver --hosttable. Can we drop those too?
Feb 15 2017
I have fixed some things. In general PTR lookups are onow only used when you
run the 'keyserver --hosttable' command.
Feb 13 2017
Fixed with commit dee026d7.
If no DNS method is found in nsswitch.conf we now append one. Using dirmngr w/o
DNS does not work anyway thus this seems to be the best solution.
Right, the proposed chnage will not fallback to the standard resolver.
I need to modify the patch because it was too simple: Need to explicitly look
for an dns entry and append it to the list iff it is missing.
right, the configuration is not an error, but a different way of handling the
DNS lookups.
just to clarify: this change means that dirmngr will continue to use libdns in
the event of finding no understood directives in nsswitch.conf. it is *not* the
equivalent of falling back to standard-resolver. right? If that's correct,
then i agree that an extra warning is probably too much noise.
You would see that error message then with every first DNS call. My
understanding is that on systemd the unknown keywords are not an error but a
featyre of systemd-resolver(?).
looks reasonable to me, though i haven't tried it myself (my nsswitch.conf
doesn't have the initial property reported).
Perhaps there should be an additional explicit log message for the
!ld.resolv_conf->lookup[0] case since dirmngr is falling back?
Using a socket conenction would require new code. We use the standard ports
instead. Sometimes the socks5 code (and I assume also the Unix domain socket
code) takes some time to figure out whether Tor is actually running, Thus this
is not done at every request.
Doing a check for every request would also require a lot of new code because we
need to restart a connection attempt at a higher layer. Similar to HTTP 301
handling.