D393: 914_0001-dirmngr-hkp-Avoid-potential-race-condition-when-some.patch
Nov 23 2016
In practice, dirmngr from git master still wakes up every few seconds due to the
ldap-reaper thread, even if no LDAP connection has ever been made.
The patch dirmngr-Lazily-launch-ldap-reaper-thread.patch avoids this additional
wakeup, at least for those dirmngr instances that have never used LDAP.
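The idea behind lazily launching the reaper can be sketched as a minimal Python model. This is not dirmngr's C code; the class name LazyReaper, its interval and the reap() hook are hypothetical. The thread is only created on the first call to ensure_started(), which would be made from the code path that opens an LDAP connection, so an instance that never uses LDAP never gets the periodic wakeup.

```python
import threading


class LazyReaper:
    """Hypothetical model of a reaper thread that is started on first use."""

    def __init__(self, interval_seconds):
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None
        self.start_count = 0  # how many threads were ever started

    def ensure_started(self):
        # Called when an LDAP connection is opened. The lock ensures that
        # concurrent first calls still start only one thread.
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._run, daemon=True)
                self._thread.start()
                self.start_count += 1

    @property
    def started(self):
        return self._thread is not None

    def _run(self):
        # Event.wait returns False on timeout and True once stop() was called.
        while not self._stop.wait(self._interval):
            self.reap()

    def reap(self):
        """Placeholder: close idle LDAP connections here."""

    def stop(self):
        self._stop.set()
        with self._lock:
            thread = self._thread
        if thread is not None:
            thread.join()
```

Until ensure_started() is called, no thread exists and nothing wakes up; repeated calls are cheap no-ops.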
Nov 21 2016
Nov 20 2016
This has been changed in 2.1.16 to happen only every minute. Along with the
wakeup being done at the full second (as has been agreed upon for other
daemons), this should be more of an annoyance than a real problem.
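Waking up at the full second comes down to choosing the sleep time so that the wakeup lands on a whole-second boundary. A small Python sketch of that arithmetic, assuming a 60-second interval (the function name is hypothetical, not dirmngr's):

```python
import math


def delay_until_aligned_wakeup(now, interval=60.0):
    """Return how long to sleep so the next wakeup happens exactly at
    floor(now) + interval, i.e. on a whole second."""
    fraction = now - math.floor(now)  # sub-second part of the current time
    return interval - fraction
```

For example, at now = 100.3 the sketch sleeps about 59.7 seconds and wakes at 160.0; at now = 100.0 it sleeps exactly 60 seconds.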
Nov 15 2016
We're shipping these patches in debian unstable as of 2.1.15-9.
Nov 10 2016
Nov 9 2016
Nov 8 2016
I'm also seeing this behavior when there is something wrong with the reverse DNS
lookups. For example:
Nov 08 10:54:36 alice dirmngr[1714]: handler for fd 5 started
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 -> # Home: /home/dkg/.gnupg
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 -> # Config: /home/dkg/.gnupg/dirmngr.conf
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 -> OK Dirmngr 2.1.15 at your service
Nov 08 10:54:36 alice dirmngr[1714]: connection from process 7623 (1000:1000)
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 <- GETINFO version
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 -> D 2.1.15
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 -> OK
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 <- KEYSERVER
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 -> S KEYSERVER hkps://hkps.pool.sks-keyservers.net
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 -> OK
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 <- KS_GET -- 0x2E8DD26C53F1197DDF403E6118E667F1EB8AF314
Nov 08 10:54:36 alice dirmngr[1714]: DBG: gnutls:L3: ASSERT: mpi.c[_gnutls_x509_read_uint]:246
Nov 08 10:54:36 alice dirmngr[1714]: DBG: gnutls:L5: REC[0x7f7458003000]: Allocating epoch #0
Nov 08 10:54:36 alice dirmngr[1714]: can't connect to 'oteiza.siccegge.de': Invalid argument
Nov 08 10:54:36 alice dirmngr[1714]: error connecting to 'https://oteiza.siccegge.de:443': Invalid argument
Nov 08 10:54:36 alice dirmngr[1714]: DBG: gnutls:L5: REC[0x7f7458003000]: Start of epoch cleanup
Nov 08 10:54:36 alice dirmngr[1714]: DBG: gnutls:L5: REC[0x7f7458003000]: End of epoch cleanup
Nov 08 10:54:36 alice dirmngr[1714]: DBG: gnutls:L5: REC[0x7f7458003000]: Epoch #0 freed
Nov 08 10:54:36 alice dirmngr[1714]: command 'KS_GET' failed: Invalid argument
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 -> ERR 167804976 Invalid argument <Dirmngr>
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 <- BYE
Nov 08 10:54:36 alice dirmngr[1714]: DBG: chan_5 -> OK closing connection
Nov 08 10:54:36 alice dirmngr[1714]: handler for fd 5 terminated
This appears to be because the pool included 92.43.111.21, which has a PTR of
oteiza.siccegge.de, despite the fact that oteiza.siccegge.de has no A record.
There is no reason for dirmngr to be talking to the member of the pool by its
hostname, anyway -- it should make the connection by IP address, with the TLS
SNI set to the pool name.
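A minimal Python sketch of connecting to a pool member by IP address while sending the pool name as SNI. The function names are hypothetical, and cafile is an assumption standing in for whatever CA the pool's certificates chain to (the hkps pool used its own CA rather than a public one). No reverse DNS lookup is involved at any point, so a PTR name without an A record cannot break the connection.

```python
import socket
import ssl


def connection_targets(pool_name, addresses, port=443):
    """Pair every resolved pool address with the pool name to use for SNI."""
    return [(address, port, pool_name) for address in addresses]


def connect_by_address(address, pool_name, port=443, cafile=None, timeout=10.0):
    """Connect to one pool member by IP, but present and verify the pool name.

    server_hostname sets the TLS SNI extension and is also the name the
    server certificate is checked against.
    """
    context = ssl.create_default_context(cafile=cafile)
    raw = socket.create_connection((address, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=pool_name)
    except Exception:
        raw.close()
        raise
```

Usage would iterate over connection_targets("hkps.pool.sks-keyservers.net", addresses) and call connect_by_address(address, sni, port, cafile=...) for each entry until one succeeds.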
The TCP specs demand something different, and it is not dirmngr's duty to do
something about it. You get this behaviour with all TCP connections, and that is
also what makes TCP a reliable connection.
On Linux it would be possible to reduce the initial SYN retries, but that is not
portable.
For --auto-key-retrieve I already implemented a --quick parameter in gpg to
advise dirmngr to give up earlier. The dirmngr side has not been implemented,
though.
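The two knobs mentioned above can be contrasted in a short Python sketch: the Linux-only TCP_SYNCNT socket option caps SYN retransmissions, while a connect timeout is the portable way to give up earlier. The quick flag only mirrors the idea of gpg's --quick parameter; since the dirmngr side is not implemented, the function name and timeout values here are hypothetical.

```python
import socket

# Hypothetical timeouts; dirmngr does not necessarily use these values.
NORMAL_CONNECT_TIMEOUT = 30.0
QUICK_CONNECT_TIMEOUT = 3.0


def make_stream_socket(quick=False, syn_retries=2, family=socket.AF_INET):
    """Create a TCP socket that gives up earlier when `quick` is set.

    Returns (sock, reduced_syn); reduced_syn tells whether the Linux-only
    TCP_SYNCNT option could be applied.
    """
    sock = socket.socket(family, socket.SOCK_STREAM)
    reduced_syn = False
    if quick and hasattr(socket, "TCP_SYNCNT"):
        try:
            # Linux only: cap SYN retransmissions for this one socket.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT, syn_retries)
            reduced_syn = True
        except OSError:
            pass
    # Portable part: bound how long connect() may block.
    sock.settimeout(QUICK_CONNECT_TIMEOUT if quick else NORMAL_CONNECT_TIMEOUT)
    return sock, reduced_syn
```

The timeout works everywhere but only limits the wait; TCP_SYNCNT changes how many SYNs are actually sent, which is why it is the non-portable part.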
Nov 4 2016
Oct 27 2016
Oct 26 2016
I'm trying to understand this, but I'm not seeing it.
Here's the test I did. While recording all traffic from my machine on port 53
(the DNS port), I ran:
GNUPGHOME=$(mktemp -d) gpg-connect-agent --dirmngr
That interactive session looked like this:
> getinfo dnsinfo
OK - ADNS w/o Tor support
> getinfo tor
dirmngr[11713.1]: command 'GETINFO' failed: False
ERR 167772416 False <Dirmngr> - Tor mode is NOT enabled
> keyserver --clear
OK
> keyserver hkps://hkps.pool.sks-keyservers.net
OK
> keyserver --resolve hkps://hkps.pool.sks-keyservers.net
dirmngr[11713.1]: DNS query returned an error or no records: No such domain (nxdomain)
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'bone.digitalis.org'
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'ip-209-135-211-141.ragingwire.net'
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'gpg.nebrwesleyan.edu'
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'host-37-191-220-247.lynet.no'
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'cryptonomicon.mit.edu'
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'zimmerman.mayfirst.org'
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'sks.srv.dumain.com'
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'b4ckbone.de'
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'sks.spodhuis.org'
dirmngr[11713.1]: resolve_dns_addr for 'hkps.pool.sks-keyservers.net': 'oteiza.siccegge.de'
S # https://cryptonomicon.mit.edu:443
OK
> keyserver --hosttable
S # hosttable (idx, ipv6, ipv4, dead, name, time):
S # 0 hkps.pool.sks-keyservers.net
S # . --> 8 1 5* 3 4 2 10 9 7 6
S # 1 4 bone.digitalis.org v4=212.12.48.27
S # 2 4 ip-209-135-211-141.ragingwire.net v4=209.135.211.141
S # 3 4 gpg.nebrwesleyan.edu v4=192.94.109.73
S # 4 4 host-37-191-220-247.lynet.no v4=37.191.220.247
S # 5 4 cryptonomicon.mit.edu v4=18.9.60.141
S # 6 4 zimmerman.mayfirst.org v4=216.66.15.2
S # 7 4 sks.srv.dumain.com v4=85.119.82.209
S # 8 4 b4ckbone.de v4=193.164.133.100
S # 9 4 sks.spodhuis.org v4=94.142.242.225
S # 10 4 oteiza.siccegge.de v4=92.43.111.21
OK
>
So, the SRV lookup did indeed fail, but subsequent queries succeeded.
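In the hosttable output, the pool entry lists its members by index, and in this session the member marked with * (index 5, cryptonomicon.mit.edu) is the host that keyserver --resolve reported. Assuming that * marks the currently selected member (an inference from this one session, not documented behavior), a small hypothetical Python helper can read that line:

```python
def parse_pool_line(line):
    """Parse a hosttable pool line such as 'S # . --> 8 1 5* 3'.

    Returns (member_indices, selected_index_or_None).
    """
    _, _, members_part = line.partition("-->")
    members, selected = [], None
    for token in members_part.split():
        if token.endswith("*"):
            token = token[:-1]
            selected = int(token)
        members.append(int(token))
    return members, selected
```

Lines without an arrow, such as the per-host entries, yield an empty member list and no selection.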
I've attached a pcapng file of the network traffic sent and received from the
described test.
The textual version of the traffic is:
query 0x311f SRV _hkp._tcp.hkps.pool.sks-keyservers.net
query response 0x311f No such name SRV _hkp._tcp.hkps.pool.sks-keyservers.net SOA ns2.kfwebs.net
query 0x3120 A hkps.pool.sks-keyservers.net
query response 0x3120 A hkps.pool.sks-keyservers.net A 92.43.111.21 A 94.142.242.225 A 193.164.133.100 A 85.119.82.209 A 216.66.15.2 A 18.9.60.141 A 37.191.220.247 A 192.94.109.73 A 209.135.211.141 A 212.12.48.27
query 0xbd61 PTR 27.48.12.212.in-addr.arpa
query response 0xbd61 PTR 27.48.12.212.in-addr.arpa PTR bone.digitalis.org
query 0x384a PTR 141.211.135.209.in-addr.arpa
query response 0x384a PTR 141.211.135.209.in-addr.arpa PTR ip-209-135-211-141.ragingwire.net
query 0xb36e PTR 73.109.94.192.in-addr.arpa
query response 0xb36e PTR 73.109.94.192.in-addr.arpa PTR gpg.nebrwesleyan.edu
query 0xcac3 PTR 247.220.191.37.in-addr.arpa
query response 0xcac3 PTR 247.220.191.37.in-addr.arpa PTR host-37-191-220-247.lynet.no
query 0xd28b PTR 141.60.9.18.in-addr.arpa
query response 0xd28b PTR 141.60.9.18.in-addr.arpa PTR cryptonomicon.mit.edu
query 0x4be9 PTR 2.15.66.216.in-addr.arpa
query response 0x4be9 PTR 2.15.66.216.in-addr.arpa CNAME 2.0-27.15.66.216.in-addr.arpa PTR zimmerman.mayfirst.org PTR zimmermann.mayfirst.org
query 0x823b PTR 209.82.119.85.in-addr.arpa
query response 0x823b PTR 209.82.119.85.in-addr.arpa PTR sks.srv.dumain.com
query 0x3b0c PTR 100.133.164.193.in-addr.arpa
query response 0x3b0c PTR 100.133.164.193.in-addr.arpa PTR b4ckbone.de
query 0x9600 PTR 225.242.142.94.in-addr.arpa
query response 0x9600 PTR 225.242.142.94.in-addr.arpa PTR sks.spodhuis.org
query 0xed36 PTR 21.111.43.92.in-addr.arpa
query response 0xed36 PTR 21.111.43.92.in-addr.arpa PTR oteiza.siccegge.de
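The query names in this capture follow mechanically from the pool name and the returned A records: one SRV query for _hkp._tcp.&lt;pool&gt; (note the hkp service label even though the scheme is hkps), one A query, then one PTR query per address. A Python sketch that derives those names; the function names are hypothetical and no lookup is performed:

```python
import ipaddress


def srv_query_name(host, service="hkp"):
    """SRV owner name as seen in the capture: _hkp._tcp.<host>."""
    return f"_{service}._tcp.{host}"


def ptr_query_name(address):
    """Reverse-lookup name: octets reversed under in-addr.arpa
    (nibbles under ip6.arpa for IPv6)."""
    return ipaddress.ip_address(address).reverse_pointer
```

For example, ptr_query_name("92.43.111.21") gives 21.111.43.92.in-addr.arpa, whose PTR answer is oteiza.siccegge.de, a name that itself has no A record. A PTR answer is therefore not guaranteed to be a usable hostname.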
Oct 10 2016
From a question on the mailing list:
gpg-connect-agent --dirmngr
GETINFO dnsinfo
OK - ADNS w/o Tor support
GETINFO tor
ERR 167772416 False <Dirmngr> - Tor mode is NOT enabled
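The numbers in ERR lines are libgpg-error values. They pack an error source into bits 24 to 30 and an error code into the low 16 bits, and bit 15 of the code flags errors derived from a system errno. A Python sketch decoding them; the constants mirror libgpg-error's GPG_ERR_SOURCE_SHIFT, GPG_ERR_SOURCE_MASK, GPG_ERR_CODE_MASK and GPG_ERR_SYSTEM_ERROR, and the function name is hypothetical:

```python
GPG_ERR_SOURCE_SHIFT = 24
GPG_ERR_SOURCE_MASK = 127
GPG_ERR_CODE_MASK = 65535
GPG_ERR_SYSTEM_ERROR = 1 << 15


def decode_gpg_error(err):
    """Split a gpg_error_t value into (source, code, is_system_error)."""
    source = (err >> GPG_ERR_SOURCE_SHIFT) & GPG_ERR_SOURCE_MASK
    code = err & GPG_ERR_CODE_MASK
    return source, code, bool(code & GPG_ERR_SYSTEM_ERROR)
```

decode_gpg_error(167772416) yields (10, 256, False): source 10 is Dirmngr, matching the &lt;Dirmngr&gt; suffix, and code 256 is the one rendered as "False". The 167804976 from the KS_GET failure above decodes to (10, 32816, True), an errno-derived error, which fits its "Invalid argument" text.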
Oct 9 2016
Oct 6 2016
Sep 28 2016
It is now patched in Gpg4win, and I think aheinecke also pushed the patch for Linux.
The bug itself has been resolved with that patch, but the fix is not yet released.
Sep 19 2016
Sep 13 2016
I spoke to Werner; it is better to go with ntbtls anyway.
Timeline: this year, hopefully earlier.
For ntbtls, also see: https://wiki.gnupg.org/NTBTLS
ntbtls is being developed by Werner:
https://git.gnupg.org/cgi-bin/gitweb.cgi?p=ntbtls.git;a=summary
What about using https://tls.mbed.org/? At least until ntbtls is mature?
Sep 12 2016
@werner, if you prefer ntbtls over GnuTLS, okay. Can you add a link to ntbtls
and outline the next steps? We'd probably need TLS support for the Web Key
Directory as well, so this needs a solution.
Sep 5 2016
Jochen: I'd rather you (manually) patch the dirmngr tarball included in
gpg4win-2, create a test installer and try that one out.
I found the problem in this issue and tested that the attached patch solves it.
Yes, it would have worked on GNU/Linux, as the "b" has no effect there.
Finding out since when the problem existed appears moot to me: you would have
to check dirmngr's SVN history, and it has likely always existed.
But there may be additional problems (as this is, in my opinion, a very exotic
feature), so it would probably make sense to test it again on Windows before
preparing the next stable Gpg4win release.
Jochen, can you please find out:
a) Does this still work on GNU/Linux?
b) Did this work with older Gpg4win versions? With a binary search you
should quickly find out when this broke.
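The binary search suggested above, as a Python sketch. It assumes the builds are ordered from oldest to newest, that a test (works) can be run against any of them, and that once the feature breaks it stays broken; the labels in the usage note are placeholders, not real Gpg4win releases.

```python
def first_broken(builds, works):
    """Return the oldest build for which works(build) is False, or None.

    Assumes builds are ordered oldest to newest and that the breakage,
    once introduced, persists in every later build.
    """
    if not builds or works(builds[-1]):
        return None
    if not works(builds[0]):
        return builds[0]
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] works, builds[hi] is broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(builds[mid]):
            lo = mid
        else:
            hi = mid
    return builds[hi]
```

With placeholder builds ["v1", "v2", "v3", "v4", "v5"] this needs only about log2(n) test installations instead of trying every build in turn.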
Sep 2 2016
I don't think that keyword exists in the GnuPG
2.1.x source code; _pgpkey-http can only be
found in GnuPG 1.4.
Sep 1 2016
How about the _pgpkey-http._tcp. record?
For version 2.1.15
root@47b54eb8e5bb:~/gnupg-2.1.15# gpg2 --version
gpg (GnuPG) 2.1.15
libgcrypt 1.7.3-beta
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later https://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Home: /root/.gnupg
Supported algorithms:
Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2
root@47b54eb8e5bb:~/gnupg-2.1.15# gpg-connect-agent --dirmngr 'getinfo dnsinfo' /bye
OK - System resolver w/o Tor support
root@47b54eb8e5bb:~/gnupg-2.1.15# gpg2 --keyserver hkp://t1.zhsj.me --recv-keys 7DFBB2F2
gpg: keyserver receive failed: End of file
root@47b54eb8e5bb:~/gnupg-2.1.15# gpg2 --keyserver hkp://t2.zhsj.me --recv-keys 7DFBB2F2
gpg: keyserver receive failed: No keyserver available
For version 2.1.11
zsj@debian ~ $ gpg2 --version
gpg (GnuPG) 2.1.11
libgcrypt 1.7.3-beta
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Home: ~/.gnupg
Supported algorithms:
Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2
zsj@debian ~ $ gpg-connect-agent --dirmngr 'getinfo dnsinfo' /bye
OK - System resolver w/o Tor support
zsj@debian ~ $ gpg2 --keyserver hkp://t1.zhsj.me --recv-keys 7DFBB2F2
gpg: key 7DFBB2F2: "Shengjing Zhu <i@zhsj.me>" not changed
gpg: Total number processed: 1
gpg: unchanged: 1
zsj@debian ~ $ gpg2 --keyserver hkp://t2.zhsj.me --recv-keys 7DFBB2F2
gpg: keyserver receive failed: No keyserver available
zsj@debian ~ $
Please run
gpg-connect-agent --dirmngr 'getinfo dnsinfo' /bye
Aug 29 2016
This also affects version 2.1.15 (latest Gpg4win beta) and 1.1.1 (latest Gpg4win
stable).
Aug 26 2016
Okay, if line endings are translated because of the text-mode read, the result
depends on the contents of the CRL in question. This explains why the defect was
not seen in earlier testing.
PEM does not work for this either (I suspected so and tried it on a GNU system).
I think it is okay that PEM does not work, because this is a rarely used function.
Aug 25 2016
Whoops, I didn't mean to submit the last message, as I had already looked into it myself.
This was reproducible using libksba's t-crl-parse with our root CA's CRL, but not
with an example file lying next to it.
It turned out that t-crl-parse opened the file in text mode. Conversion errors then
caused an invalid (too large) read. After switching to binary mode, it worked as
expected.
Dirmngr did the same. I've tested that CRL parsing works with the attached patch.
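The text-mode pitfall is easy to reproduce. In C on Windows, fopen without "b" translates CRLF to LF on input (and treats Ctrl-Z as end of file), while on GNU/Linux the "b" makes no difference. Python's text mode translates line endings on every platform, which makes it a convenient stand-in for a demonstration; the function name and sample bytes are hypothetical.

```python
import os
import tempfile


def read_binary_and_text(data):
    """Write `data` to a temp file, then read it back in binary and text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            as_binary = f.read()
        # latin-1 maps every byte to exactly one character, so only the
        # newline translation can change the length.
        with open(path, "r", encoding="latin-1", newline=None) as f:
            as_text = f.read()
        return as_binary, as_text
    finally:
        os.remove(path)


# Hypothetical DER fragment: 0x30 0x82 0x0D 0x0A is a SEQUENCE header with
# a two-byte length of 3338, followed by one content byte.
sample = bytes([0x30, 0x82, 0x0D, 0x0A, 0x01])
```

Reading sample back gives 5 bytes in binary mode but 4 characters in text mode, because the 0x0D 0x0A pair collapses into a single newline. Whether a given CRL is corrupted therefore depends on whether its bytes happen to contain such sequences.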
Now I get:
dirmngr[780]: error fetching certificate by subject: Configuration error
dirmngr[780]: crl_parse_insert failed: Missing certificate
But I think that is a different error: I get the same one when importing the CRL
with an empty homedir, and parsing works now.
2.1.11 is not in the latest beta; that should be 2.1.13.
For testing and reporting, it is also better to download the latest version from
gnupg.org:
https://gnupg.org/download/index.html
Aug 8 2016
I note that if I restart dirmngr, it will just choose a new member of the pool,
and that member will work.
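Restarting helps because a fresh dirmngr picks a new pool member. A simplified model of getting the same effect without a restart is to mark the failing member dead and pick again among the remaining ones. This is a hypothetical Python sketch (class and method names are made up), not dirmngr's host-table code; the error text merely mirrors the "No keyserver available" message gpg prints.

```python
import random


class Pool:
    """Hypothetical model: pick a random live member, retire failing ones."""

    def __init__(self, members, rng=None):
        self.members = list(members)
        self.dead = set()
        self.rng = rng or random.Random()
        self.current = None

    def pick(self):
        live = [m for m in self.members if m not in self.dead]
        if not live:
            raise LookupError("No keyserver available")
        self.current = self.rng.choice(live)
        return self.current

    def mark_dead(self, member):
        # Called after a connection to `member` failed.
        self.dead.add(member)
        if self.current == member:
            self.current = None
```

A caller would loop over pick(), try the connection, and call mark_dead() on failure, so one broken member no longer sticks until the daemon is restarted.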
Aug 1 2016
Jul 14 2016
Jul 11 2016
This issue still stands with 2.1.13. It may be a bug or it may be a
documentation issue but I really do need this to be investigated and resolved,
please.
Jun 24 2016
Jun 18 2016
(That last comment was with 2.1.13.)
FWIW, when I'm on a network that doesn't support IPv6, I get this:
0 dkg@alice:~$ gpg --send $KEYID
gpg: sending key REDACTED to hkps://hkps.pool.sks-keyservers.net
gpg: keyserver send failed: Invalid argument
gpg: keyserver send failed: Invalid argument
2 dkg@alice:~$
in dirmngr's logs:
2016-06-17 19:30:17 dirmngr[27999.2] DBG: gnutls:L3: ASSERT: mpi.c:246
2016-06-17 19:30:17 dirmngr[27999.2] DBG: gnutls:L5: REC[0x7f61f400fc10]: Allocating epoch #0
2016-06-17 19:30:17 dirmngr[27999.2] can't connect to '2001:ba8:1f1:f2d4::2': Invalid argument
2016-06-17 19:30:17 dirmngr[27999.2] error connecting to 'https://[2001:ba8:1f1:f2d4::2]:443': Invalid argument
2016-06-17 19:30:17 dirmngr[27999.2] DBG: gnutls:L5: REC[0x7f61f400fc10]: Start of epoch cleanup
2016-06-17 19:30:17 dirmngr[27999.2] DBG: gnutls:L5: REC[0x7f61f400fc10]: End of epoch cleanup
I think this instance of dirmngr was started on a network that has both IPv4 and
IPv6.
If I do:
gpg-connect-agent --dirmngr killdirmngr /bye
and then try the --send again, it goes through fine.
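One way to avoid relying on network state captured when dirmngr was started is to check, right before connecting, whether each address family currently has a route, and to skip addresses of families that don't. A Python sketch under that assumption; the function names and the probe addresses (taken from the 192.0.2.0/24 and 2001:db8::/32 documentation ranges) are hypothetical choices. Calling connect() on a UDP socket sends no packets; it only asks the kernel for a route.

```python
import ipaddress
import socket


def family_has_route(family, probe_address):
    """True if the kernel currently has a route for this address family."""
    try:
        sock = socket.socket(family, socket.SOCK_DGRAM)
    except OSError:
        return False  # address family not supported at all
    try:
        sock.connect((probe_address, 9))  # UDP connect: route lookup only
        return True
    except OSError:
        return False
    finally:
        sock.close()


def currently_usable_families():
    return {
        socket.AF_INET: family_has_route(socket.AF_INET, "192.0.2.1"),
        socket.AF_INET6: family_has_route(socket.AF_INET6, "2001:db8::1"),
    }


def filter_by_family(addresses, usable):
    """Keep only addresses whose family is marked usable."""
    result = []
    for address in addresses:
        if ipaddress.ip_address(address).version == 6:
            family = socket.AF_INET6
        else:
            family = socket.AF_INET
        if usable.get(family, False):
            result.append(address)
    return result
```

On an IPv4-only network, filter_by_family(addresses, currently_usable_families()) would drop an address like 2001:ba8:1f1:f2d4::2 before any connection attempt is made.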
Jun 15 2016
Thanks for applying them.
@bernhard
I did not switch to trying LDAPv3 first, to be conservative: maximum
compatibility with the least regression risk. Also, I don't have a v2-only
server available to test against.
AFAIK, LDAPv2 vs. v3 is pretty much irrelevant for the calls dirmngr makes.
IMO, once the OpenLDAP client libraries change their behavior to use v3 by
default, this should be enough for dirmngr.
Hi,
without having checked it, I think that dirmngr should try LDAPv3 first,
certainly in the 2.1 versions. For the others, a fallback should be good enough.
(Would it help if I go digging into specs somewhat to back that up?)
Jun 14 2016
Thanks. I applied the two patches.
I've analyzed the problem: dirmngr_ldap failed with a protocol error, which was
hidden because the error output used errno instead of the LDAP error.
The attached patch fixes the error output.
The protocol error was caused by:
"historical protocol version requested, use LDAPv3 instead"
I'm not sure whether dirmngr should try LDAPv3 first and fall back to LDAPv2, but
the patch I'll attach in the next message adds a fallback to LDAPv3 if ldap_bind
with the default protocol version leads to a protocol error.
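The fallback described above, reduced to its control flow in Python. This is not the actual patch: the bind callable, the exception class and the default of version 2 are assumptions (the server's "historical protocol version requested" reply suggests the client defaulted to LDAPv2). In LDAP itself, protocolError is result code 2 (RFC 4511).

```python
class LdapProtocolError(Exception):
    """Hypothetical stand-in for an LDAP protocolError (result code 2)."""


def bind_with_fallback(bind, default_version=2, fallback_version=3):
    """Bind with the default protocol version; on a protocol error, retry
    once with LDAPv3. Returns the version that succeeded.

    `bind` is a hypothetical callable taking the protocol version. If the
    retry also fails, its exception (carrying the LDAP diagnostic text
    rather than an errno) propagates to the caller.
    """
    try:
        bind(default_version)
        return default_version
    except LdapProtocolError:
        bind(fallback_version)
        return fallback_version
```

Against a server that rejects LDAPv2, the sketch issues two binds (2, then 3); against one that accepts v2 it binds once, so existing setups keep their current behavior.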
The endless activity (failing to notice that dirmngr_ldap has already died
after the failure) I leave for someone else (another issue, I guess), as I've
already failed to fix this once :-)
Jun 13 2016
@aheinecke and I observed different behaviour on two LDAP server versions
(OpenLDAP).
Our new OpenLDAP server fails with the old dirmngr 1.1.0 (Version: 1.1.0+r347-0kk2),
with 2.1.11 and with master.
Internally we can still access the old LDAP server for testing purposes.
Next step: look on the wire at what the differences could be.
Jun 3 2016
May 10 2016
May 6 2016
Duplicate of T2348
Ah, never mind. I now think myself that this is not a bug and the current behavior is correct.
There is already a mechanism for the redundant setup we want, and we
need to use it instead of doing something undefined.