dirmngr sometimes hangs (rt#5740)
Closed, Resolved (Public)

Description

When dirmngr is configured to use an LDAP proxy, it apparently doesn't honor the
LDAP timeout value. I tested this with Kontact from enterprise 35 branch (rev.
914162), gpgme 1.1.6 and dirmngr 1.0.2-svn293 with the following dirmngr
configuration:

ldap-proxy:0:0:use HOST for LDAP queries:1:1:HOST:::"localhost%3a1389
only-ldap-proxy:0:1:do not use fallback hosts with --ldap-proxy:0:0::::1
ldaptimeout:16:0:set LDAP timeout to N seconds:3:3:N:100::10
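
For readers unfamiliar with this colon-delimited listing: string values appear to be prefixed with a double quote and percent-escaped, so `"localhost%3a1389` stands for `localhost:1389`. A minimal sketch of such a decoder follows; the function names are hypothetical and this is not dirmngr or gpgconf code.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert one hex digit to its value.  The caller has already checked
   the character with isxdigit().  */
static int
hex_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  return tolower ((unsigned char)c) - 'a' + 10;
}

/* Hypothetical helper: decode %XX escapes from SRC into DST, which has
   room for DSTLEN bytes including the terminating NUL.  A leading
   double quote (the string-value marker) is skipped.  Returns 0 on
   success, -1 if DST is too small or an escape is malformed.  */
static int
decode_gpgconf_string (const char *src, char *dst, size_t dstlen)
{
  size_t n = 0;

  if (!dstlen)
    return -1;
  if (*src == '"')
    src++;
  while (*src)
    {
      char c = *src++;

      if (c == '%')
        {
          if (!isxdigit ((unsigned char)src[0])
              || !isxdigit ((unsigned char)src[1]))
            return -1;
          c = (char)(hex_value (src[0]) * 16 + hex_value (src[1]));
          src += 2;
        }
      if (n + 1 >= dstlen)
        return -1;
      dst[n++] = c;
    }
  dst[n] = 0;
  return 0;
}
```

Calling it on the ldap-proxy value above should yield the host and port the proxy listens on in this report.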

And a makeshift proxy like this:

nc -l -p 1389 -c "sleep 240; nc -o dirmngr-ldap-dump ca.intevation.de 389"

Once dirmngr tries to fetch a CRL via LDAP, it connects to the proxy and then all
of Kontact is blocked until the proxy finally delivers the result 4 minutes later,
even though the LDAP timeout is only 10 seconds.

Event Timeline

Back in 2004 we disabled the timeout of CRL downloads due to problems with some
CRL servers that introduced long delays during the CRL retrieval. There is only
a general inactivity timeout of 5 minutes plus the configured timeout.

See dirmngr/src/ChangeLog entry 2004-12-13.
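
To make the arithmetic concrete, here is a small sketch under the assumption that the effective limit is exactly 5 minutes plus the configured ldaptimeout, as stated above. The names are hypothetical and are not taken from the dirmngr source.

```c
/* Assumption taken from the comment above: the last-resort inactivity
   limit is a fixed 5 minutes added to the configured LDAP timeout.  */
#define INACTIVITY_TIMEOUT_SECONDS (5 * 60)

/* Effective limit in seconds for a configured LDAPTIMEOUT.  */
static unsigned int
effective_timeout (unsigned int ldaptimeout)
{
  return INACTIVITY_TIMEOUT_SECONDS + ldaptimeout;
}

/* Return 1 if a reply delayed by DELAY seconds still arrives before
   the effective limit, 0 if the query would be given up first.  */
static int
reply_arrives_in_time (unsigned int delay, unsigned int ldaptimeout)
{
  return delay < effective_timeout (ldaptimeout);
}
```

With ldaptimeout set to 10 the limit comes to 310 seconds, so the 240 second delay of the test proxy stays below it. That would be consistent with the client blocking for the full 4 minutes instead of giving up after 10 seconds.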

bernhard added a subscriber: bernhard.

What can we do about this then? There should be a configuration option
to make this problem go away, right?

Well, you can configure it: the last-resort timeout is 5 minutes plus the
configured timeout.

What we could do is to add some code to avoid running a second query to the same
server if we figured that the server is too slow.
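
A rough sketch of what such a check could look like, assuming a small fixed table of hosts that are skipped for a penalty period after they proved too slow. Everything here (names, table size, the 10 minute penalty) is hypothetical and not existing dirmngr code.

```c
#include <string.h>
#include <time.h>

#define MAX_SLOW_HOSTS     16
#define SLOW_HOST_PENALTY  600   /* Seconds to avoid a slow host (assumed). */

struct slow_host
{
  char name[256];
  time_t until;     /* Host is skipped while NOW < UNTIL.  */
};

static struct slow_host slow_hosts[MAX_SLOW_HOSTS];

/* Remember that HOST was too slow at time NOW.  An existing entry for
   HOST is refreshed; otherwise the entry expiring first is reused.  */
static void
mark_host_slow (const char *host, time_t now)
{
  int i, slot = 0;

  for (i = 0; i < MAX_SLOW_HOSTS; i++)
    {
      if (!strcmp (slow_hosts[i].name, host))
        {
          slot = i;
          break;
        }
      if (slow_hosts[i].until < slow_hosts[slot].until)
        slot = i;
    }
  strncpy (slow_hosts[slot].name, host, sizeof slow_hosts[slot].name - 1);
  slow_hosts[slot].name[sizeof slow_hosts[slot].name - 1] = 0;
  slow_hosts[slot].until = now + SLOW_HOST_PENALTY;
}

/* Return 1 if HOST should be skipped at time NOW, else 0.  */
static int
host_is_slow (const char *host, time_t now)
{
  int i;

  for (i = 0; i < MAX_SLOW_HOSTS; i++)
    if (slow_hosts[i].until > now && !strcmp (slow_hosts[i].name, host))
      return 1;
  return 0;
}
```

A caller would mark the host after a query ran into the timeout and consult host_is_slow before starting the next query to the same server. Passing the current time in as a parameter keeps the sketch easy to test.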

What is the reason that the "last resort timeout" cannot be configured
to be faster? (Meaning below 5 minutes?)

Because we decided that 5 minutes is a reasonable timeout value for an
intermittently slow CRL server. I can't see the actual problem with it: the
proxy should not delay or buffer the request at all. CRLs are picked up
automagically and you don't have control over which CRL server is used. If there
is a slow one and the timeout is too short, you will never be able to use a
certificate bound to that CRL server. Better to wait than to be sorry; these
are the usual tradeoffs you have to make when defining a timeout. Note
that there are a couple of other timeouts in a TCP network which you can't
control either.

Is it possible to write out a message when we hit such a timeout?
And could that timeout be made configurable nonetheless?

I changed some diagnostics to better track possible problems. You may want to
try svn rev 313.

bernhard renamed this task from dirmngr ldap doesn't timeout when using ldap proxy to dirmngr ldap doesn't timeout when using ldap proxy (rt#5740). Jul 20 2009, 11:04 AM

A new log with svn rev 313 was sent to Werner on Friday. The user had killed dirmngr.
(I am putting the internal reference in the topic.)

I have an strace log for the time before the kill. It is quite boring: just
4130804 lines of
2951 select(10, [9], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
2951 read(9, ""..., 8192) = 0
2951 fcntl64(9, F_GETFL) = 0x2 (flags O_RDWR)
2951 fcntl64(9, F_GETFL) = 0x2 (flags O_RDWR)
2951 select(10, [9], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
2951 read(9, ""..., 8192) = 0
2951 fcntl64(9, F_GETFL) = 0x2 (flags O_RDWR)
2951 fcntl64(9, F_GETFL) = 0x2 (flags O_RDWR)
2951 select(10, [9], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
2951 read(9, ""..., 8192) = 0
2951 fcntl64(9, F_GETFL) = 0x2 (flags O_RDWR)
2951 fcntl64(9, F_GETFL) = 0x2 (flags O_RDWR)
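
The pattern in this trace (select reporting descriptor 9 as readable with a zero timeout, followed by a read that returns 0) looks like a loop that keeps polling a descriptor which has already reached end of file. As an illustration only, and not the fix that was eventually applied, a reader loop that treats a zero-length read as EOF could look like this. The function name is hypothetical.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from FD until end of file.  Returns the
   number of bytes consumed, or -1 on error.  */
static long
drain_fd (int fd)
{
  char buf[8192];
  long total = 0;

  for (;;)
    {
      fd_set rfds;
      ssize_t n;

      FD_ZERO (&rfds);
      FD_SET (fd, &rfds);
      /* A NULL timeout blocks until FD is readable, instead of
         spinning with a zero timeout as seen in the trace.  */
      if (select (fd + 1, &rfds, NULL, NULL, NULL) < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }

      n = read (fd, buf, sizeof buf);
      if (n < 0)
        {
          if (errno == EINTR || errno == EAGAIN)
            continue;
          return -1;
        }
      if (n == 0)
        return total;  /* EOF: stop here, never poll this FD again.  */
      total += n;
    }
}
```

The important line is the check for a zero return from read: a descriptor at EOF is always reported as readable, so a loop that ignores it never sleeps.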

I guess I found it. The process was simply leaking file descriptors during the
LOOKUP commands. This should explain the hangs reported by some folks.
The fix is in svn rev 318. It boils down to this little patch:

--- ldap.c (revision 314)
+++ ldap.c (working copy)
@@ -1420,7 +1451,11 @@
 {
   if (context)
     {
+      ksba_reader_t reader = context->reader;
+
       xfree (context->tmpbuf);
       xfree (context);
+      ldap_wrapper_release_context (reader);
+      ksba_reader_release (reader);
     }
 }
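
The shape of this patch can be sketched in isolation: copy the reader pointer out of the context before the context is freed, then release the reader so the resources behind it are not leaked. The types and functions below are hypothetical stand-ins, not the real ksba or dirmngr API.

```c
#include <stdlib.h>

/* Stand-in for ksba_reader_t: a reference-counted reader object.  */
struct reader
{
  int refcount;
};

/* Stand-in for the end-of-LOOKUP context from the patch.  */
struct context
{
  struct reader *reader;
  char *tmpbuf;
};

/* Counts destroyed readers so the effect can be observed.  */
static int readers_released;

/* Drop one reference; destroy the reader when none are left.  */
static void
reader_release (struct reader *r)
{
  if (r && --r->refcount == 0)
    {
      free (r);
      readers_released++;
    }
}

/* Free CTX and release the reader it owns.  The pointer is saved
   first because CTX must not be touched after free().  */
static void
context_destroy (struct context *ctx)
{
  if (ctx)
    {
      struct reader *reader = ctx->reader;

      free (ctx->tmpbuf);
      free (ctx);
      reader_release (reader);
    }
}
```

Without the final release call the reader, and whatever descriptor sits behind it, would outlive the context on every call, which matches the leak described above.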

It took quite some time to figure this out because we did not have enough
debugging output. In the course of tracking down the bug I enhanced the debug
support, and with that the problem was soon visible.

bernhard renamed this task from dirmngr ldap doesn't timeout when using ldap proxy (rt#5740) to dirmngr sometimes hangs (rt#5740). Aug 26 2009, 12:42 PM
bernhard closed this task as Resolved.
bernhard raised the priority of this task from Normal to High.
bernhard removed a project: Restricted Project.

Refocussing this issue on the "hang" symptom.

The user who experienced the hangs has been running dirmngr_1.0.3.svn323-0kk1
for about two weeks now without hangs, so we assume this issue to be closed.
(It would be cool to have a new dirmngr release.)

Just for completeness: there have been more fixes in dirmngr, and the symptom
also seems to have occurred because an HTTP proxy behaved strangely. See
http://cvs.gnupg.org/cgi-bin/viewcvs.cgi/trunk/src/crlfetch.c?root=Dirmngr&rev=323&r1=320&r2=323