Page MenuHome GnuPG

Dirmngr ldap CRL checks prevent dirmngr from terminating
Closed, ResolvedPublic

Description

If you kill dirmngr after it has spawned an ldap wrapper it does not terminate
and instead consumes CPU time. This behavior occurs with and without the patch
from T1838

To reproduce this send a kill signal to the dirmngr spawned after:

GNUPGHOME=$(mktemp -d) gpgsm --verify signed-smime-test.asc

You can take the file from T1838

When you attach an strace to that pid you can see that it keeps doing:
nanosleep({0, 200000}, NULL) = 0

On the system I've noticed this that leads to dirmngr continously taking 2-3% of
the CPU's attention.

Neal: I've assigned this to you as because werner assigned 1838 also to you and
I belive that these issues are related as both are about broken cleanup behavior
in ldap_wrapper.

Details

Version
2.1

Event Timeline

Can you let me know when you can take a look at this or was my assignment wrong
here? (If so please change it)

This is a pretty major bug imho that would leave our application servers
(without manual intervention) if we would deploy 2.1 in our company. As such it
is blocking our adoption of 2.1.

I would appreciate some kind of reaction / confirmation on this issue.

How do you kill dirmngr? Using "gpgconf --kill dirmngr" or by sending a signal

  • which one?

By killing I meant sending SIGTERM (15) through the kill command.

But
"gpgconf --kill dirmngr" also does not kill the dirmngr. Is this problem not
reproducible for you?

kill -9 kills it of course.

Well, there was the ldap-reaper thread still running and due to a bug in the log
output handing code it was not able to remove its context structures and thus it
kept on spinning.

Fixed with commit 685b782.

I've compiled current master and it works for the testcase. But when I start
kleopatra and it runs the keylist/verify dirmngr now crashes.

Can be triggered with gpgme: run-keylist --validate --cms

It crashes at different points but it never gets through all my certificates.

An example of the debug output that is collected before it crashes (differs
between crashes):

2015-06-16 19:09:15 dirmngr[9303.1] no CRL available for issuer id
18F071EAAC08885C9434A7DE1DB3AFC30F27DD32
2015-06-16 19:09:15 dirmngr[9303.1] DBG: chan_1 -> INQUIRE SENDCERT
2015-06-16 19:09:15 dirmngr[9303.1] DBG: chan_1 <- [ 44 20 30 82 04 7d 30 82 03
65 a0 03 02 01 02 02 ...(982 byte(s) skipped) ]
2015-06-16 19:09:15 dirmngr[9303.1] DBG: chan_1 <- [ 44 20 14 34 6d f5 07 c2 04
86 4a ba a1 71 50 b0 ...(187 byte(s) skipped) ]
2015-06-16 19:09:15 dirmngr[9303.1] DBG: chan_1 <- END
2015-06-16 19:09:15 dirmngr[9303.1] checking distribution points
2015-06-16 19:09:15 dirmngr[9303.1] no distribution point - trying issuer name
2015-06-16 19:09:15 dirmngr[9303.1] fetching CRL from default location
2015-06-16 19:09:15 dirmngr[9303.1] ldap wrapper 10199 started (reader
0x00007f6a580337a0)
2015-06-16 19:09:15 dirmngr[9303.0] ldap wrapper 10198 ready: exitcode=1
2015-06-16 19:09:15 dirmngr[9303.0] ldap worker stati:
2015-06-16 19:09:15 dirmngr[9303.0] c=0x00007f6a58033740 pid=10199/10199
rdr=0x00007f6a580337a0 ctrl=0x00007f6a580008c0/1 la=1434474555 rdy=0
2015-06-16 19:09:15 dirmngr[9303.0] c=0x00007f6a58022520 pid=-1/10198
rdr=0x0000000000000000 ctrl=0x0000000000000000/0 la=1434474554 rdy=1
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10199]: processing url 'ldap://'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10199]: host
'directory.verisign.com'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10199]: port 389
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10198]: processing url 'ldap://'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10198]: host
'directory.verisign.com'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10198]: port 389
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10199]: DN
'1.2.840.113549.1.9.1=#4865696E65636B656E40676D61696C2E636F6D,CN=Common
Name,ST=Some-State,C=DE'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10199]: filter
'objectClass=*'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10199]: attr
'certificateRevocationList'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10198]: DN
'1.2.840.113549.1.9.1=#4865696E65636B656E40676D61696C2E636F6D,CN=Common
Name,ST=Some-State,C=DE'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10198]: filter
'objectClass=*'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10198]: attr
'certificateRevocationList'
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10198]: searching 'ldap://'
failed: No such object
2015-06-16 19:09:15 dirmngr[9303.0] ldap worker stati:
2015-06-16 19:09:15 dirmngr[9303.0] c=0x00007f6a58033740 pid=10199/10199
rdr=0x00007f6a580337a0 ctrl=0x00007f6a580008c0/1 la=1434474555 rdy=0
2015-06-16 19:09:15 dirmngr[9303.0] c=0x00007f6a58022520 pid=-1/10198
rdr=0x0000000000000000 ctrl=0x0000000000000000/0 la=1434474555 rdy=1
2015-06-16 19:09:15 dirmngr[9303.0] dirmngr_ldap[10199]: searching 'ldap://'
failed: No such object
2015-06-16 19:09:15 dirmngr[9303.0] ldap wrapper 10199 ready: exitcode=1
2015-06-16 19:09:15 dirmngr[9303.0] ldap worker stati:
2015-06-16 19:09:15 dirmngr[9303.0] c=0x00007f6a58033740 pid=-1/10199
rdr=0x0000000000000000 ctrl=0x00007f6a580008c0/1 la=1434474555 rdy=1

Backtrace for this log (also differs):

#0 0x00007f6a69109cc9 in GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f6a6910d0d8 in
GI_abort () at abort.c:89
#2 0x00007f6a69146394 in __libc_message (do_abort=do_abort@entry=1,

fmt=fmt@entry=0x7f6a69254b28 "*** Error in `%s': %s: 0x%s ***\n") at

../sysdeps/posix/libc_fatal.c:175
#3 0x00007f6a6915266e in malloc_printerr (ptr=<optimized out>,
str=0x7f6a69254cf0 "double free or corruption (fasttop)", action=1)

at malloc.c:4996

#4 _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5 0x00007f6a69d4fd2d in ?? () from /opt/gnupg/lib/libgcrypt.so.20
#6 0x0000000000428802 in ldap_wrapper (ctrl=ctrl@entry=0x7f6a580008c0,
reader=reader@entry=0x7f6a64c05cd0,

argv=argv@entry=0x7f6a64c05a80) at ldap-wrapper.c:772

#7 0x000000000042218c in run_ldap_wrapper (ctrl=ctrl@entry=0x7f6a580008c0,
multi_mode=multi_mode@entry=0, proxy=0x0,

host=<optimized out>, port=<optimized out>, user=<optimized out>, pass=0x0, 
dn=dn@entry=0x7f6a58001d50

"1.2.840.113549.1.9.1=#4865696E65636B656E40676D61696C2E636F6D,CN=Common
Name,ST=Some-State,C=DE",

filter=filter@entry=0x44dcc1 "objectClass=*", attr=attr@entry=0x44b50a

"certificateRevocationList", url=url@entry=0x0,

reader=reader@entry=0x7f6a64c05cd0, ignore_timeout=0) at ldap.c:191

#8 0x00000000004228ea in attr_fetch_ldap (ctrl=0x7f6a580008c0,

dn=0x7f6a58001d50

"1.2.840.113549.1.9.1=#4865696E65636B656E40676D61696C2E636F6D,CN=Common
Name,ST=Some-State,C=DE",

attr=attr@entry=0x44b50a "certificateRevocationList",

reader=reader@entry=0x7f6a64c05cd0) at ldap.c:287
#9 0x0000000000414aed in crl_fetch_default (ctrl=ctrl@entry=0x7f6a580008c0,

issuer=issuer@entry=0x7f6a58001d50

"1.2.840.113549.1.9.1=#4865696E65636B656E40676D61696C2E636F6D,CN=Common
Name,ST=Some-State,C=DE", reader=reader@entry=0x7f6a64c05cd0) at crlfetch.c:319
#10 0x000000000041439d in crl_cache_reload_crl (ctrl=ctrl@entry=0x7f6a580008c0,
cert=0x7f6a58002740) at crlcache.c:2554
#11 0x000000000040e1d5 in inquire_cert_and_load_crl (ctx=0x7f6a58000950) at
server.c:589
#12 cmd_isvalid (ctx=0x7f6a58000950, line=<optimized out>) at server.c:901
#13 0x00007f6a6a23e96a in ?? () from /opt/gnupg/lib/libassuan.so.0
#14 0x00007f6a6a23ed49 in assuan_process () from /opt/gnupg/lib/libassuan.so.0
#15 0x000000000040edc7 in start_command_handler (fd=fd@entry=1) at server.c:2243
#16 0x000000000040ada5 in start_connection_thread (arg=arg@entry=0x1) at
dirmngr.c:1937
#17 0x00007f6a69908dbc in thread_start (startup_arg=<optimized out>) at npth.c:265
#18 0x00007f6a696f1182 in start_thread (arg=0x7f6a64c06700) at pthread_create.c:312
#19 0x00007f6a691cd47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

A different backtrace:

#0 0x00007fe9e9805cc9 in GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fe9e98090d8 in
GI_abort () at abort.c:89
#2 0x00007fe9e9842394 in __libc_message (do_abort=do_abort@entry=1,

fmt=fmt@entry=0x7fe9e9950b28 "*** Error in `%s': %s: 0x%s ***\n") at

../sysdeps/posix/libc_fatal.c:175
#3 0x00007fe9e984dac2 in malloc_printerr (ptr=<optimized out>,
str=0x7fe9e994cbfc "corrupted double-linked list", action=1)

at malloc.c:4996

#4 malloc_consolidate (av=av@entry=0x7fe9d8000020) at malloc.c:4165
#5 0x00007fe9e984edf8 in _int_malloc (av=0x7fe9d8000020, bytes=1025) at
malloc.c:3423
#6 0x00007fe9e98517b0 in GI_libc_malloc (bytes=1025) at malloc.c:2891
#7 0x00007fe9ea44ad11 in ?? () from /opt/gnupg/lib/libgcrypt.so.20
#8 0x00007fe9ea44bc19 in ?? () from /opt/gnupg/lib/libgcrypt.so.20
#9 0x00007fe9ea93b284 in init_membuf (maxlen=0, initiallen=<optimized out>,
mb=0x7fe9e53018e0, ctx=0x7fe9d8000950)

at assuan-inquire.c:64

#10 assuan_inquire (ctx=ctx@entry=0x7fe9d8000950, keyword=keyword@entry=0x44774b
"SENDCERT",

r_buffer=r_buffer@entry=0x7fe9e5301d50,

r_length=r_length@entry=0x7fe9e5301d60, maxlen=maxlen@entry=0) at
assuan-inquire.c:169
#11 0x000000000040dfca in inquire_cert_and_load_crl (ctx=0x7fe9d8000950) at
server.c:567
#12 cmd_isvalid (ctx=0x7fe9d8000950, line=<optimized out>) at server.c:901
#13 0x00007fe9ea93a96a in dispatch_command (ctx=0x7fe9d8000950, line=<optimized
out>, linelen=<optimized out>)

at assuan-handler.c:675

#14 0x00007fe9ea93ad49 in process_request (ctx=0x7fe9d8000950) at
assuan-handler.c:871
#15 assuan_process (ctx=0x7fe9d8000950) at assuan-handler.c:894
#16 0x000000000040edc7 in start_command_handler (fd=fd@entry=6) at server.c:2243
#17 0x000000000040ada5 in start_connection_thread (arg=arg@entry=0x6) at
dirmngr.c:1937
#18 0x00007fe9ea004dbc in thread_start (startup_arg=<optimized out>) at npth.c:265
#19 0x00007fe9e9ded182 in start_thread (arg=0x7fe9e5302700) at pthread_create.c:312
#20 0x00007fe9e98c947d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Can you start dirmngr under valgrind?

gpgconf --kill dirmngr
valgrind --log-file=vg.log dirmngr --daemon --homedir /my/gnupg/home/dir

In valgrind it did not crash. The keylisting exited normally. But showed several
errors.

Attached is the valigrind log.

x86 or amd64 ?
libgcrypt version?
libksba version?

amd64
libgcrypt 1.6.2
libksba 1.3.4-beta1

Btw. If I roll back your commit the crashes no longer happen.

As an additional note. From checking why dirmngr takes so long in my setup I
know that I have several certificates in my keyring where the CRL is not
available. Maybe thats part of the problem.

The last fix was wrong and the reaper thread closed a reader object which was
still used by another thread. Fixed with commit c971983:

I assumed that the log_fd also has a reader object but that reader
object is used for stdout and needs to be closed by the consumer.

The real bug with the non-released ldap_wrapper control objects was
that when looping over distribution points we did not closed the used
reader object before the next iteration.  Now, the test case had more
than one DP and thus we lost one reader object.

aheinecke: Did you had a chance to test this with 2.1.7 or master?

aheinecke removed a project: Restricted Project.

yes there are no remaining problems that I can see here.

Thanks -> resolved.