GnuPG

libgpg-error test hangs due to stream locking race condition
Closed, Resolved, Public

Description

A deadlock occurs in the t-poll test of libgpg-error 1.21.
It was caught on CentOS release 3.9 (Final) and cannot be reproduced on other machines.

Symptoms: t-poll hangs forever, with or without debug output:

$ ./t-poll --debug
t-lock: created pipe [3, 4]
t-lock: created pipe [5, 6]
t-lock: created pipe [7, 8]
t-lock: thread 'stdin producer' about to write
t-lock: thread 'stdin producer' about to write
t-lock: thread 'stdin producer' about to write
t-lock: thread 'stdin producer' launched (fd=4)
t-lock: thread 'stdout consumer' ready to read

(gdb) thread apply all bt

Thread 2 (Thread -1229010000 (LWP 17849)):
#0 0x0043bba1 in __read_nocancel () from /lib/tls/libpthread.so.0
#1 0x00bfa530 in es_func_fd_read (cookie=0x814e198, buffer=0x814e1d8,
size=8192) at ../../libgpg-error-1.21/src/estream.c:936
#2 0x00bfaf96 in es_fill (stream=0x814e1a8) at
../../libgpg-error-1.21/src/estream.c:1707
#3 0x00bfb9e4 in es_read_fbf (stream=0x814e1a8, buffer=0xb6bec9f3
"\b\230\212�", bytes_to_read=1, bytes_read=0xb6bec9b8)
at ../../libgpg-error-1.21/src/estream.c:2095

#4 0x00bfbc7e in es_readn (stream=0x814e1a8, buffer_arg=0xb6bec9f3,
bytes_to_read=1, bytes_read=0xb6bec9ec) at
../../libgpg-error-1.21/src/estream.c:2204
#5 0x00bfe05b in _gpgrt__getc_underflow (stream=0x814e1a8) at
../../libgpg-error-1.21/src/estream.c:3784
#6 0x00bfe451 in _gpgrt_fgets (buffer=0xb6beca75 "", length=15,
stream=0x814e1a8) at ../../libgpg-error-1.21/src/estream.c:3936
#7 0x00c03c86 in gpgrt_fgets (buffer=0xb6beca75 "", length=15,
stream=0x814e1a8) at ../../libgpg-error-1.21/src/visibility.c:467
#8 0x08048fac in consumer_thread (argaddr=0x804afa0) at
../../libgpg-error-1.21/tests/t-poll.c:108
#9 0x00436dd8 in start_thread () from /lib/tls/libpthread.so.0
#10 0x008bafca in clone () from /lib/tls/libc.so.6

Thread 1 (Thread -1218518912 (LWP 17847)):
#0 0x0043b939 in lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1 0x00438927 in _L_mutex_lock_28 () from /lib/tls/libpthread.so.0
#2 0x00c08a98 in JCR_LIST__ () from
/home/buildbot/32bit-i686-linux-gcc4.2-ora9.2-tux8.1/cs.platform.MH2/src/cs.ext.gnupg2/build_libgpg-error-1.21/src/.libs/libgpg-error.so.0
#3 0xbfff94f0 in ?? ()
#4 0xbfff93b8 in ?? ()
#5 0x00bf930c in _gpgrt_lock_lock (lockhd=0x81501ec) at
../../libgpg-error-1.21/src/posix-lock.c:174
#6 0x00bf930c in _gpgrt_lock_lock (lockhd=0x81501e8) at
../../libgpg-error-1.21/src/posix-lock.c:174
#7 0x00bf9983 in lock_stream (stream=0x814e1a8) at
../../libgpg-error-1.21/src/estream.c:382
#8 0x00bfdc32 in _gpgrt_fileno (stream=0x814e1a8) at
../../libgpg-error-1.21/src/estream.c:3555
#9 0x00c036ed in gpgrt_fileno (stream=0x814e1a8) at
../../libgpg-error-1.21/src/visibility.c:249
#10 0x08049072 in launch_thread (fnc=0x8048f6c <consumer_thread>, th=0x804afa0)
at ../../libgpg-error-1.21/tests/t-poll.c:137
#11 0x080497ce in main (argc=0, argv=0xbfff94ec) at
../../libgpg-error-1.21/tests/t-poll.c:370

---- end of bt ----

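The problem is in the launch_thread function.
Immediately after launch, the consumer thread locks its input stream and, inside
es_fgets, blocks in read() while still holding that lock (thread 2 above).
Depending on system-specific timing, the main thread then evaluates the argument
es_fileno (th->stream) passed to show, which tries to lock the same stream
(thread 1 above). Since no data ever arrives on the consumer's pipe, the read never
returns, the lock is never released, and the main thread waits forever: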

th->stop_me = 0;
if (pthread_create (&th->thread, NULL, fnc, th))
  die ("creating thread '%s' failed: %s\n", th->name, strerror (errno));
show ("thread '%s' launched (fd=%d)\n", th->name, es_fileno (th->stream)); /* <-- program hangs on this line */
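The lock contention can be reproduced deterministically without libgpg-error. The sketch below is a simplified model under stated assumptions, not the estream code: model_stream, model_consumer and model_contention are hypothetical names, and the per-stream mutex stands in for the lock taken by estream's lock_stream. The consumer takes the lock and then blocks in read(), just like thread 2 in the backtrace. The main thread then tries to take the same lock, as es_fileno does, but uses pthread_mutex_timedlock so the demo reports ETIMEDOUT instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for an estream object (model only). */
struct model_stream {
  pthread_mutex_t lock;  /* plays the role of the per-stream lock */
  int fd;                /* read end of a pipe */
  sem_t locked;          /* posted once the consumer holds the lock */
};

/* Consumer: take the stream lock, then block in read() while holding it,
   as the consumer's fgets call does in the backtrace. */
static void *model_consumer (void *arg)
{
  struct model_stream *s = arg;
  char c;

  pthread_mutex_lock (&s->lock);
  sem_post (&s->locked);
  if (read (s->fd, &c, 1) < 0)
    perror ("read");
  pthread_mutex_unlock (&s->lock);
  return NULL;
}

/* Returns ETIMEDOUT if a locking fileno() call would have blocked,
   0 if the lock was free, or -1 on setup failure. */
static int model_contention (void)
{
  struct model_stream s;
  struct timespec deadline;
  pthread_t th;
  int pfd[2];
  int rc;

  if (pipe (pfd))
    return -1;
  pthread_mutex_init (&s.lock, NULL);
  sem_init (&s.locked, 0, 0);
  s.fd = pfd[0];

  if (pthread_create (&th, NULL, model_consumer, &s))
    return -1;
  sem_wait (&s.locked);  /* the consumer now holds the lock */

  /* What es_fileno() does: take the same lock.  A timed lock with a
     200 ms deadline is used so this demo terminates. */
  clock_gettime (CLOCK_REALTIME, &deadline);
  deadline.tv_nsec += 200000000L;
  if (deadline.tv_nsec >= 1000000000L)
    {
      deadline.tv_sec++;
      deadline.tv_nsec -= 1000000000L;
    }
  rc = pthread_mutex_timedlock (&s.lock, &deadline);
  if (rc == 0)
    pthread_mutex_unlock (&s.lock);

  /* Release the consumer; in the reported hang no data ever arrives. */
  if (write (pfd[1], "x", 1) != 1)
    perror ("write");
  pthread_join (th, NULL);

  close (pfd[0]);
  close (pfd[1]);
  sem_destroy (&s.locked);
  pthread_mutex_destroy (&s.lock);
  return rc;
}
```

Calling model_contention () returns ETIMEDOUT: the locking accessor cannot proceed while the consumer is parked in read(), which is the state captured by the two stack traces.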

The attached patch evaluates the stream's file descriptor before launching the thread,
so es_fileno is called while no other thread can hold the stream lock.
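The reordering can be sketched with the same kind of simplified model. This is not the attached patch or commit 7ed1502; model_stream, model_thread, model_fileno and model_launch_thread are hypothetical names, and model_fileno merely mimics a locking accessor such as es_fileno. Because the file descriptor is read before pthread_create, the consumer cannot yet hold the lock, so the call cannot block.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for an estream object: accessors take `lock`. */
struct model_stream {
  pthread_mutex_t lock;
  int fd;
};

struct model_thread {
  const char *name;
  struct model_stream *stream;
  pthread_t thread;
};

/* Locking accessor, analogous to es_fileno(). */
static int model_fileno (struct model_stream *s)
{
  int fd;

  pthread_mutex_lock (&s->lock);
  fd = s->fd;
  pthread_mutex_unlock (&s->lock);
  return fd;
}

/* Consumer that holds the stream lock while blocked in read(). */
static void *model_consumer (void *arg)
{
  struct model_thread *th = arg;
  char c;

  pthread_mutex_lock (&th->stream->lock);
  if (read (th->stream->fd, &c, 1) < 0)
    perror ("read");
  pthread_mutex_unlock (&th->stream->lock);
  return NULL;
}

/* Fixed ordering: query the fd first, then start the thread.
   Returns the reported fd, or -1 if the thread could not be created. */
static int model_launch_thread (void *(*fnc) (void *), struct model_thread *th)
{
  int fd = model_fileno (th->stream);  /* no other thread exists yet */

  if (pthread_create (&th->thread, NULL, fnc, th))
    return -1;
  printf ("thread '%s' launched (fd=%d)\n", th->name, fd);
  return fd;
}
```

A caller sets up a pipe, launches the consumer, and only afterwards writes the data the consumer is waiting for. The launch returns immediately regardless of how the new thread is scheduled.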

Details

Version
1.21

Event Timeline

Thanks for debugging this. An alternative to your patch would be to use
es_fileno_unlocked, but your approach is also fine.
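The es_fileno_unlocked alternative keeps the original call order and simply avoids the lock. The model below illustrates that idea under one loudly stated assumption: the descriptor is set once before any thread starts and is never modified afterwards, so reading it without the lock is race-free. model_stream, model_fileno_unlocked and model_launch_unlocked are hypothetical names; this does not describe how libgpg-error implements es_fileno_unlocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct model_stream {
  pthread_mutex_t lock;  /* held by the consumer for the whole read */
  int fd;                /* assumption: set once, before any thread starts */
};

/* Analogue of an unlocked accessor: takes no lock, so it never blocks.
   Safe only under the model's assumption that fd is never modified. */
static int model_fileno_unlocked (const struct model_stream *s)
{
  return s->fd;
}

/* Consumer that holds the stream lock while blocked in read(). */
static void *model_consumer (void *arg)
{
  struct model_stream *s = arg;
  char c;

  pthread_mutex_lock (&s->lock);
  if (read (s->fd, &c, 1) < 0)
    perror ("read");
  pthread_mutex_unlock (&s->lock);
  return NULL;
}

/* Original ordering: start the thread, then query the fd.  This cannot
   hang because the unlocked accessor ignores the consumer's lock. */
static int model_launch_unlocked (struct model_stream *s, pthread_t *thread)
{
  int fd;

  if (pthread_create (thread, NULL, model_consumer, s))
    return -1;
  fd = model_fileno_unlocked (s);
  printf ("thread 'stdout consumer' launched (fd=%d)\n", fd);
  return fd;
}
```

The trade-off: reordering keeps the locking accessor and works without reasoning about concurrent access, while the unlocked variant relies on the descriptor being stable for the lifetime of the thread.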

Fixed with commit 7ed1502 for 1.23. I used your method.

werner claimed this task.
werner added a project: Unreleased.
werner removed a project: In Progress.