GPGME: op_verify fails for S/MIME with EBADF in multithreaded signature verification
Open, High, Public

Description

To reproduce with GPGME tests/run-threaded:

./run-threaded --no-list --threads 10 --repeat 10 --data-type 3 sm-sig

The same test works for me even with 40 threads or 40 repeats when verifying a PGP message, so I don't think it's a problem with the test.
I'll look further into it.

Details

Version
master
werner moved this task from Backlog to For next release on the gpgme board. Tue, Jun 4, 11:03 AM
werner claimed this task. Tue, Jun 4, 11:42 AM
werner raised the priority of this task from Normal to High. Wed, Jun 5, 9:00 PM
werner added a comment. Wed, Jun 5, 9:03 PM

Something(tm) closes an arbitrary file descriptor behind our back. This is not easy to track down because strace cannot trace only threads: it always wants to trace all children as well, which is a bit too much and leads to other problems.

My observation from running the threaded verify test on Windows is that it behaves differently there. The EBADF does not occur.

Instead, after many runs with many threads, we run out of file descriptors in our fd_table. So it looks to me as if we somehow have a resource leak there. But I do not see it as an important issue, because by "many" I mean something like "after thirty runs with 50 threads".

I just noticed that, because I failed to properly understand re-entrant locks, the run-threaded test is broken, at least on Windows, in that it never waits for completion. So running out of file descriptors is to be expected. I'll fix the test.
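
For illustration only: the sketch below shows one plausible way a re-entrant lock can fail to act as a "wait for completion" barrier. It is an assumption, not a description of the actual run-threaded code, and the function names are hypothetical. A re-entrant lock that the current thread already holds can be acquired again without blocking, so re-acquiring it never waits for anything; joining the worker threads does.

```python
import threading


def reacquire_rlock():
    """Re-acquire a re-entrant lock from the thread that already owns it."""
    lock = threading.RLock()
    lock.acquire()
    # Same owner: this succeeds immediately instead of waiting.
    got_again = lock.acquire(blocking=False)
    lock.release()
    lock.release()
    return got_again


def reacquire_plain_lock():
    """Same attempt with a non-re-entrant lock."""
    lock = threading.Lock()
    lock.acquire()
    # Already held: a blocking acquire would wait here, so this returns False.
    got_again = lock.acquire(blocking=False)
    lock.release()
    return got_again


def run_and_wait(n_threads):
    """Wait for completion by joining every worker thread."""
    done = []
    workers = [threading.Thread(target=done.append, args=(i,))
               for i in range(n_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()  # returns only after the worker has finished
    return len(done)
```

If the test returns before its workers finish, each repeat can start while descriptors from the previous one are still open, which would fit the observation that the table eventually fills up.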

werner added a comment. Thu, Jun 6, 5:08 PM

I had to patch strace to follow threads but not forks (P8), and then, when built with support for -k, I tracked it down: in the inbound handler we close the fd immediately on EOF. However, the upper layers don't know about it, and a select fails with EBADF. Of course we could ignore the EBADF, figure out the closed fd and restart. The problem is that another thread may have opened a new object, and that will get the last closed fd assigned - bummer.
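
The race described above can be reproduced in miniature outside GPGME. The following is a minimal sketch, assuming a POSIX system and a single-threaded process; it uses plain pipes rather than any GPGME code, and the function name is hypothetical. It closes a descriptor that is still in a wait set, shows select failing with EBADF, and then shows the next descriptor allocation receiving the same number.

```python
import errno
import os
import select


def demo_fd_reuse():
    """Return (errno from select on a closed fd, whether the number was reused)."""
    r, w = os.pipe()

    # Stand-in for the inbound handler closing the fd on EOF.
    os.close(r)

    # Stand-in for the upper layer, which still has `r` in its wait set.
    try:
        select.select([r], [], [], 0)
        select_errno = None
    except OSError as exc:
        select_errno = exc.errno

    # Stand-in for another thread opening a new object. POSIX hands out
    # the lowest free descriptor number, which is the one just closed.
    new_fd = os.dup(w)
    reused = (new_fd == r)

    os.close(new_fd)
    os.close(w)
    return select_errno, reused


if __name__ == "__main__":
    err, reused = demo_fd_reuse()
    print(errno.errorcode.get(err), reused)
```

Once the number has been reused, a retry that merely drops "the closed fd" from the wait set, or keeps it, can no longer tell the stale entry from the new object. That is why ignoring EBADF and restarting is not safe.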

I have a larger change for the wait code in the works. This will go into 1.14.0 but not into 1.13.1.