Page MenuHome GnuPG

keyboxd: Possible race conditions (and clean up)
Open, HighPublic

Description

For T7224, I look around possible (root) cause of the problem.
Multiple gpg.exe and gpgsm.exe remained would be by an issue of keyboxd race condition.

I think that kbx/backend-sqlite.c needs some clean up.

Event Timeline

gniibe created this task.

I found one problem. This problem may result lock-up on Windows, I suppose.

The variable database_hd is a shared object among multiple threads, which needs to be protected.

Here is the patch to focus the issue:

diff --git a/kbx/backend-sqlite.c b/kbx/backend-sqlite.c
index 2398aa77f..4c67c3ef7 100644
--- a/kbx/backend-sqlite.c
+++ b/kbx/backend-sqlite.c
@@ -568,11 +568,14 @@ create_or_open_database (ctrl_t ctrl, const char *filename)
   int dbversion;
   int setdbversion = 0;
 
-  if (database_hd)
-    return 0;  /* Already initialized.  */
-
   acquire_mutex ();
 
+  if (database_hd)
+    {
+      release_mutex ();
+      return 0;  /* Already initialized.  */
+    }
+
   /* To avoid races with other temporary instances of keyboxd trying
    * to create or update the database, we run the database with a lock
    * file held. */

The possible race condition is:
(1) one thread get a request from gpg for SEARCH, and goes into create_or_open_database.
(2) it goes to npth_mutex_lock and context switch there.
(3) another thread get a request from gpg for SEARCH, and it goes into create_or_open_database.
(4) two threads are now race on database_hd.
(5) one thead takes the dotlock.
(6) another thread cannot take the dotlock while timeout.
(7) after the timeout, it wrongly release the lock and call dotlock_destroy.
(8) something wrong may occur here.

With a single keyboxd process, dotlock is a kind of overkill here, so clean up is needed.

Sounds very reasonable. Maybe the initial idea was to open the database directly after keyboxd start and before and connections are accepted. My usual try to optimize a mutex away - I should not do this.

Given that create_or_open_database is called from just one place and that will then take the mutex again; it might be easier to move the mutex stuff out of the function and declare that the function must be called with the mutex held.

Things look good on Windows. A quick test using gnupg24 with backported patches did not show any hangs. More testing will follow next week.