keyboxd: Possible race conditions (and clean up)
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	• gniibe
	Sep 19 2024, 6:32 AM

Description

For T7224, I look around possible (root) cause of the problem.
Multiple gpg.exe and gpgsm.exe remained would be by an issue of keyboxd race condition.

I think that kbx/backend-sqlite.c needs some clean up.

Revisions and Commits

rG GnuPG
	rGa698adbb533f kbx: Fix a race condition on DATABASE_HD.
	rGb804378f183f kbx: Fix a race condition on DATABASE_HD.

Related Objects
Search...

		Status	Assigned	Task
		Resolved	• aheinecke	T7224 Kleopatra: broken in Testversion beta-41
		Resolved	• gniibe	T7294 keyboxd: Possible race conditions (and clean up)

Event Timeline

• gniibe triaged this task as High priority.Sep 19 2024, 6:32 AM

• gniibe created this task.

I found one problem. This problem may result lock-up on Windows, I suppose.

The variable database_hd is a shared object among multiple threads, which needs to be protected.

Here is the patch to focus the issue:

diff --git a/kbx/backend-sqlite.c b/kbx/backend-sqlite.c
index 2398aa77f..4c67c3ef7 100644
--- a/kbx/backend-sqlite.c
+++ b/kbx/backend-sqlite.c
@@ -568,11 +568,14 @@ create_or_open_database (ctrl_t ctrl, const char *filename)
   int dbversion;
   int setdbversion = 0;
 
-  if (database_hd)
-    return 0;  /* Already initialized.  */
-
   acquire_mutex ();
 
+  if (database_hd)
+    {
+      release_mutex ();
+      return 0;  /* Already initialized.  */
+    }
+
   /* To avoid races with other temporary instances of keyboxd trying
    * to create or update the database, we run the database with a lock
    * file held. */

The possible race condition is:
(1) one thread get a request from gpg for SEARCH, and goes into create_or_open_database.
(2) it goes to npth_mutex_lock and context switch there.
(3) another thread get a request from gpg for SEARCH, and it goes into create_or_open_database.
(4) two threads are now race on database_hd.
(5) one thead takes the dotlock.
(6) another thread cannot take the dotlock while timeout.
(7) after the timeout, it wrongly release the lock and call dotlock_destroy.
(8) something wrong may occur here.

With a single keyboxd process, dotlock is a kind of overkill here, so clean up is needed.

• gniibe added a commit: rGb804378f183f: kbx: Fix a race condition on DATABASE_HD..Sep 19 2024, 6:47 AM

I applied rGb804378f183f: kbx: Fix a race condition on DATABASE_HD. in master. Let us see how behavior changes.

Sounds very reasonable. Maybe the initial idea was to open the database directly after keyboxd start and before and connections are accepted. My usual try to optimize a mutex away - I should not do this.

Given that create_or_open_database is called from just one place and that will then take the mutex again; it might be easier to move the mutex stuff out of the function and declare that the function must be called with the mutex held.

• werner added a commit: rGa698adbb533f: kbx: Fix a race condition on DATABASE_HD..Sep 19 2024, 4:31 PM

Things look good on Windows. A quick test using gnupg24 with backported patches did not show any hangs. More testing will follow next week.

• werner mentioned this in T7030: Release GnuPG 2.4.6.Oct 21 2024, 3:29 PM

• werner mentioned this in T7289: Release GnuPG 2.5.2.Dec 5 2024, 11:47 AM

Closed after the release of 2.5.4

keyboxd: Possible race conditions (and clean up)Closed, ResolvedPublicActions

Description

Revisions and Commits

Related ObjectsSearch...

Event Timeline

keyboxd: Possible race conditions (and clean up)
Closed, ResolvedPublic
Actions

Related Objects
Search...