keyboxd hangs on stale locks after changing hostname
Closed, Resolved, Public

Description

We got several reports of users on Fedora 39 seeing various applications using GnuPG hanging.

Some investigation pointed to two factors: GnuPG now uses keyboxd by default, and the affected users changed their hostname after the update.

While reproducing the issue (see comment #16), I saw that keyboxd keeps waiting for a stale lock forever:

keyboxd[3549]: waiting for lock (held by 2886) ...

I do not have deep knowledge of the DB and keyboxd architecture, but I think GnuPG should remove stale locks (even for different hostnames), have some timeout, or do something else to avoid infinite hangs.

Event Timeline

That is a feature. Consider the case that ~/.gnupg is on a network file system and thus possibly in use on several boxes. Thus, before we remove stale lock files, we compare not only the PID but also the hostname. Granted, this is rare, but we have had such cases in the past with locks.
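The check described above can be sketched roughly as follows. This is a minimal illustration in Python, not GnuPG's actual code; the function names lock_is_stale and pid_is_running are hypothetical, and the sketch assumes the PID and hostname have already been read from the lock file.

```python
import os
import socket


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists on the local machine (POSIX)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_is_stale(lock_pid, lock_host, local_host=None, pid_alive=pid_is_running):
    """Decide whether a lock may be treated as stale (hypothetical helper).

    A lock written on a different host is never reported as stale, because
    the local machine cannot tell whether that PID is still alive elsewhere.
    """
    if local_host is None:
        local_host = socket.gethostname()
    if lock_host != local_host:
        return False  # possibly held on another box sharing the home directory
    return not pid_alive(lock_pid)
```

Under this logic, a lock left behind before a hostname change looks like a lock held on another machine, which matches the hang reported here: the waiter never considers the lock stale.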

Does gpgconf --kill keyboxd help?

Nope. gpgconf --kill keyboxd hangs too, if I read this right, while waiting for the agent:

$ strace gpgconf --kill keyboxd
[...]
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2d74fe2a10) = 3244
wait4(3244, 0x7ffc9836e364, 0, NULL)    = ? ERESTARTSYS (To be restarted if SA_RESTART is set)

$ ps auxf | grep keyboxd
tester      3243  0.0  0.1 223080  2688 pts/0    S+   09:01   0:00  |   |               \_ gpgconf --kill keyboxd
tester      3244  0.0  0.1 223720  3072 pts/0    S+   09:01   0:00  |   |                   \_ gpg-connect-agent --no-autostart --keyboxd KILLKEYBOXD
[...]
tester      3090  0.0  0.0 225884  1548 ?        Ss   09:00   0:00  \_ keyboxd --homedir /home/tester/.gnupg --daemon


$ gdb --pid 3244
[...]
(gdb) bt
[...]
#0  0x00007ff1489867dd in recvmsg () from /lib64/libc.so.6
#1  0x00007ff148c12b57 in __assuan_recvmsg (ctx=<optimized out>, fd=3, msg=0x7ffca327c740, flags=0)
    at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/system-posix.c:133
#2  0x00007ff148c10c39 in _assuan_recvmsg (flags=0, msg=0x7ffca327c740, fd=<optimized out>, ctx=0x55846ab4a960)
    at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/system.c:258
#3  uds_reader (ctx=0x55846ab4a960, buf=<optimized out>, buflen=<optimized out>) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-uds.c:113
#4  0x00007ff148c10ea0 in readline (ctx=ctx@entry=0x55846ab4a960, buf=buf@entry=0x55846ab4aab0 "", buflen=buflen@entry=1002, 
    r_nread=r_nread@entry=0x7ffca327c83c, r_eof=r_eof@entry=0x55846ab4aaac) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-buffer.c:79
#5  0x00007ff148c11c3b in _assuan_read_line (ctx=ctx@entry=0x55846ab4a960) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-buffer.c:151
#6  0x00007ff148c14568 in assuan_client_read_response (ctx=ctx@entry=0x55846ab4a960, line_r=line_r@entry=0x7ffca327c950, 
    linelen_r=linelen_r@entry=0x7ffca327c94c) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/client.c:87
#7  0x00007ff148c14921 in _assuan_read_from_server (ctx=ctx@entry=0x55846ab4a960, response=response@entry=0x7ffca327c9a4, off=off@entry=0x7ffca327c9a0, 
    convey_comments=convey_comments@entry=0) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/client.c:209
#8  0x00007ff148c14aec in _assuan_connect_finalize (ctx=ctx@entry=0x55846ab4a960, fd=fd@entry=3, flags=flags@entry=1)
    at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-socket-connect.c:125
#9  0x00007ff148c19824 in assuan_socket_connect (ctx=0x55846ab4a960, name=<optimized out>, name@entry=0x55846ab487a0 "/run/user/1000/gnupg/S.keyboxd", 
    server_pid=server_pid@entry=0, flags=flags@entry=1) at /usr/src/debug/libassuan-2.5.6-2.fc39.x86_64/src/assuan-socket-connect.c:343
#10 0x000055846a131ec8 in start_new_service (r_ctx=r_ctx@entry=0x7ffca327cf50, module_name_id=<optimized out>, module_name_id@entry=13, program_name=0x0, 
    session_env=session_env@entry=0x0, autostart=0, verbose=1, status_cb_arg=0x0, status_cb=0x0, debug=0, opt_lc_messages=0x0, opt_lc_ctype=0x0, 
    errsource=GPG_ERR_SOURCE_UNKNOWN) at ../common/asshelp.c:447
#11 0x000055846a12b516 in start_new_keyboxd (errsource=GPG_ERR_SOURCE_UNKNOWN, debug=0, status_cb=0x0, status_cb_arg=0x0, verbose=<optimized out>, 
    autostart=<optimized out>, keyboxd_program=<optimized out>, r_ctx=0x7ffca327cf50) at ../common/asshelp.c:622
#12 start_agent () at /usr/src/debug/gnupg2-2.4.3-4.fc39.x86_64/tools/gpg-connect-agent.c:2303
#13 main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/gnupg2-2.4.3-4.fc39.x86_64/tools/gpg-connect-agent.c:1406

While I understand this is the intended design to allow sharing a home directory over network file systems, this behavior makes it quite hard for an ordinary user to debug when some desktop application using gpg/gpgme hangs for no apparent reason like this. It will most probably lead users to disable keyboxd to get back to the old behavior.

Okay, we now pass the warnings down to gpg and gpgsm, so the problem will be easier to analyze. We also stop trying after 10 seconds. Sample error messages:

gpg: Note: database_open 134217901 waiting for lock (held by 19775) ...
gpg: keydb_search failed: Connection timed out
gpg: error reading key: Connection timed out
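The "warn, then give up after 10 seconds" behavior can be illustrated with a small Python sketch. It is not the keyboxd implementation: acquire_with_timeout, try_lock and notify are hypothetical names, and the polling interval is an arbitrary assumption.

```python
import time


def acquire_with_timeout(try_lock, timeout=10.0, poll=0.5, notify=print,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll try_lock() until it succeeds or `timeout` seconds have passed.

    try_lock is a caller-supplied callable returning True once the lock is
    taken. A single warning is emitted through notify() on the first failed
    attempt; TimeoutError is raised when the deadline passes.
    """
    deadline = clock() + timeout
    warned = False
    while True:
        if try_lock():
            return True
        if clock() >= deadline:
            raise TimeoutError("gave up waiting for lock")
        if not warned:
            notify("waiting for lock ...")
            warned = True
        sleep(poll)
```

Passing clock and sleep as parameters is only a convenience for testing the loop without actually waiting.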

Sorry for the ugly error number, but more parsing seems inappropriate. The easiest way to test this is to take a copy of the lockfile, modify the hostname in that file, stop keyboxd, and have it restarted.
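For the "modify the hostname" step, a helper along these lines could be used on a copy of the lock file contents. It assumes, purely for illustration, that the file holds the PID on the first line and the hostname on the second; inspect the real file before relying on that. The name swap_lock_hostname is hypothetical.

```python
def swap_lock_hostname(lock_bytes: bytes, new_host: str) -> bytes:
    """Return lock file contents with the second line replaced by new_host.

    ASSUMPTION: line 1 is the PID and line 2 is the hostname. This layout
    is not confirmed by the discussion above.
    """
    lines = lock_bytes.split(b"\n")
    if len(lines) < 2:
        raise ValueError("expected at least two lines (PID and hostname)")
    lines[1] = new_host.encode()
    return b"\n".join(lines)
```

The function works on bytes only and touches no files, so it can be tried on a copy without risk to the real lock.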

Note that the initialization (i.e. taking of the database lock) has been moved from the keyboxd startup to the first use of the database.

werner triaged this task as Normal priority. Dec 18 2023, 6:00 PM

I'd say we should not do anything about this. Stale lock files are a general problem but can be solved by administrative tasks. We may provide a tool to clean things up on request.

Better not to remove your entire ~/.gnupg - removing the *.lock files after gpgconf -K all is sufficient.
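Before removing anything by hand, it may help to list which lock files are actually present. The following Python sketch only lists them and deletes nothing; find_lock_files is a hypothetical helper, and searching subdirectories is an assumption about where lock files may live.

```python
from pathlib import Path


def find_lock_files(homedir):
    """Return a sorted list of *.lock files below homedir (nothing is deleted)."""
    root = Path(homedir)
    return sorted(p for p in root.rglob("*.lock") if p.is_file())


if __name__ == "__main__":
    # Run this only after the daemons have been stopped (gpgconf -K all).
    for path in find_lock_files(Path.home() / ".gnupg"):
        print(path)
```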

In any case, we now have a tool to remove stale lock files, which will be available with 2.4.4. In your case you would do:

gpgconf --unlock pubring.db

Further, warning messages are now shown if a daemon is waiting for a lock (see above).

werner claimed this task.
werner moved this task from QA to gnupg-2.4.4 on the gnupg24 board.
werner edited projects, added gnupg24 (gnupg-2.4.4); removed gnupg24.

Tested this some time ago.