Page MenuHome GnuPG

gpg2 hangs on many tasks on OpenIndiana (Illumos)
Closed, ResolvedPublic

Description

gnupg 2.3.2 with libgcrypt 1.9.4 on latest OpenIndiana (one of the Illumos distros that came out of the OpenSolaris project).

When running gpg2 on OpenIndiana, it appears there's some kind of issue communicating over the socket with gpg-agent.

With no gpg-agent running and GPG_TTY correctly set, if I attempt to decrypt a file that was encrypted to me on a different machine, via

$ gpg2 --version
gpg (GnuPG) 2.3.2
libgcrypt 1.9.4
Copyright (C) 2021 Free Software Foundation, Inc.
License GNU GPL-3.0-or-later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: /faculty/ndsu/mooney/.gnupg
Supported algorithms:
Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
        CAMELLIA128, CAMELLIA192, CAMELLIA256
AEAD: EAX, OCB
Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2

$ gpg2 --decrypt /tmp/textfile.txt.asc

what happens is that the gpg-agent is started:

$ ps -ef | egrep gpg-agent
  mooney 15182     1   0 13:38:49 ?           0:00 gpg-agent --homedir /faculty/ndsu/mooney/.gnupg --use-standard-socket --daemon

but instead of the expected pinentry display, I get a blank screen, with my cursor in roughly the center of the terminal window, in what must be the location of an "invisible" pinentry box.

I can type in the invisible pinentry display, and if I correctly type my passphrase and hit enter, the invisible pinentry screen goes away and I see:

gpg: encrypted with elg1024 key, ID 100A77F8477C6C9C, created 2001-03-15
      "Tim Mooney (NDSU ITS) <Tim.Mooney@ndsu.edu>"

at which point gpg is hung.

If I attach to the running process with truss (the Solaris system call tracer, similar to Linux's strace), I see:

$ ps -ef | grep -i gpg
  mooney 15200     1  13 13:44:07 ?           2:16 gpg-agent --homedir /faculty/ndsu/mooney/.gnupg --use-standard-socket --daemon
  mooney 15198  3614   0 13:44:07 pts/5       0:00 gpg2 --decrypt /tmp/textfile.txt.asc
$ truss -f -p 15198
15198:  read(6, 0x011EF2F0, 1002)       (sleeping...)

fd 6 that it's trying to read() from appears to be the socket for communicating with gpg-agent.

While this is going on, gpg-agent starts accumulating CPU time at at rapid rate.

Whatever is happening with gpg-agent, it can't be killed normally. Trying to use 'gpgconf --kill gpg-agent' hangs, and 'kill -TERM' fails to kill it. Only 'kill -9' will actually kill it.

Note that it's not just a decrypt operation that hangs. It seems like any operation that requires communication with the gpg-agent will hang at some point.

Even though the pinentry interaction didn't correctly display anything, I don't think the problem is with pinentry. When I run 'pinentry-gkt-2' or 'pinentry-curses' directly, and issue the correct "assuan" protocol "commands" to either of the pinetry programs, when GETPIN is issued they correctly draw the curses-based dialog and they respond with the test password I entered.

It seems like the issue has something to do with communication between gnupg and gpg-agent, but I'm not certain what the issue is.

I can run additional tests, if you can provide some guidance on what debugging options or tests should be investigated.

Thanks,

Tim

Event Timeline

mooney created this object in space S1 Public.
werner added a subscriber: werner.

Just to be sure. please provide the output of

gpgconf --show-versions

and

gpgconf --list-dirs

will also be helpful. Then you should run

gpg --decrypt -v WHATEVER

which shows you information about the pinentry. Further debugging is possible by adding

debug-pinentry
debug ipc
verbose
log-file /foo/bar/agent/log

into gpg-agent.conf and restart gpg-agent.

Thanks for the guidance, Werner!

$ gpgconf --show-versions
Warning: using insecure memory!
* GnuPG 2.3.2 (3bf8d7e1b)
SunOS

* Libgcrypt 1.9.4 (05422ca2)
version:1.9.4:10904:1.42-unknown:12a00:
cc:70500:gcc:7.5.0:
ciphers:arcfour:blowfish:cast5:des:aes:twofish:serpent:rfc2268:seed:camellia:idea:salsa20:gost28147:chacha20:sm4:
pubkeys:dsa:elgamal:rsa:ecc:
digests:crc:gostr3411-94::md4:md5:rmd160:sha1:sha256:sha512:sha3:tiger:whirlpool:stribog:blake2:sm3:
rnd-mod:linux:
cpu-arch::
mpi-asm:generic/mpih-add1.c:generic/mpih-sub1.c:generic/mpih-mul1.c:generic/mpih-mul2.c:generic/mpih-mul3.c:generic/mpih-lshift.c:generic/mpih-rshift.c:
hwflist:
fips-mode:n:n:
rng-type:standard:1:2010000:2:
compliance:::

* GpgRT 1.42-unknown (0000000)

* Libassuan 2.5.5 (f8cfb56)

* KSBA 1.3.5 (?)

* GNUTLS 3.6.16
$ gpgconf --list-dirs
sysconfdir:/etc/gnupg
bindir:/usr/bin
libexecdir:/usr/bin
libdir:/usr/lib/amd64/gnupg
datadir:/usr/share/gnupg
localedir:/usr/share/locale
socketdir:/faculty/ndsu/mooney/.gnupg
dirmngr-socket:/faculty/ndsu/mooney/.gnupg/S.dirmngr
keyboxd-socket:/faculty/ndsu/mooney/.gnupg/S.keyboxd
agent-ssh-socket:/faculty/ndsu/mooney/.gnupg/S.gpg-agent.ssh
agent-extra-socket:/faculty/ndsu/mooney/.gnupg/S.gpg-agent.extra
agent-browser-socket:/faculty/ndsu/mooney/.gnupg/S.gpg-agent.browser
agent-socket:/faculty/ndsu/mooney/.gnupg/S.gpg-agent
homedir:/faculty/ndsu/mooney/.gnupg

Before adding the additional debugging config to my gpg-agent.conf, running the --decrypt with -v shows:

$ gpg --decrypt -v /tmp/textfile.txt.asc
gpg: Note: RFC4880bis features are enabled.
gpg: public key is 100A77F8477C6C9C
gpg: using subkey 100A77F8477C6C9C instead of primary key 015F6BA447576386
gpg: encrypted with elg1024 key, ID 100A77F8477C6C9C, created 2001-03-15
      "Tim Mooney (NDSU ITS) <Tim.Mooney@ndsu.edu>"
gpg: no running gpg-agent - starting '/usr/bin/gpg-agent'
gpg: waiting for the agent to come up ... (5s)
gpg: connection to the agent established
gpg: pinentry launched (17109 curses 1.1.0 /dev/pts/1 xterm -)

As I explained, that did give me the "invisible pinentry" screen, and I blindly typed my correct passphrase for my key, and then gpg hung as shown above.

Just for completeness, here's the output when I ctrl-C gpg to interrupt it:

^C
gpg: signal 2 caught ... exiting

Adding those suggested entries to ~/.gnupg/gpg-agent.conf and retrying, I get the exact same results to the screen.

I will attach the debug log from the agent. Again, the pinentry screen was "invisible" but I typed my passphrase and appear to have gotten it correct. Other than a response from gpg-agent about "Unknown option <Pinentry>" that is probably unrelated, there's no clue that I see in the log.

It seems for me that there are multiple problems.
For pinentry-curses, please have a look at: T4771: pinentry-tty/pinentry-curses interact a user as background process
It only works well in some situations; It doesn't work when the screen is occupied by foreground program like Emacs and Midnight Commander.

For pinentry-gtk-2, looking the log, it seems that your environment has XAUTHORITY but not DISPLAY envvar. pinentry-gtk-2 requires DISPLAY envvar.

Could you try with setting DISPLAY?

BTW, when pinentry interaction doesn't work well, use of --pinentry-mode loopback option for gpg may help you.

Hi gniibe!

This is when ssh'ed in to my OpenIndiana workstation, it's not when I'm sitting at the graphical console, so DISPLAY is not set in this case, nor should it be.

There is no foreground program when I'm trying to interact with gpg2. I'm using commands from a bash shell prompt, not within emacs, midnight-commander, screen, or anything else.

pinentry-gtk-2 is the system-default pinentry, but I have verified that it is correctly detecting that DISPLAY is not set and it's falling back to pinentry-curses.

As I mentioned in the initial report, I have verified separately that pinentry-curses (or pinentry-gkt-2 fallback to pinentry-curses) both work correctly when you invoke them directly and issue the appropriate assuan "commands" to them. Once GETPIN is issued, the curses-based dialog displays correctly. I did this just to test them, I understand that they are designed to be invoked by gpg-agent, not directly.

It's only when gpg-agent is involved that the problems happen. I even tried setting 'pinentry-program /usr/lib/pinentry-curses' in gpg-agent.conf, but it makes no difference.

I should have mentioned that I tried the loopback mode too. It does correctly display an "Enter password:" prompt and I can enter my passphrase, but gpg still hangs after the passphrase is entered.

Let us try to solve problems, one by one.

Firstly, (1) let us tackle the "gpg-agent hang" problem.
please show me the log when you use loopback mode.

Secondly, (2) pinentry-curses problem. In the log, you were using "TERM=xterm". Is it correctly supported on your machine (OpenIndiana)?

I think that the first problem is related to T5577: Null ptr dereference in gpg-agent (gnupg 2.3.2).
If gpg-agent has gone (after entering passphrase, it must be SEGV.

gniibe triaged this task as Normal priority.

With the options that Werner recommended for debugging in my ~/.gnupg/gpg-agent.conf:

debug-pinentry
debug ipc
verbose
log-file /tmp/gpg-agent-debug.log

and the command

$ gpg --decrypt -v --pinentry-mode loopback /tmp/textfile.txt.asc

the output to the screen is

gpg: Note: RFC4880bis features are enabled.
gpg: public key is 100A77F8477C6C9C
gpg: using subkey 100A77F8477C6C9C instead of primary key 015F6BA447576386
gpg: encrypted with elg1024 key, ID 100A77F8477C6C9C, created 2001-03-15
      "Tim Mooney (NDSU ITS) <Tim.Mooney@ndsu.edu>"
gpg: no running gpg-agent - starting '/usr/bin/gpg-agent'
gpg: waiting for the agent to come up ... (5s)
gpg: connection to the agent established

at which point it hangs.

The /tmp/gpg-agent-debug.log for that is attached.

Yes, xterm as a terminal type is correctly supported on OpenIndiana. I have been using it for many years, for both command-line and curses-based programs. It works well.

gpg-agent doesn't disappear from the process list after entering the passphrase; in fact it can't be killed with anything but 'kill -9'. 'gpgconf --kill gpg-agent' cannot kill it, the gpg-conf command just hangs when trying to.

After the passphrase has been entered and gpg hangs, gpg-agent starts to accumulate CPU time at a rapid rate, as displayed by 'ps -ef'.

After the passphrase has been entered and gpg hangs, gpg-agent starts to accumulate CPU time at a rapid rate, as displayed by 'ps -ef'.

It seems it tries to compute KEK (key encryption key), possibly forever...

Do you import your secret key? Or, is it generated on the machine?

A secret key is store encrypted by KEK, and KEK is computed by passphrase. For the computation, s2k_count is used. (S2K stands for 'String to key').
If it's too big. it takes more time.

In my environment, I can see:

$ gpg-connect-agent "GETINFO s2k_time" /bye 
D 96
OK
$ gpg-connect-agent "GETINFO s2k_count" /bye 
D 29176832
OK

It means that s2K_count is calibrated by 96ms. In this case, the computation of KEK only takes < 0.1 second.

My current keypair is old, but it's stored on my workstation's disk and appears to have been correctly imported into the private-keys-v1.d/ store. I do still have my 'secring.gpg' too, in case I ever need it for an older GPG.

When I run the gpg-connect-agent, it starts the agent and then hangs without responding with the time:

$ gpg-connect-agent "GETINFO s2k_time" /bye
gpg-connect-agent: no running gpg-agent - starting '/usr/bin/gpg-agent'
gpg-connect-agent: waiting for the agent to come up ... (5s)
gpg-connect-agent: connection to the agent established

Thanks for the info that gpg-agent is threaded. I hadn't realized that. Now that I know that, I can see that it's thread ID 3 that's
consuming CPU time, while thread ID 1 is apparently blocked:

~$ ps -L -p 19669
  PID   LWP TTY        LTIME CMD
19669     1 ?           0:00 gpg-agen
19669     3 ?          13:59 gpg-agen
$ ps -L -p 19669
  PID   LWP TTY        LTIME CMD
19669     1 ?           0:00 gpg-agen
19669     3 ?          15:52 gpg-agen

When I run the gpg-connect-agent, it starts the agent and then hangs without responding with the time:

Thank you for your testing. So, it's highly likely that the bug is around s2k thing (and clock_gettime).

You can put a line for the option s2k-count in gpg-agent.conf, like:

s2k-count 29176832

If this helps, I need to fix s2k thing for Solaris/Illumos.
For a while, please stand with this option.

s2k-count matters when you import the key.

You're definitely on the correct track: setting 's2k-count 29176832' in my gpg-agent.conf fixed the gpg-agent hang. Now the decrypt I was trying earlier works. Also, 'gpg-agent' is no longer accumulating CPU time, and I can kill it off with gpgconf.

The invisible pinentry screen is still an issue, but you appear to be correct about that too: it's an unrelated problem.

Note that right now OpenIndiana is still using pinentry-1.1.0. Now that I have a workaround for the gpg-agent issue, I'm going to update to 1.2.0 and see if pinentry has improved at all.

Thank you for locating the bug for (1).

For (2) of pinentry, it seems that curses output doesn't work well for some reason. Are you using ncurses or ucb curses + termcap?

You did all the work to locate the bug, gniibe! Nice job identifying it so quickly.

For pinentry, it's linked against the Solaris-derived curses, which is SVR4-derived and is quite good.

I have some updated information about the "invisible pinentry" issue too. I did some testing with pinentry 1.2.0 and the problem is still present.

However, I discovered that it is related to the terminal type. Even though "xterm" works well as a terminal type for me for all the other curses-based programs I use, for some reason it is not working for pinentry-curses. If I try pinentry-curses with a TERM=vt100 set, it *does* display. I'm quite surprised by this. xterm works for everything else, so I'm not certain what pinentry-curses is doing that isn't working.

By the way, do you want me to open a separate bug report for the pinentry issue and reference this bug report? When I reported the issues, I thought both of them were related to IPC with gpg-agent. You've proven they are separate issues. I will open a separate report if you prefer.

do you want me to open a separate bug report for the pinentry issue and reference this bug report?

Yes, please, that's better.

I'll keep this ticket about s2k_count and clock_gettime for Illumos.

Hi gniibe!

After you pointed out the issue with clock_gettime() and the particular timer, I reported the issue to the Illumos developers.

https://www.illumos.org/issues/14126

One of the developers, Robert Mustacchi, has implemented initial support for the necessary functionality (see the 2nd comment in the Illumos bug report). I'll be testing it with GnuPG as soon as it's available in OI's kernel.

If that works as needed for GnuPG, there would no longer be a need for any kind of workaround for current Solaris or current Illumos-derived systems.

If you have been working on a workaround, I would say you should delay that until I've had a chance to test with the new kernel implementation.

Reading pages of the following links:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/clock_gettime.html
https://docs.oracle.com/cd/E36784_01/html/E36873/unistd.h-3head.html

I'm going to apply this change:

diff --git a/agent/protect.c b/agent/protect.c
index 7ae067ec7..5b345a823 100644
--- a/agent/protect.c
+++ b/agent/protect.c
@@ -110,7 +110,7 @@ calibrate_get_time (struct calibrate_time_s *data)
                    &data->creation_time, &data->exit_time,
                    &data->kernel_time, &data->user_time);
 # endif
-#elif defined (CLOCK_THREAD_CPUTIME_ID)
+#elif defined (CLOCK_THREAD_CPUTIME_ID) && _POSIX_THREAD_CPUTIME > 0
   struct timespec tmp;
 
   clock_gettime (CLOCK_THREAD_CPUTIME_ID, &tmp);

Well, by doing so, I suggest OpenIndiana developers to make sure defining _POSIX_THREAD_CPUTIME 1 in the header, when its kernel will support clock_gettime.

If _POSIX_THREAD_CPUTIME is defined already but clock_gettime returns -1 (with errno=EINVAL), please let me know the specific version
of the operating system (and how we can identify it at compile time).
Or... we would exclude Solaris by defined(__sun) && defined(__SVR4) for use of clock_gettime.

Pushed rGeeb25df6f8fc: agent: Fix calibrate_get_time use of clock_gettime..

FreeBSD has _POSIX_THREAD_CPUTIME > 0.
GNU/Linux has _POSIX_THREAD_CPUTIME == 0, because older kernel doesn't support the system call.

I mentioned the two POSIX getconf settings you referenced in those links, and the developer that implemented CLOCK_THREAD_CPUTIME_ID and a couple other CLOCK_THREAD types had this to say:

We can't set those yet as we don't currently support those clocks, which are a way of getting an arbitrary process or thread's time as opposed to the caller. I did not add support for them here. Does gpg require them?

When he says "those", he's talking about _POSIX_THREAD_CPUTIME and _POSIX_CPUTIME, since those indicate the abilility to get some other thread or processes time, rather than just the current thread or process.

So with the support he's implemented clock_gettime() now works for thread timing for your own thread, but not for others.

He's wondering if you require the ability to use clock_gettime() with other threads or processes?

Major problem here (before the change) was that clock_gettime returned an error with no valid value of the time, which confuses gpg-agent's calibration of time. This occurred on (not newest) Solaris kernel, as it offers clock_gettime function in the library and CLOCK_THREAD_CPUTIME_ID constant in the header.

The use of clock_gettime is for better precision and portability. Use of CLOCK_THREAD_CPUTIME_ID is needed (or better), as gpg-agent runs multiple threads in its process.

That's being said, I think that it's OK for OS of Solaris variants to use clock for gpg-agent, instead, if it's not yet ready for _POSIX_THREAD_CPUTIME.

Or... if there is any heuristics for Solaris to detect availability of clock_gettime with CLOCK_THREAD_CPUTIME at compile time, please let us know to merge the change.

We don't want to introduce runtime check here, if compile time detection is enough.

For GNU/Linux, clock_gettime is common these ten years, but in the past, kernel mattered.
Thus, we added heuristics for GNU/Linux.

@gniibe I have submitted D565 to change the error message on curses initialization to "Required environment variable not set"