Page MenuHome GnuPG

Intermittent ssh publickey login failure after upgrade to gnupg 2.5.x
Open, NormalPublic

Description

I have a setup with gnupg acting as ssh agent and ssh key stored on YubiKey smart card. Ever since I've upgraded gnupg from 2.4.8 to 2.5.x I've started getting rare failures with ssh login. On the SSH side all looks fine -- keys are correctly discovered and offered by client, server accepts them but authentication fails. All ssh server has to say about this is:
debug1: child_reap: preauth child 7764 for connection from <IP1> to <IP2> exited after unsuccessful auth attempt
retrying login immediately succeeds.

I wasn't sure what to blame as other packages got updated too but today I've finally arrived at "eventual reproducer" which, given enough repetitions, reproduces the issues. After downgrading gnupg back to 2.4.8, I wasn't able to reproduce the issue anymore with total of 1500 ssh connections so I guess most likely culprit is somewhere in new gnupg. The issue seems to be very time sensitive, like some race, perhaps some communication channel is closed before all data was transferred?

Somewhat problematic part of this reproducer is that single pass consists of 100 connections established within 20 seconds, partly in parallel, which makes hard to correlate specific ssh invocation with log messages. Although to be fair I don't see any errors with "debug-log advanced" both in gpg-agent.conf and scdaemon.conf.

I fully realize the issue is scarce on details what might be wrong, but I would appreciate some ideas how could I diagnose it further.

Details

Version
2.5.18

Event Timeline

werner added a subscriber: werner.

Hi!

You need to get a log form gpg-agent. Put this into ~/.gnupg/gpg-agent/conf

log-file /somewhere/gpg-agent/log
verbose

this should show any errors gpg-agent encounters. If you do not see anything, add a second verbose line. Extra debug lines should not be necessary as we only use --verbose in command.ssh.c. Make sure that systemd does not try to start another gpg-agent.

What kind of ssh keys are involved - RSA? This might be related to T7882

I've already tried with verbose which gave no errors. That's why I moved to debug logging. With double verbose I don't see anything wrong either. Excerpt from log for relevant 100 connections among which 1 failed:

$ cat gpg.log | 
    sed 's/.*gpg-agent\[[0-9]*\] //'  | # remove date, time and process id                            
    grep -v 'ssh handler .* \(started\|terminated\)' | # appears to be mostly noise wit hex address
    sort|uniq -c
     80 new connection to /usr/libexec/gnupg2/scdaemon daemon established
     20 new connection to /usr/libexec/gnupg2/scdaemon daemon established (reusing)
    100 received ssh request of length 1
    100 received ssh request of length 208
    100 received ssh request of length 748
    100 sending ssh response of length 1
    100 sending ssh response of length 281
    100 sending ssh response of length 626
    100 ssh request handler for extension (27) ready
    100 ssh request handler for extension (27) started
    100 ssh request handler for request_identities (11) ready
    100 ssh request handler for request_identities (11) started
    100 ssh request handler for sign_request (13) ready
    100 ssh request handler for sign_request (13) started
    100 ssh-agent extension 'session-bind@openssh.com' not supported
    100 ssh-agent extension 'session-bind@openssh.com' received

What kind of ssh keys are involved - RSA? This might be related to T7882

Could be related:

debug3: sign_and_send_pubkey: signing using rsa-sha2-512 SHA256:...

I will try to reproduce without this change.

Looks like indeed related to T7882. After reverting c7e0ec12609b401ea81c4851522d86eb5ec27170 I was able to make 2000 connections without any issue. Bringing the change back and retrying issue appeared within first 300.

Added some debug logging and whenever login issue occurs new logic is applied:
https://github.com/gpg/gnupg/blob/bc7c91bee521e4adf3506ca32bf34177b84ce1c5/agent/command-ssh.c#L1482

since data_n==255 while spec->keysize==256. In successful cases this logic is not involved.

Let's see whether Niibe-san still remembers the T7882 case.

Note that exactly same data and length computed by ssh_signature_encoder_rsa, including additional 0, reach:
https://github.com/openssh/openssh-portable/blob/V_10_2_P1/sshkey.c#L517-L537

where digest validation fails.

computed by ssh_signature_encoder_rsa, including additional 0, reach:

Before reaching there, it goes through:
https://github.com/openssh/openssh-portable/blob/V_10_2_P1/ssh-rsa.c#L546-L571

It looks like it does exactly the same handling of adding 0 to pad the signature.

(It seems for me that GitHub doesn't have support accessing old commits. So, let me allow referring another repo.)

In 2002, the handling of RSA padding in OpenSSH-portable was introduced:
https://anongit.mindrot.org/openssh.git/commit/?id=ceae9d1c333c24894cccddc861c1b9b6d208a8bc

gniibe triaged this task as Normal priority.Wed, Apr 1, 4:29 AM

@jpalus You are right.

My fix in T7882 was wrong. The first argument is DEST, and the second argument is SRC.

diff --git a/agent/command-ssh.c b/agent/command-ssh.c
index ff4ca058e..2809916e2 100644
--- a/agent/command-ssh.c
+++ b/agent/command-ssh.c
@@ -1481,7 +1481,7 @@ ssh_signature_encoder_rsa (ssh_key_type_spec_t *spec,
       err = gcry_mpi_print (GCRYMPI_FMT_USG, data, spec->keysize, &data_n, s);
       if (data_n < spec->keysize)
         {
-          memmove (data, data+spec->keysize-data_n, data_n);
+          memmove (data+spec->keysize-data_n, data, data_n);
           memset (data, 0, spec->keysize-data_n);
           data_n = spec->keysize;
         }

I'm going to push this change to master.

Great spotting! This was it. Quite embarrassing that I've looked at this code so many time yet it didn't cross my mind to double check arguments order.

gniibe mentioned this in Unknown Object (Maniphest Task).Mon, Apr 13, 6:35 AM