Page MenuHome GnuPG

gpgme-1.9.0 test suite hangs on OpenBSD
Closed, ResolvedPublic

Description

Hi,

I'm working on updating the OpenBSD gpgme package to 1.9.0. It builds fine, but the test suite hangs like this:

Making check in src
gmake[1]: Entering directory '/usr/ports/pobj/gpgme-1.9.0/gpgme-1.9.0/src'
gmake[1]: Nothing to be done for 'check'.
gmake[1]: Leaving directory '/usr/ports/pobj/gpgme-1.9.0/gpgme-1.9.0/src'
Making check in tests
gmake[1]: Entering directory '/usr/ports/pobj/gpgme-1.9.0/gpgme-1.9.0/tests'
Making check in gpg
gmake[2]: Entering directory '/usr/ports/pobj/gpgme-1.9.0/gpgme-1.9.0/tests/gpg'
gmake  check-TESTS check-local
gmake[3]: Entering directory '/usr/ports/pobj/gpgme-1.9.0/gpgme-1.9.0/tests/gpg'
starting gpg-agent..
gpg-connect-agent: no running gpg-agent - starting '/usr/local/bin/gpg-agent|--debug-quick-random'
gpg-connect-agent: waiting for the agent to come up ... (5s)
gpg-connect-agent: connection to agent established
OK
PASS: initial.test
-----BEGIN PGP MESSAGE-----

hQEOA2rm1+5GqHH4EAP9E9VJ72p/XKkN00gzhZbpt4CJmrwgRbxGuR6IzbCqJ2dY
jqo4rJnYXDQFl4Dcg0IgjPRgcYNjiVom7aOQC1pbKaIcUxjrm2Kolsio9EsOTnIo
6i/o3zoFAfWomqXxbskIPxtH2HSKRuxCuQf8Q6/BWuqo4i/tbAng+Y2dE86mj9cD
/imRn1F1glTMhK45+HF9xXHnnqn9QwaWQnurXQVR4P3t/LBiNUJPCuyEE9/cywdd
asKF8ARNmeb3g66A7Uy3rdW/BDlpcJ0R8kjAxTJxVo00emujYAmJamZL+cZySL/f
wkQGfFD/3ouHqpsOasFh+KBokHbFnW4ULkU0nhWtPNqOhQEOA1OB6k7im6N/EAP+
L1ZdzAkjdmu8tE7zW3zCsDZ5xKGazuMsDK/TJp4ZFXusDKg2ilM85+Kpf85AkKVY
k9nbBBwEgSSs3sDTbFOyd3ULt/yz5d4rNDks/QGTkD5NhORvU1ZB/MD+Y3Pkr64N
cwlq85j0/KmwZ0hK2EKhgOmZkSkA/6qEput976CJkbgD/0aTBSybhac+Fpi3VilE
98GeZQePEm9NyNV1t26/J8r72yOUUnldWxh3RhhNQkMeIXmpnAzGxJ03Hf7O352k
SZODJ97HnGSzrNcSo34qHXxXW2NuJnr9062pXWFlNORd8d8G++HRfOtVVE8HJ3Dd
LcXQrD+iT0xZ3n47Tf1se/mO0kcBpsUOGgUhUIoeMZYylJm5zjSsftMdx9j5yalH
L86rf4Ma8j/9+MyXnUMiC0jZkzfEGbJo9p4z+VA0fkFPvax5V7KRs/NOQg==
=U26Q
-----END PGP MESSAGE-----
PASS: t-encrypt

At this point it seems the tests hang forever.

This is with:

$ gpg2 --version
gpg (GnuPG) 2.2.1
libgcrypt 1.8.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: /home/edd/.gnupg
Supported algorithms:
Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
        CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2

On OpenBSD-current.

I also tried with GnuPG-2.1.23. It was a similar outcome, expect that gpg-agent would spin at 100% CPU consumption during the hang. Weird. I guess that was a separate bug which has been fixed in 2.2.1.

Due to all the IPC involved, I'm not really sure about how to start debugging this. Any tips?

Thanks

Details

Version
1.9.0

Event Timeline

vext01 created this object in space S1 Public.

I've been experimenting.

The GnuPG-2.2.1 test suite also hangs (on t-protect) on this same machine which happens to be a virtual machine I use for ports.

Here's the twist, if I try outside the VM on the host, the GnuPG test suite works fine.

This doesn't make a great deal of sense to me. Any ideas why this could be?

Here are a couple of traces of the hanging t-protect test under the VM. I just let it run for a bit under gdb and pressed ctrl+c on a couple of occasions:

Starting program: /usr/ports/pobj/gnupg-2.2.1/gnupg-2.2.1/agent/t-protect 
[New process 63697]
^C
Program received signal SIGINT, Interrupt.
0x00000f61e77bc610 in _gcry_md_block_write () from /usr/local/lib/libgcrypt.so.19.5
Current language:  auto; currently minimal
(gdb) bt
#0  0x00000f61e77bc610 in _gcry_md_block_write () from /usr/local/lib/libgcrypt.so.19.5
#1  0x00000f61e77b39c4 in md_write () from /usr/local/lib/libgcrypt.so.19.5
#2  0x00000f61e77b892f in _gcry_kdf_derive () from /usr/local/lib/libgcrypt.so.19.5
#3  0x00000f61e779390e in gcry_kdf_derive () from /usr/local/lib/libgcrypt.so.19.5
#4  0x00000f5f54b0113d in get_standard_s2k_count () at protect.c:1418
#5  0x00000f5f54b01ad2 in agent_protect (plainkey=0xf5f54c13990 "(11:private-key(3:rsa(1:n129:", 
    passphrase=0xf5f54c1407c "passphrase", result=0x7f7ffffbb7f0, resultlen=0x7f7ffffbb7f8, s2k_count=0, use_ocb=Variable "use_ocb" is not available.
)
    at protect.c:419
#6  0x00000f5f54b00f82 in main (argc=Variable "argc" is not available.
) at t-protect.c:196
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00000f61e7794f1f in __gcry_burn_stack () from /usr/local/lib/libgcrypt.so.19.5
(gdb) bt
#0  0x00000f61e7794f1f in __gcry_burn_stack () from /usr/local/lib/libgcrypt.so.19.5
#1  0x00000f61e77bc715 in _gcry_md_block_write () from /usr/local/lib/libgcrypt.so.19.5
#2  0x00000f61e77b39c4 in md_write () from /usr/local/lib/libgcrypt.so.19.5
#3  0x00000f61e77b892f in _gcry_kdf_derive () from /usr/local/lib/libgcrypt.so.19.5
#4  0x00000f61e779390e in gcry_kdf_derive () from /usr/local/lib/libgcrypt.so.19.5
#5  0x00000f5f54b0113d in get_standard_s2k_count () at protect.c:1418
#6  0x00000f5f54b01ad2 in agent_protect (plainkey=0xf5f54c13990 "(11:private-key(3:rsa(1:n129:", 
    passphrase=0xf5f54c1407c "passphrase", result=0x7f7ffffbb7f0, resultlen=0x7f7ffffbb7f8, s2k_count=0, use_ocb=Variable "use_ocb" is not available.
)
    at protect.c:419
#7  0x00000f5f54b00f82 in main (argc=Variable "argc" is not available.
) at t-protect.c:196

And here's the end of a ktrace dump. It looks like the process is spinning waiting for something to happen:

...
 97111 t-protect RET   read 508/0x1fc
 97111 t-protect CALL  kbind(0x7f7ffffbde40,24,0x93b1ccc8886c1fd5)
 97111 t-protect RET   kbind 0
 97111 t-protect CALL  gettimeofday(0x7f7ffffbdf18,0)
 97111 t-protect STRU  struct timeval { 1509216695<"Oct 28 19:51:35 2017">.216495 }
 97111 t-protect RET   gettimeofday 0
 97111 t-protect CALL  kbind(0x7f7ffffbde40,24,0x93b1ccc8886c1fd5)
 97111 t-protect RET   kbind 0
 97111 t-protect CALL  getrusage(RUSAGE_SELF,0x7f7ffffbdf18)
 97111 t-protect STRU  struct rusage { utime=0, stime=0, maxrss=1828, ixrss=0, idrss=0, isrss=0, minflt=297, majflt=14, nswap=0, inblock=5, oublock=13, msgsnd=0, msgrcv=0, nsignals=0, nvcsw=25, nivcsw=1 }
 97111 t-protect RET   getrusage 0
 97111 t-protect CALL  gettimeofday(0x7f7ffffbdee0,0)
 97111 t-protect STRU  struct timeval { 1509216695<"Oct 28 19:51:35 2017">.216648 }
 97111 t-protect RET   gettimeofday 0
 97111 t-protect CALL  kbind(0x7f7ffffbde40,24,0x93b1ccc8886c1fd5)
 97111 t-protect RET   kbind 0
 97111 t-protect CALL  getrusage(RUSAGE_SELF,0x7f7ffffbde68)
 97111 t-protect STRU  struct rusage { utime=0, stime=0, maxrss=1828, ixrss=0, idrss=0, isrss=0, minflt=298, majflt=14, nswap=0, inblock=5, oublock=13, msgsnd=0, msgrcv=0, nsignals=0, nvcsw=25, nivcsw=1 }
 97111 t-protect RET   getrusage 0
 97111 t-protect CALL  getpid()
 97111 t-protect RET   getpid 97111/0x17b57
 97111 t-protect CALL  kbind(0x7f7ffffbe260,24,0x93b1ccc8886c1fd5)
 97111 t-protect RET   kbind 0
 97111 t-protect CALL  kbind(0x7f7ffffbe230,24,0x93b1ccc8886c1fd5)
 97111 t-protect RET   kbind 0
 97111 t-protect CALL  getrusage(RUSAGE_SELF,0x7f7ffffbe248)
 97111 t-protect STRU  struct rusage { utime=0, stime=0, maxrss=1836, ixrss=0, idrss=0, isrss=0, minflt=307, majflt=14, nswap=0, inblock=5, oublock=13, msgsnd=0, msgrcv=0, nsignals=0, nvcsw=25, nivcsw=1 }
 97111 t-protect RET   getrusage 0
 97111 t-protect CALL  getrusage(RUSAGE_CHILDREN,0x7f7ffffbe248)
 97111 t-protect STRU  struct rusage { utime=0, stime=0, maxrss=0, ixrss=0, idrss=0, isrss=0, minflt=0, majflt=0, nswap=0, inblock=0, oublock=0, msgsnd=0, msgrcv=0, nsignals=0, nvcsw=0, nivcsw=0 }
 97111 t-protect RET   getrusage 0
 97111 t-protect CALL  gettimeofday(0x7f7ffffbe238,0)
 97111 t-protect STRU  struct timeval { 1509216695<"Oct 28 19:51:35 2017">.217056 }
 97111 t-protect RET   gettimeofday 0
 97111 t-protect CALL  kbind(0x7f7ffffbe210,24,0x93b1ccc8886c1fd5)
 97111 t-protect RET   kbind 0
...

I'm starting to think this could be a bug in OpenBSD vmm(4) time handling...

I confirm that applying the patch fixes the hang under a VM, and does not adversely affect running on a bare metal machine either.

gniibe changed the task status from Open to Testing.Nov 7 2017, 9:23 AM
gniibe claimed this task.

This is a bug in gpg-agent.

So is 380bce13d94f the correct fix? If so, I will update the OpenBSD port including this as a local patch.

Yes, it will be in 2.2.3. It's too late for 2.2.2.