gpg-agent crashes regularly, out of core in secure memory allocations
Closed, ResolvedPublic

Description

Since a few days I have trouble with my gpg-agent, before that everything works just fine. Now it crashes
regularly with no coherence I can see.
I found nothing concerning this problem on the net or in the bugtracker of this site, nor could any of my friends
help me with this problem.

The OS is Arch Linux.
The length of the gpg-keys is 4096 Bit (saw someone during the search who got trouble with somhow enforced
huge keys. This is not the origin of my problem.)

The --log-file option returns:
2016-11-28 21:02:59 gpg-agent[5255] gpg-agent (GnuPG) 2.1.16 started
2016-11-28 21:03:43 gpg-agent[5255] Fatal: out of core in secure memory while allocating 512 bytes
2016-11-28 21:03:43 gpg-agent[5255] Fatal: libgcrypt problem: out of core in secure memory
2016-11-28 22:18:28 gpg-agent[1113] gpg-agent (GnuPG) 2.1.16 started
2016-11-28 22:18:42 gpg-agent[1113] Fatal: out of core in secure memory while allocating 304 bytes
2016-11-29 10:32:27 gpg-agent[1187] gpg-agent (GnuPG) 2.1.16 started
2016-11-29 10:40:53 gpg-agent[1187] Fatal: out of core in secure memory while allocating 1024 bytes

Details

Version
2.1.16
Pazuzu set Version to 2.1.16.
Pazuzu added a subscriber: Pazuzu.
justus renamed this task from gpg-agent crashes regularly to gpg-agent crashes regularly, out of core in secure memory allocations.Nov 29 2016, 1:35 PM
werner added a subscriber: werner.Nov 29 2016, 2:26 PM

gpg-agent sets 32k aside for so called secure memory. It seems Libgcrypt runs
out of memory during computations with private key parameters.

Please put "debug memstat" into gpg-agent.conf which should print two lines of
info at process termination. If possible do the same with the old version and
compare.

Another thing you can do is to start gpg-agent ("gpgconf --launch gpg-agent"),
then look for its PID and attach gdb:

  $ gpg gpg-agent PID
  gdb> break log_fatal
  gdb> c

after you hit the breakpoint enter "bt".

werner claimed this task.Nov 29 2016, 2:26 PM

Thanks for your fast reply.
Sadly I have not much time these days... but I have done what you suggested.
Honestly the log files dont tell me much. One thing I recognised is sometimes the logfiles end with "Fatal: libgcrypt
problem: out of core in secure memory" and sometimes they dont (I have not copied every log file here, this might
be to much).

This is what I got with debug memstat and the recent version of gnupg in Arch.

2016-11-30 21:18:35 gpg-agent[5516] listening on socket '/run/user/1000/gnupg/S.gpg-agent'
2016-11-30 21:18:35 gpg-agent[5517] gpg-agent (GnuPG) 2.1.15 started
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe1d0e5700 for fd 5 started
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe1c8e4700 for fd 6 started
2016-11-30 21:18:45 gpg-agent[5517] starting a new PIN Entry
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe17fff700 for fd 8 started
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe177fe700 for fd 9 started
2016-11-30 21:18:51 gpg-agent[5517] handler 0x7efe1d0e5700 for fd 5 terminated
2016-11-30 21:18:52 gpg-agent[5517] Fatal: out of core in secure memory while allocating 512 bytes

2016-11-30 21:18:52 gpg-agent[5517] Fatal: libgcrypt problem: out of core in secure memory
2016-11-30 21:18:35 gpg-agent[5516] listening on socket '/run/user/1000/gnupg/S.gpg-agent'
2016-11-30 21:18:35 gpg-agent[5517] gpg-agent (GnuPG) 2.1.15 started
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe1d0e5700 for fd 5 started
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe1c8e4700 for fd 6 started
2016-11-30 21:18:45 gpg-agent[5517] starting a new PIN Entry
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe17fff700 for fd 8 started
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe177fe700 for fd 9 started
2016-11-30 21:18:51 gpg-agent[5517] handler 0x7efe1d0e5700 for fd 5 terminated
2016-11-30 21:18:52 gpg-agent[5517] Fatal: out of core in secure memory while allocating 512 bytes

2016-11-30 21:18:52 gpg-agent[5517] Fatal: libgcrypt problem: out of core in secure memory
v2016-11-30 21:18:35 gpg-agent[5516] listening on socket '/run/user/1000/gnupg/S.gpg-agent'
2016-11-30 21:18:35 gpg-agent[5517] gpg-agent (GnuPG) 2.1.15 started
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe1d0e5700 for fd 5 started
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe1c8e4700 for fd 6 started
2016-11-30 21:18:45 gpg-agent[5517] starting a new PIN Entry
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe17fff700 for fd 8 started
2016-11-30 21:18:45 gpg-agent[5517] handler 0x7efe177fe700 for fd 9 started
2016-11-30 21:18:51 gpg-agent[5517] handler 0x7efe1d0e5700 for fd 5 terminated
2016-11-30 21:18:52 gpg-agent[5517] Fatal: out of core in secure memory while allocating 512 bytes

2016-11-30 21:18:52 gpg-agent[5517] Fatal: libgcrypt problem: out of core in secure memory

This is what I got with debuging memstat and downgraded gnupg to gnupg-2.1.15-2-x86_64.pkg.tar.xz

2016-11-30 21:28:30 gpg-agent[5953] listening on socket '/run/user/1000/gnupg/S.gpg-agent'
2016-11-30 21:28:30 gpg-agent[5954] gpg-agent (GnuPG) 2.1.15 started
2016-11-30 21:28:37 gpg-agent[5954] handler 0x7fd6fedca700 for fd 5 started
2016-11-30 21:28:37 gpg-agent[5954] starting a new PIN Entry
2016-11-30 21:28:42 gpg-agent[5954] handler 0x7fd6fedca700 for fd 5 terminated
2016-11-30 21:28:47 gpg-agent[5954] handler 0x7fd6fedca700 for fd 5 started
2016-11-30 21:28:47 gpg-agent[5954] handler 0x7fd6fe5c9700 for fd 7 started
2016-11-30 21:28:47 gpg-agent[5954] handler 0x7fd6fddc8700 for fd 8 started
2016-11-30 21:28:47 gpg-agent[5954] handler 0x7fd6fd5c7700 for fd 9 started
2016-11-30 21:28:47 gpg-agent[5954] handler 0x7fd6fedca700 for fd 5 terminated
2016-11-30 21:28:47 gpg-agent[5954] handler 0x7fd6fddc8700 for fd 8 terminated
2016-11-30 21:28:47 gpg-agent[5954] handler 0x7fd6fe5c9700 for fd 7 terminated
2016-11-30 21:28:47 gpg-agent[5954] handler 0x7fd6fd5c7700 for fd 9 terminated
2016-11-30 21:29:32 gpg-agent[5954] handler 0x7fd6fe5c9700 for fd 7 started
2016-11-30 21:29:32 gpg-agent[5954] handler 0x7fd6fe5c9700 for fd 7 terminated
2016-11-30 21:30:10 gpg-agent[5954] handler 0x7fd6fe5c9700 for fd 5 started
2016-11-30 21:30:10 gpg-agent[5954] handler 0x7fd6fd5c7700 for fd 7 started
2016-11-30 21:30:10 gpg-agent[5954] handler 0x7fd6fddc8700 for fd 8 started
2016-11-30 21:30:10 gpg-agent[5954] handler 0x7fd6fedca700 for fd 9 started
2016-11-30 21:30:10 gpg-agent[5954] Fatal: out of core in secure memory while allocating 512 bytes

2016-11-30 21:30:10 gpg-agent[5954] Fatal: libgcrypt problem: out of core in secure memory

And this is what I got from gdb (I am quite unfamiliar with gdb, so maybe i done something wrong)

Reading symbols from gpg-agent...(no debugging symbols found)...done.
Attaching to program: /usr/bin/gpg-agent, process 3492
Reading symbols from /usr/lib/libgcrypt.so.20...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libgpg-error.so.0...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libassuan.so.0...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libnpth.so.0...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Reading symbols from /usr/lib/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
0x00007f05452cd18c in pselect () from /usr/lib/libc.so.6
(gdb) break log_fatal
Function "log_fatal" not defined.
Make breakpoint pending on future shared library load? (y or [n])
(gdb) c
Continuing.
[New Thread 0x7f053ffff700 (LWP 4687)]
[New Thread 0x7f05449ea700 (LWP 4698)]
[New Thread 0x7f053f7fe700 (LWP 4699)]
[New Thread 0x7f05451eb700 (LWP 4700)]
[Thread 0x7f053ffff700 (LWP 4687) exited]
[Thread 0x7f053f7fe700 (LWP 4699) exited]
[Thread 0x7f05451eb700 (LWP 4700) exited]
[Thread 0x7f05449ea700 (LWP 4698) exited]
[New Thread 0x7f05449ea700 (LWP 4733)]
[New Thread 0x7f05451eb700 (LWP 4745)]
[New Thread 0x7f053f7fe700 (LWP 4746)]
[New Thread 0x7f053ffff700 (LWP 4747)]
[Thread 0x7f053f7fe700 (LWP 4746) exited]
[Thread 0x7f05449ea700 (LWP 4733) exited]
[Thread 0x7f05451eb700 (LWP 4745) exited]
[Thread 0x7f053ffff700 (LWP 4747) exited]
[New Thread 0x7f053ffff700 (LWP 4775)]
[New Thread 0x7f05451eb700 (LWP 4776)]
[Thread 0x7f053ffff700 (LWP 4775) exited]

Thread 11 "gpg-agent" received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7f05451eb700 (LWP 4776)]

0x00007f054559a16d in write () from /usr/lib/libpthread.so.0

werner added a comment.Dec 2 2016, 9:15 AM

Another user reported the same problem on IRC. It seems it is Arch Linux
specific but we don't known for sure. The latest test with re-building
Libgcrypt w/o any special options didn't changed anything.

I need top be able to replicate the problem before I can come up with a solution.

dkg added a subscriber: dkg.Dec 5 2016, 5:47 PM

fwiw, i'm seeing this too, over at https://bugs.debian.org/846953 , for a user
with an insanely large (10240-bit) RSA key when it is locked with a passphrase.

I'm attaching such an example secret key (with passphrase "abc123"), and you can
trigger the crash with:

gpg --batch --yes --import test-hugekey.key
echo test | gpg -r 861A97D02D4EE690A125DCC156CC9789743D4A89

--encrypt --armor --trust-model=always --batch --yes --output data.gpg

            gpg --decrypt data.gpg

While i think it's fair to say that we need to have some limits on the sizes of
keys we can handle, gpg-agent should not crash when asked to deal with
extra-large keys, it should fail gracefully and return a sensible error code.

dkg added a comment.Dec 5 2016, 5:47 PM

Yeah, I saw the Debian bug report. Unfortunately there is no easy
solution to this except for rejecting the use of large secret keys.

The problem here is that the big number library needs to allocate from
a limited secure memory region (32 KiB by default) and terminates on
allocation failure. I know that this is sub-optimal but we are doing
this for 19 years now. Checking for an error after each low-level big
number operation would make the code unreadable and will introduce
bugs. Ideas on what to do:

  • On secure memory allocation failure, call the out-of-memory handler which may then free other memory (or purchase new memory). This can be done in the application.
  • On secure memory allocation failure, allocate a new block of secure memory and allocate from that one. There are two disadvantages: a) It is unlikely that the new block can be mlock'd. b) A free will be a a little bit slower because it needs to check the list of secure memory blocks and not just one address range. The address range check is needed so that we can figure out whether the freed address is in the secmem range and needs to zeroed out. This requires a new Libgcrypt version, though no ABI change.
  • We have a ./configure option --enable-large-secmem which sets 64k instead of 32k aside for the secmem. This is currently only used in gpg to enable gpg's --enable-large-rsa option. Given that in 2.1 we use the gpg-agent for the secret key operations we should have the same options in gpg-agent. However, it is only a kludge, but one we once agreed upon to silence some pretty vocal experts on key size.
dkg added a comment.Dec 6 2016, 5:12 PM

I just posted 0001-agent-Respect-enable-large-secmem.patch to gnupg-devel:

https://lists.gnupg.org/pipermail/gnupg-devel/2016-December/032285.html

dkg added a comment.Dec 6 2016, 5:18 PM

is the only goal of the secure memory to keep the RAM from being written to
swap, or are there other goals of secure memory? why is it unlikely that a new
block of memory can be mlock'd? what are the consequences of the new block not
being mlock'd? will it still be treated as secure memory?

crashing in the event that we run out of secure memory is simply not acceptable
these days, especially in a model where we have persistent long-term daemons
that people expect to remain running.

werner added a comment.Dec 6 2016, 5:38 PM

The secmem has two goals:

  • Avoid swapping out tehse pages. Thus the mlock.
  • Making sure that on free the memory is zeroized.

mlock requires root privileges and thus a special init sequence is required
(install as setuid(root) and gpg-agent drops the privileges direct after
allocating and mlocking the secmem). In the old times, and probably still today
on non-Linux platforms, this is still required. However, Linux turned to
allowing any process to mlock a certain amount (64k on my box).

I tend to suggest that we extend Libgcrypt to extend the secure memory
allocation by not using mlocked memory but keeping the the seroization feature.
The second option from T2857 (wk on Dec 05 2016, 07:11 PM / Roundup).

dkg added a comment.Dec 6 2016, 5:49 PM

ah right, "ulimit -l" says 64 (kbytes) on my Linux system as well. According to
mlock(2) that's since kernel 2.6.9.

So i think it's worth adopting the supplied patch as a workaround at least (i
can confirm that it resolves the specific use case described in T2857 (dkg on Dec 05 2016, 05:47 PM / Roundup)), and i
agree with you that we should extend libgcrypt to extend secure memory allocation.

it's not clear to me that swap is outside the trust boundary anyway these days,
and modern systems should prefer encrypted swap where possible.

werner added a comment.Dec 6 2016, 5:50 PM

I will try out the idea of extending the secmem pool even if that means no mlock.

werner added a comment.Dec 7 2016, 5:07 PM

I have now pushed a change to Libgcrypt master to implement auto-extending of
secre memory pools. Commit b6870cf but there are two cother commits which this
is based upon. My test shows that I can now decrypt a message encrypted to the
test-hugekey.key.

I will port this back to Libgcrypt 1.7.

werner added a comment.Dec 7 2016, 5:39 PM

Backported to LIBGCRYPT-1-7-BRANCH

werner added a project: Testing.
werner closed this task as Resolved.Dec 9 2016, 4:04 PM

I just released Libgcrypt 1.7.4 - whcih should fix that bug.

The problem still occured after the update of Libgcrypt, but Im pretty sure now
that I determine the origin of the problem. In the end it is somehow my fault: By
time I got more and more email accounts which are synchronized with offlineimap and
the passwords for each account are encrypted with gpg.
Offlineimap offers an option for multitheading, which synchronize the accounts in a
prallel manner. By changing to a strict serialized synchronistaion the problem
seems to vanish. My guess is, it was simply to much at once.
For those, who encounter the same problem try the '-1' option of offlineimap.

Thanks for your time and work (in general)!

Pazuzu reopened this task as Open.Dec 17 2016, 4:00 PM
Pazuzu closed this task as Resolved.