Option to auto-increase secmem in gpg-agent
Open, High, Public

Description

Test whether an option for gpg-agent to auto-increase the secmem size on the fly helps with hundreds of concurrent connections. We already do this for xmalloc-style allocations, but doing it for all secmem allocations might be useful.

werner created this task. Nov 22 2017, 8:39 PM

Here is the test case I wrote a while back (follow-up to "Crashes with gpg-agent 2.1.18"). It is written with bash in mind: it creates a stand-alone GNUPGHOME directory with a pinentry script that supplies the passphrase (I guess I could have preset the passphrase) and then starts 200 concurrent gpg decryption requests. With GnuPG 2.1.18 and later, this usually exposes the out-of-memory situation very quickly.

echo "# Setup agent config to avoid interactive password prompting" 
echo "verbose" >> $GNUPGHOME/gpg-agent.conf 
echo "log-file $GNUPGHOME/agent.log" >> $GNUPGHOME/gpg-agent.conf 
echo "pinentry-program $GNUPGHOME/pinentry.csh" >> $GNUPGHOME/gpg-agent.conf 

cat >$GNUPGHOME/pinentry.csh << EOF
#!/bin/tcsh
set word = ""
while (("\$word" != "getpin") && ("\$word" != "GETPIN"))
  echo "OK"
  set word = "\$<"
end
echo "D abc123"
echo "OK"
EOF
chmod a+rx $GNUPGHOME/pinentry.csh 
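For readers without tcsh installed, a hypothetical POSIX-sh equivalent of the pinentry stub above looks like this (same fixed passphrase "abc123"; the BYE handling is an addition, since a pinentry normally keeps answering until the agent disconnects):

```shell
#!/bin/sh
# Minimal pinentry stub (sketch): greet with OK, answer OK to every
# request, and reply with the fixed passphrase when GETPIN arrives.
echo "OK"
while read -r cmd _; do
  case "$cmd" in
    GETPIN|getpin) echo "D abc123"; echo "OK" ;;
    BYE|bye)       echo "OK"; exit 0 ;;
    *)             echo "OK" ;;
  esac
done
```

Mark it executable and point pinentry-program at it, just as with the tcsh version.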

yes | gpg2 --passphrase abc123 --quick-gen-key user@host.com 

echo "# Encrypt a file"
rm -f test.gpg 
echo testme > test 
gpg2 -e -r user@host.com test 

echo "# This had better work" 
gpg2 -d test.gpg || exit 9 

echo "# Hit the agent hard" 
for i in $(seq 1 200); do gpg2 -d test.gpg > "test${i}.log" 2>&1 & done
wait
gpg-connect-agent 'getinfo pid' /bye
sleep 1
gpg-connect-agent killagent /bye
grep 'fail' *.log
grep -C1 --color -E "jee|memory" $GNUPGHOME/agent.log

echo "Doit!" 
echo "rm -rf $GNUPGHOME" 
gpg-connect-agent killagent /bye

The attached patches make the necessary changes to libgcrypt and gpg-agent. A word about my change to libgcrypt: since all of the *_secure allocation operations were hardcoded to pass an xhint of zero, I simply replaced that hardcoded value with a static variable. The patches include some sample documentation for both changes. My Scheme skills are quite rusty, so I did not write a test case.

Thanks for your patches. I decided to implement this in a similar way, but I need to take several branches into account.

My test procedure is a bit different: I start gpg-agent with a dedicated home directory, create a short ciphertext using an RSA-4096 key, and run

seq 1000 | sed 's/.*/ciphertext.gpg/' |\
  xargs -n1 -P50  ../g10/gpg -q -d --no-permission-warn  --batch --yes -o /dev/null

This runs 1000 decryptions with up to 50 processes running concurrently. And indeed I see quite a few decryption failures and even some errors due to memory problems. As soon as I add the new option --auto-expand-secmem to gpg-agent and run the same test again, it works, and --debug memstat shows me at the end:

4 - 21:09:16 gpg-agent[18603]: secmem usage: 1344/32768 bytes in 2 blocks
4 - 21:09:16 gpg-agent[18603]:               0/32768 bytes in 0 blocks
4 - 21:09:16 gpg-agent[18603]:               0/32768 bytes in 0 blocks
4 - 21:09:16 gpg-agent[18603]:               0/32768 bytes in 0 blocks
4 - 21:09:16 gpg-agent[18603]:               0/32768 bytes in 0 blocks
4 - 21:09:16 gpg-agent[18603]:               0/32768 bytes in 0 blocks
4 - 21:09:16 gpg-agent[18603]:               0/32768 bytes in 0 blocks

I conclude that this feature works. I have prepared patches for Libgcrypt and GnuPG 2.2 and will push them tomorrow. Not yet tested, but by using

--auto-expand-secmem 0x30000

only one new pool would be created, and thus all calls to free() would be a bit faster.
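For reference, a sketch of enabling the option via the configuration file instead of the command line (conf-file spellings drop the leading dashes; the option name and the 0x30000 value are taken from this thread):

```shell
# Sketch: enable auto-expanding secmem for a dedicated agent and
# restart it so the new setting takes effect on next use.
echo "auto-expand-secmem 0x30000" >> "$GNUPGHOME/gpg-agent.conf"
gpgconf --kill gpg-agent
```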

THANK YOU! Once you push those changes, I'll see about back-porting the patches to Debian stable/Ubuntu LTS.

Once I deployed my changes, my test that starts 2k processes ran into another problem: they exceeded the maximum number of open file descriptors. Raising the limit (default 1024) to 4096 solved the problem. If the agent cached the keys it decrypts, it could avoid repeatedly opening and decrypting the private key (reducing both computation and memory usage). As you said in the email thread, this would require reference counting, which libgcrypt does not do. This is a thought, not a call for action.
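For anyone repeating that stress test, the soft fd limit can be checked and raised from the launching shell (4096 is the value that worked in the report above; raising it only succeeds up to the hard limit):

```shell
# Show the current soft limit on open file descriptors (commonly 1024).
ulimit -Sn
# Raise the soft limit for this shell and its children before starting
# the 2k-process test; fails if the hard limit (ulimit -Hn) is lower.
ulimit -Sn 4096
```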

Somehow I expected such a report (too many open fds). We will need to replace our select-based code with poll. However, I think this is more related to T3529.

mbudsjo added a subscriber: mbudsjo. Sep 7 2018, 4:59 PM