Hello,
After upgrading from Ubuntu 16.04 to Ubuntu 18.04 we've noticed issues that came along with gpg v2.x.
gpg-agent produces millions of futex syscalls, many of them returning errors, within a very short time (a second or two) when it is put under load, either by SaltStack's salt-master decrypting pillars (our main use case) or when tested directly with the "parallel" tool from the moreutils package.
$ sudo strace -f -p <pidof gpg-agent> ...
...Ctrl+C just after couple of seconds while "gpg -d" commands are running in parallel (see below for details) ...
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.35 2800.145231         305   9194103   2009552 futex
  3.63  105.404136      373774       282           pselect6
  0.01    0.431338         102      4246           read
  0.00    0.104701          12      8490           write
  0.00    0.029085         103       283           accept
  0.00    0.016549          58       284           madvise
  0.00    0.012201          22       567           close
  0.00    0.010979           8      1410           getpid
  0.00    0.010405          12       849           access
  0.00    0.006341          22       284       284 wait4
  0.00    0.004350          15       283           openat
  0.00    0.003764          13       283           clone
  0.00    0.002668           9       283           getsockopt
  0.00    0.002568           9       283           fstat
  0.00    0.002564           9       283           set_robust_list
  0.00    0.001941           7       283           lseek
------ ----------- ----------- --------- --------- ----------------
100.00 2906.188821               9212496   2009836 total
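To put the futex row in perspective, here is a small sketch that derives two ratios from the summary above: the share of futex calls that returned an error, and the share of all syscalls that were futex calls. The row and the total are copied from the strace output; the function name futex_stats is only an illustrative helper.

```shell
# Futex row copied from the strace summary:
# % time, seconds, usecs/call, calls, errors, syscall
row='96.35 2800.145231 305 9194103 2009552 futex'
total_calls=9212496   # "calls" column of the "total" line

# futex_stats ROW TOTAL: print error share and call share (hypothetical helper)
futex_stats() {
    echo "$1" | awk -v total="$2" '{
        printf "failed futex calls: %.1f%%\n", 100 * $5 / $4
        printf "futex share of all calls: %.1f%%\n", 100 * $4 / total
    }'
}

futex_stats "$row" "$total_calls"
```

In other words, roughly one futex call in five fails, and futex accounts for nearly every syscall the agent makes while under load.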
I'll describe the issue and steps to reproduce it.
First, prepare the "enc" file:
cat /usr/share/doc/base-files/README | gpg -ear "some-4K-RSA-public-key" > enc
Run parallel decryptions using "time" to measure it:
time parallel -j 30 sh -c "cat enc | gpg --no-tty -d -q -o /dev/null" -- $(seq 1 3000)
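For anyone without the moreutils "parallel" tool (GNU parallel uses a different syntax), the same load can probably be approximated with xargs. This is only a sketch: it assumes an xargs that supports -P (GNU and BSD do), and run_parallel is a made-up helper name, not something from gpg or moreutils.

```shell
# run_parallel JOBS COUNT CMD: run CMD COUNT times, at most JOBS at once
# (hypothetical helper; relies on the non-POSIX "xargs -P" extension)
run_parallel() {
    njobs=$1
    count=$2
    cmd=$3
    # One input line per run; -I{} makes xargs start one command per line.
    seq 1 "$count" | xargs -P "$njobs" -I{} sh -c "$cmd"
}

# Usage, mirroring the command above (not run here, needs the "enc" file):
#   time run_parallel 30 3000 'gpg --no-tty -d -q -o /dev/null < enc'
```

The timings reported in this mail were all taken with the moreutils command, not with this variant.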
Running "gpg -d" (GPG v2.x, with the gpg-agent) in parallel as described above took:
- 1 minute 18 seconds on big hardware (48 cores, *gpg-agent 2.2.4*-1ubuntu1.2)
- 32 seconds on my laptop (4 cores, *gpg-agent 2.2.19*-3ubuntu2)
Running the same commands but with GPG v1.4.20 (no agent):
- 9 seconds on big hardware (40 cores, *gnupg 1.4.20*-1ubuntu3.3)
- 21 seconds on a VM (1 core, *gnupg 1.4.20*-1ubuntu3.3)
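Converted to throughput, the timings above look as follows. This is plain arithmetic on the numbers already given (3000 decryptions per run); note that the two big machines differ (48 vs 40 cores), so the slowdown figure is only approximate.

```shell
runs=3000   # decryptions per measurement, from $(seq 1 3000)

# rate LABEL SECONDS: print decryptions per second (illustrative helper)
rate() {
    awk -v n="$runs" -v label="$1" -v secs="$2" \
        'BEGIN { printf "%-28s %4.0f decryptions/s\n", label, n / secs }'
}

rate "gpg-agent 2.2.4, 48 cores"  78
rate "gpg-agent 2.2.19, 4 cores"  32
rate "gnupg 1.4.20, 40 cores"      9
rate "gnupg 1.4.20, 1 core"       21

# Slowdown between the two big machines (78 s vs 9 s)
awk 'BEGIN { printf "slowdown: %.1fx\n", 78 / 9 }'
```

So even a single-core VM running gpg 1.4.20 gets through more decryptions per second than either gpg 2.x setup.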
Note: to prevent the "command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>" error, gpg-agent is run either with the "--auto-expand-secmem 0x30000" flag or with "auto-expand-secmem" set in the ~/.gnupg/gpg-agent.conf file.
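For reference, the config-file variant can be set up roughly like this. It is a sketch under assumptions: the value in the conf file is assumed to mirror the 0x30000 used on the command line, and enable_secmem_expand is a made-up helper name.

```shell
# enable_secmem_expand FILE: append the option to FILE unless already present
# (hypothetical helper; the value 0x30000 mirrors the command-line flag)
enable_secmem_expand() {
    conf=$1
    touch "$conf"
    grep -q '^auto-expand-secmem' "$conf" ||
        echo 'auto-expand-secmem 0x30000' >> "$conf"
}

# Usage (not run here):
#   enable_secmem_expand "$HOME/.gnupg/gpg-agent.conf"
#   gpgconf --kill gpg-agent    # agent is started again by the next gpg call
```

The helper is idempotent, so running it from a provisioning script more than once does not duplicate the line.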
Our use case is SaltStack's salt-master decrypting many pillars for hundreds of servers, so the Ubuntu 16.04 => 18.04 upgrade severely degrades SaltStack performance, making it almost unusable: it becomes 10 times slower and forces us to find workarounds such as increasing "gather_job_timeout", or possibly even rolling back to gpg v1.x.
Any suggestions?
Kind regards,
Andrey Arapov