[solved] gpg-agent respawn another process randomly and causes cached passphrase check failed / expired
Open, Needs TriagePublic

Description

Hello guys, I wrote a bash script to validate a cached passphrase in gpg-agent and I have another script that will continue to use this cached passphrase to do its job. The script has no problem and works perfectly. I published this script in gitlab for demonstration:

https://gitlab.com/sofibox/maxicron/-/blob/master/usr/local/maxicron/gpg/gpgcron

But, I have 1 problem sometimes around 10-20 minutes [2 times perday] I notice that a new gpg-agent process running and it's gone randomly. Between this time, I also notice that, I was unable to read my cached passphrase in gpg-agent until the new gpg-agent process is terminated or killed.

So, I added some debug scripts to the code above to see the details of the randomly spawn gpg-agent process, and I found out the following log:

I run this script each minute in cronjob to debug the problem:

* * * * * root /usr/local/maxicron/gpg/gpgcron KEY_ID --warning-only > /dev/null

When a new gpg-agent process is created automatically I got this log and send to my email automatically every 1 minute:

[gpgcron]: OK, gpg-agent daemon is running with the following process id(s):
--- (1) PID: 1734578 ---
Process id details:
    PID MEMORY USER     GROUP    COMMAND         ARGS
1734578 168996 root     root     gpg-agent       gpg-agent --options /root/.gnupg/gpg-agent.conf --daemon
Process tree details:
systemd,1 --switched-root --system --deserialize 18
  `-gpg-agent,1734578 --options /root/.gnupg/gpg-agent.conf --daemon
---------
--- (2) PID: 1874245 ---
Process id details:
    PID MEMORY USER     GROUP    COMMAND         ARGS
1874245 168996 root     root     gpg-agent       gpg-agent --homedir /root/.gnupg --use-standard-socket --daemon
Process tree details:
systemd,1 --switched-root --system --deserialize 18
  `-gpg-agent,1874245 --homedir /root/.gnupg --use-standard-socket --daemon
---------
[gpgcron]: Warning, gpg-agent current process id [1874245 1734578] is different from the cached process id [1734578]
[gpgcron]: Warning, there are 2 gpg-agent processes that are currently running: [1874245 1734578]
[gpgcron]: Warning, no passphrase is cached in gpg-agent for key id of [KEY_ID]
[gpgcron]: [gpgcron] is terminated!

You can see that, the PID 1874245 is a new gpg-agent process and it runs using this argument: --homedir /root/.gnupg --use-standard-socket --daemon

This process causes my script to failed to validate the passphrase cached in gpg-agent as shown in the output, but at some time, the new process will be terminated automatically and the passphrase can be validated again. So, with this weird behavior sometimes it causes my other script to return error because it was not able to use cached password from gpg-agent.

The output will become like this if the new gpg-agent process is not running (this will only output in terminal because it's not an error):

[root@earth gpg]# ./gpgcron KEY_ID
[gpgcron]: OK, gpg-agent daemon is running with the following process id(s):
--- (1) PID: 1734578 ---
Process id details:
    PID MEMORY USER     GROUP    COMMAND         ARGS
1734578 168996 root     root     gpg-agent       gpg-agent --options /root/.gnupg/gpg-agent.conf --daemon
Process tree details:
systemd,1 --switched-root --system --deserialize 18
  `-gpg-agent,1734578 --options /root/.gnupg/gpg-agent.conf --daemon
---------
[gpgcron]: OK, gpg-agent current process id [1734578] is matched from the cached process id [1734578]
[gpgcron]: OK, passphrase is already cached in gpg-agent [1734578] for key id of [KEY_ID]
[gpgcron]: Validating cached passphrase from gpg-agent ...
[gpgcron]: OK, passphrase is valid since [Fri Sep 18 12:11:44 +08 2020], expired on [Sat Oct 23 12:11:44 +08 2021] in [399d 9h 54m 54s]
[root@earth gpg]#

Any idea how to solved this ? Is this a bug ?

Thank you

MaXi32 created this task.Sep 18 2020, 8:09 PM
MaXi32 updated the task description. (Show Details)
MaXi32 updated the task description. (Show Details)
MaXi32 updated the task description. (Show Details)Sep 18 2020, 8:11 PM
MaXi32 updated the task description. (Show Details)
MaXi32 updated the task description. (Show Details)Sep 18 2020, 8:17 PM
MaXi32 updated the task description. (Show Details)Sep 18 2020, 8:24 PM
MaXi32 updated the task description. (Show Details)
MaXi32 updated the task description. (Show Details)Sep 18 2020, 8:31 PM
MaXi32 awarded a token.
MaXi32 added a comment.EditedSep 19 2020, 6:16 AM

Ok let me update what I did next:

Last night I changed the script to start gpg-agent differently from using the custom config like this:

gpg-agent --options $GPG_CONFIG_FILE --daemon

I changed it to:

gpg-agent --homedir "${GNUPGHOME:-$HOME/.gnupg}" --daemon

and, I only got 2 emails from the script log at 7.00am and 10.00am like this:

[gpgcron]: OK, gpg-agent daemon is running with the following process id(s):
--- (1) PID: 2065787 ---
Process id details:
    PID MEMORY USER     GROUP    COMMAND         ARGS
2065787 168996 root     root     gpg-agent       gpg-agent --homedir /root/.gnupg --daemon
Process tree details:
systemd,1 --switched-root --system --deserialize 18
  `-gpg-agent,2065787 --homedir /root/.gnupg --daemon
---------
[gpgcron]: OK, gpg-agent current process id [2065787] is matched from the cached process id [2065787]
[gpgcron]: Warning, no passphrase is cached in gpg-agent for key id of [2B705B8B6FA943B1]
[gpgcron]: [gpgcron] is terminated!

This means, gpg was trying to respawn a new random process but couldn't make it maybe I changed the running argument to different argument like above using --homedir, and gpg cannot use the same argument to respawn new process using --homedir option? I hope somebody can answer this nightmare why gpg is trying to respawn a new process ? If I use the --option method to run gpg, it will respawn a new process with --homedir argument, then I got many email each minute until the new process terminated.

The problem here is that, even it couldn't respawn a new process, it still make the cache failed to validate but less than 1 minute time..

Just to let you know that , using --homedir option also has the same problem I noticed today. I got email each minute like this:

[gpgcron]: OK, gpg-agent daemon is running with the following process id(s):
--- (1) PID: 2739754 ---
Process id details:
    PID MEMORY USER     GROUP    COMMAND         ARGS
2739754 168996 root     root     gpg-agent       gpg-agent --homedir /root/.gnupg --use-standard-socket --daemon
Process tree details:
systemd,1 --switched-root --system --deserialize 18
  `-gpg-agent,2739754 --homedir /root/.gnupg --use-standard-socket --daemon
---------
--- (2) PID: 2787403 ---
Process id details:
    PID MEMORY USER     GROUP    COMMAND         ARGS
2787403  95264 root     root     gpg-agent       gpg-agent --homedir /root/.gnupg --use-standard-socket --daemon
Process tree details:
systemd,1 --switched-root --system --deserialize 18
  `-gpg-agent,2787403 --homedir /root/.gnupg --use-standard-socket --daemon
---------
[gpgcron]: Warning, gpg-agent current process id [2787403 2739754] is different from the cached process id [2739754]
[gpgcron]: Warning, there are 2 gpg-agent processes that are currently running: [2787403 2739754]
[gpgcron]: Warning, no passphrase is cached in gpg-agent for key id of [2B705B8B6FA943B1]
[gpgcron]: [gpgcron] is terminated!

I can create a script to manually kill the 2nd process, but can u guys confirm with me that this is a known bug ?

I'm now able to kill the respawn process in the script (updated the script). But I need confirmation if this is a known bug ?

werner added a subscriber: werner.Sep 20 2020, 7:39 PM

FWIW: You may get a faster answer if you post to gnupg-users mailing lists. Bug reports are a tool to fix bugs and usually are only seen by a few developers.

In particular for systemd based installations (--supervised mode) I can't given an answer without research because I do not use systemd. The supervised mode was added by us after a long discussion but under the premise that the Linux distros will in general take care of problems.

MaXi32 added a comment.EditedSep 21 2020, 1:11 AM

Thanks for your reply. I can confirm from my observation from the log this is a bug where I'm able to reproduce this every day. I will post this to mailing lists.

Also I would like to leave this question, and hope if someone could answer this:

Why gpg-agent is able to run with more than 1 processes (as shown in the output above). The gpg-agent itself when running it without argument, it's already has a function to block itself from running more than 1 instances. Even if I run this with an argument, it won't create a new process:

[root@earth gpg]# gpg-agent --options /root/.gnupg/gpg-agent.conf --daemon
gpg-agent: a gpg-agent is already running - not starting a new one

But how is it possible this process can start and bypass the gpg-agent validation. Weird right.

This comment was removed by MaXi32.
MaXi32 added a comment.EditedSep 21 2020, 1:42 AM

Just to acknowledge here: I notice that the new gpg-agent random process respawn with an obsolete argument using --use-standard-socket. I will run my gpg daemon using this absolete argument to see if it can block this random process. [updated the script]

MaXi32 added a comment.EditedSep 21 2020, 3:22 PM

Update: Using --use-standard-socket argument to run this does not work and gpg-agent still create new process. New findings:

I tried killing the process using kill -9 $PID in the script, where $PID is the current random process, it did kill the process but after that, it immediately respawn a new gpg-agent process with different PID. So, I always got 2 gpg-agent processes running ( so this has happened almost 2 hours-I received email every 60 seconds within this 2 hours) untill i manually terminate the process with kill -15 $PID , then the process was gone. If I don't kill the process and let it terminate randomly which took only about 10-20 minutes

So, I don't not know if the new random gpg-agent process was gone because I use signal kill -15 to kill or it was gone randomly few seconds before I tried killing that manually. So, I updated my script to use kill -15 and wait until the new process respawn randomly and see the result.

Hope that you don't mind I keep updating this thing and I might use this thread to ask for help in the mailing list instead of creating new thread.

MaXi32 added a comment.EditedTue, Sep 29, 12:43 PM

Update found out the answer for this mystery but I need to know why from you guys:

The new gpg-agent that was respawn randomly is caused by all the running cronjobs. When a cronjob runs, then the new gpg-agent process respawn until the script finish running (it terminates, then the cache is valid again). So, this is the example of the script that cause the gpg-agent process respawn randomly (this is a chkrootkit script):

#!/bin/bash

PATH=/bin:/usr/bin:/sbin:/usr/sbin
export PATH
#By Arafat Ali (MaXi32) - Chkrootkit check for warning or infected
#| TODO 1) Check update - No builtin update
#| TODO 2) Check level permission to run this script
#|

echo "======================================"
echo "[chkrootkit | info]: Chkrootkit is checking system..."

#Global variables
MYHOSTNAME=`/bin/hostname`
MYEMAIL="webmaster@sofibox.com"
CHKROOTKIT_LOG_PATH="/usr/local/maxicron/chkrootkit/log"
REPORT_FILE="/usr/local/maxicron/chkrootkit/log/chkrootkit-file.log"
REPORT_FILE_GREP="/usr/local/maxicron/chkrootkit/log/chkrootkit-file-grep.log"
#ROOTKIT_BIN="/usr/local/chkrootkit/"
MAIL_BIN="/usr/local/bin/mail"
WARNING_STATUS="N/A"

mkdir -p $CHKROOTKIT_LOG_PATH

sudo touch $REPORT_FILE
sudo chown root:adm $REPORT_FILE
sudo chmod 640 $REPORT_FILE
cat /dev/null > $REPORT_FILE

sudo touch $REPORT_FILE_GREP
sudo chown root:adm $REPORT_FILE_GREP
sudo chmod 640 $REPORT_FILE_GREP
cat /dev/null > $REPORT_FILE_GREP

echo "[chkrootkit | info]: Please wait..."
echo "Chkrootkit checked on `date`" >> $REPORT_FILE
sudo /usr/local/chkrootkit/chkrootkit >> $REPORT_FILE
echo "Rootkit scan return: $?"
#exit 1
echo "" >> $REPORT_FILE
echo "==================SCAN COMPLETED=================" >> $REPORT_FILE
echo "" >> $REPORT_FILE
echo "================= WARNING NOTICE ================" >> $REPORT_FILE

if (grep -e "Warning" -e "INFECTED" -e "were found" $REPORT_FILE >> $REPORT_FILE_GREP) then
        WARNING_STATUS="WARNING"
        cat $REPORT_FILE_GREP >> $REPORT_FILE
        echo "[chkrootkit | info]: You may Check chkrootkit update manually" >> $REPORT_FILE
        echo "================== END OF NOTICE ================" >> $REPORT_FILE
        $MAIL_BIN -s "[chkrootkit | $WARNING_STATUS] Check Rootkit Scan Report  @ $MYHOSTNAME" $MYEMAIL < $REPORT_FILE
else
        WARNING_STATUS="OK"
        echo "" >> $REPORT_FILE
        echo "NO WARNING FOUND" >> $REPORT_FILE
        echo "" >> $REPORT_FILE
fi
# $MAIL_BIN -s "[chkrootkit | $WARNING_STATUS] Check Rootkit Scan Report  @ $MYHOSTNAME" $MYEMAIL < $REPORT_FILE
echo "[chkrootkit | info]: Scan Status: $WARNING_STATUS"
echo "[chkrootkit | info]: Done checking system. Email notification is set to $MYEMAIL"

I don't see anything wrong with this script. I can confirm 100% that all my cronjob scripts when running it will cause gpg-agent to respawn new process.

Example for the backup script that runs about 15 minutes, the new random gpg-agent process will run for 15 minutes that also cause invalid passphrase in gpg-agent.

Can anyone here tell me what is the problem here? I don't get why ?

There is a conflict between the systemd based launching of gpg-agent and gpg's own launching of the agent. Further systemd seems to remove the /run/user/UID directory which unlinks gpg-agent's socket. gpg-agent's self-check notices this after a minute and termintates itself.

I have never understand the reason why Debian demanded the --supervised option. It is only helpful for ssh which has no capability (yet) to start gpg-agent on demand,

MaXi32 added a comment.Thu, Oct 1, 9:32 PM

Actually I don't use systemd unit to start gpg-agent (creating that service file in systemd and make it start automatically) but I run it manually within the script. Also the script has selfcheck not to run multiple gpg-agent like below:

PIDOF_COUNT=$(pidof -x gpg-agent | wc -w) # Get the number of gpg-agent process that is currently running
GPG_CONFIG_FILE="/usr/local/maxicron/gpg/conf/gpg-agent.conf" # Custom config path
if [ "$PIDOF_COUNT" -gt 0 ]; then
    # Tell and do nothing
    echo "gpg-agent daemon is running"
else
    # Tell and run it
    echo "gpg-agent is not running"
    gpg-agent --options $GPG_CONFIG_FILE --daemon
fi

I don't understand what kind of conflict after gpg-agent is running on its own (with only 1 process from the script), then when another cronjob script that has nothing related to gpg is executed from crontab, then it causes a new gpg-agent to create a new process.

MaXi32 added a comment.EditedSun, Oct 4, 4:19 PM

OK I found a very interesting thing here, @werner and finally I can reproduce this problem:

When I run any script with a sudo command from cronjob, out of sudden a new gpg-agent process is created:

# Put this in cronjob:
# 14 23 * * *     root    sudo /root/testscript.sh > /dev/null
!#/bin/bash
#testscript.sh
sudo /usr/local/maldetect/maldet -r / # This is a maldet scanner. It takes longer and here gpg-agent running until the scan is completed. I notice that only this line of code causes the gpg-agent to create new process.

but if the script is running without sudo I don't have that process:

# Put this in cronjob:
# 14 23 * * *     root    sudo /root/testscript.sh > /dev/null
!#/bin/bash
#testscript.sh
/usr/local/maldetect/maldet -r / # This is OK

Also for simplicity to reproduce this problem you can also use sleep function:

!#/bin/bash
sudo sleep 60 # This will cause gpg passphrase invalid for 60 seconds and a 2nd gpg-agent process running for 60 seconds

NOT only that if I run the whole script as sudo in cronjob like this also a problem:

14 23 * * * root sudo /root/testscript.sh > /dev/null

NOTE that running the sudo command directly in the terminal won't create the new process. Only on cronjob.

Not sure why the gpg-agent process respawn. It's definitely a mystery-bug but that's ok I can finally prevent this error by not using sudo command in every of my script.

However, this is not a solution, just a workaround because some of other scripts created by server packages need to run as root and they still cause this issue but I only got invalid cache passphrase at the exact time everyday at 7.00 am and 9.amm with less than 1 minute time. I can live with it for now.

MaXi32 renamed this task from gpg-agent respawn another process randomly and causes cached passphrase check failed / expired to [solved] gpg-agent respawn another process randomly and causes cached passphrase check failed / expired.Mon, Oct 12, 7:46 AM