
_gpg_close_all_fds hangs on newer Linux systems in a simple chroot w/o /proc/self/fd
Testing, Normal, Public

Description

This was originally reported on https://bugs.debian.org/1079696

When gpgconf (or any gpg process that spawns a child process) is run in a bare chroot or other constrained environment where /proc/self/fd is not available, it tries to close every file descriptor up to the maximum limit. This can take over an hour, depending on the configuration of the system.
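The /proc/self/fd fast path that is missing in a bare chroot can be sketched roughly as follows. This is a minimal illustration, not the libgpg-error code. The function name highest_open_fd_via_proc is hypothetical, and the sketch assumes a Linux-style /proc where each open descriptor shows up as a numeric directory entry.

```c
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical helper (not the libgpg-error API): return the highest
 * open file descriptor listed in /proc/self/fd, or -1 if the directory
 * cannot be opened (e.g. in a bare chroot without /proc mounted).
 * The DIR's own descriptor is also listed, so the result may slightly
 * overestimate; that is harmless for a close loop. */
static int
highest_open_fd_via_proc (void)
{
  DIR *dir = opendir ("/proc/self/fd");
  struct dirent *de;
  int highest = -1;

  if (!dir)
    return -1;   /* No /proc: the caller must fall back to a limit. */

  while ((de = readdir (dir)))
    {
      char *end;
      long n;

      if (de->d_name[0] == '.')
        continue;            /* Skip "." and "..". */
      n = strtol (de->d_name, &end, 10);
      if (end != de->d_name && *end == '\0' && n > highest)
        highest = (int)n;
    }
  closedir (dir);
  return highest;
}
```

When this returns -1, only the limit-based fallbacks remain, which is exactly the situation described above.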

I can confirm that this happens with libgpg-error (libgpgrt) version 1.51-3 on Debian.

The source of this loop appears to be get_max_fds in src/spawn-posix.c, which first looks at /proc/self/fd and, failing that, tries the following sources in order of preference, taking the first available answer:

  • RLIMIT_NOFILE
  • RLIMIT_OFILE
  • _SC_OPEN_MAX (from sysconf)
  • _POSIX_OPEN_MAX
  • OPEN_MAX
  • defaults to 256
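The fallback order listed above can be sketched like this. It is an illustrative reconstruction under stated assumptions, not a copy of get_max_fds: the name sketch_max_fds is hypothetical, and the explicit RLIM_INFINITY check is an addition of this sketch. The key point is that it reads the *hard* limit rl.rlim_max, which is where a value like 1073741816 comes from.

```c
#include <sys/resource.h>
#include <unistd.h>
#include <limits.h>

/* Hypothetical sketch of the fallback chain (not the library code). */
static long
sketch_max_fds (void)
{
  long max_fds = -1;
  struct rlimit rl;

#ifdef RLIMIT_NOFILE
  /* Hard limit, not the soft limit rl.rlim_cur.  Skipping
   * RLIM_INFINITY is an assumption of this sketch. */
  if (!getrlimit (RLIMIT_NOFILE, &rl) && rl.rlim_max != RLIM_INFINITY)
    max_fds = (long)rl.rlim_max;
#endif
#ifdef RLIMIT_OFILE
  if (max_fds == -1 && !getrlimit (RLIMIT_OFILE, &rl)
      && rl.rlim_max != RLIM_INFINITY)
    max_fds = (long)rl.rlim_max;
#endif
#ifdef _SC_OPEN_MAX
  if (max_fds == -1)
    {
      long scres = sysconf (_SC_OPEN_MAX);
      if (scres >= 0)
        max_fds = scres;
    }
#endif
#ifdef _POSIX_OPEN_MAX
  if (max_fds == -1)
    max_fds = _POSIX_OPEN_MAX;
#endif
#ifdef OPEN_MAX
  if (max_fds == -1)
    max_fds = OPEN_MAX;
#endif
  if (max_fds == -1)
    max_fds = 256;   /* Arbitrary default. */
  return max_fds;
}
```

On Linux the first branch almost always succeeds, so the later fallbacks rarely matter there.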

As can be seen from the strace output in https://bugs.debian.org/1079696 , on at least some modern Debian systems, the rlim_max for RLIMIT_NOFILE is 1073741816 (0x3ffffff8 in hex), even though rlim_cur could be just 1024.

Trying to do more than a billion close() calls will make any process appear to hang indefinitely. Even if each close() call takes only 0.000004 s (4 µs, as I've measured on a modern machine), we're still talking about more than an hour spent in this loop.
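As a quick check of that estimate, using the rlim_max value from the Debian strace and the measured per-call cost:

```latex
1\,073\,741\,816 \times 4\,\mu\mathrm{s} \approx 4\,295\,\mathrm{s} \approx 71.6\,\mathrm{min}
```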


Event Timeline

Hm, this might also be relevant in GnuPG's codebase in common/exechelp-posix.c, which contains a copy of the same code (licensed differently).

I note that get_max_fds ends with this:

  /* AIX returns INT32_MAX instead of a proper value.  We assume that
     this is always an error and use an arbitrary limit.  */
#ifdef INT32_MAX
  if (max_fds == INT32_MAX)
    max_fds = 256;
#endif

This appears to have been introduced because of T1778, reported by @aixtools, but as a narrowly targeted workaround rather than as a consideration of what a plausible runtime upper limit on a close() loop would be.
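One way to generalize the INT32_MAX special case is to treat *any* implausibly large value the same way. This is only a sketch of that idea: the name sketch_clamp_max_fds and the cap SKETCH_MAX_CLOSE_LOOP are hypothetical, and the cap value is an illustrative assumption, not upstream policy. Note the trade-off: descriptors above the cap would stay open, so a clamp like this is only safe as a last resort behind /proc/self/fd or closefrom.

```c
/* Hypothetical ceiling on how many descriptors a close() loop may walk;
 * the value is an illustrative assumption. */
#define SKETCH_MAX_CLOSE_LOOP 65536

/* Instead of special-casing INT32_MAX (the AIX value from T1778),
 * clamp every implausibly large limit. */
static long
sketch_clamp_max_fds (long max_fds)
{
  if (max_fds <= 0)
    return 256;                      /* Arbitrary default. */
  if (max_fds > SKETCH_MAX_CLOSE_LOOP)
    return SKETCH_MAX_CLOSE_LOOP;    /* Covers INT32_MAX and 0x3ffffff8. */
  return max_fds;
}
```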

Thank you for your report.

For libgpg-error, I pushed a change that uses closefrom, in commit rEe3e793302b67 (spawn: Use closefrom when available).
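The general shape of "use closefrom when available" might look like the sketch below. It is not the code from that commit. The name sketch_close_from is hypothetical, and HAVE_CLOSEFROM is assumed to be defined by the build system when closefrom(3) exists (glibc has it since 2.34; several BSDs and Solaris also provide it).

```c
#include <unistd.h>

/* Close every descriptor >= first_fd (hypothetical helper). */
static void
sketch_close_from (int first_fd, int max_fds)
{
#ifdef HAVE_CLOSEFROM
  (void)max_fds;        /* Not needed: closefrom knows the open set. */
  closefrom (first_fd);
#else
  int fd;
  /* Fallback: this is the loop that runs ~10^9 times when max_fds
   * comes from a huge RLIMIT_NOFILE hard limit. */
  for (fd = first_fd; fd < max_fds; fd++)
    close (fd);         /* EBADF for unopened fds is expected. */
#endif
}
```

With closefrom the cost no longer depends on max_fds at all, which is why it sidesteps the problem even without /proc.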

werner renamed this task from "`_gpg_close_all_fds` hangs on modern Linux when `/proc/self/fd` is unavailable; spawning a process without `GPGRT_SPAWN_INHERIT_FILE` takes > 1 hour" to "_gpg_close_all_fds hangs on newer Linux systems in a simple chroot w/o /proc/self/fd". (Jan 8 2025, 8:50 AM)
werner triaged this task as Normal priority.
werner added a project: Linux.

@gniibe: Please see gpgme/src/posix-io.c where we have this:

          /* First close all fds which will not be inherited.  If we
           * have closefrom(2) we first figure out the highest fd we
           * do not want to close, then call closefrom, and on success
           * use the regular code to close all fds up to the start
           * point of closefrom.  Note that Solaris' and FreeBSD's closefrom do
           * not return errors.  */
#ifdef HAVE_CLOSEFROM
          {
            fd = -1;
            for (i = 0; fd_list[i].fd != -1; i++)
              if (fd_list[i].fd > fd)
                fd = fd_list[i].fd;
            fd++;
#if defined(__sun) || defined(__FreeBSD__) || defined(__GLIBC__)
            closefrom (fd);
            max_fds = fd;
#else /*!__sun */
            while ((i = closefrom (fd)) && errno == EINTR)
              ;
            if (!i || errno == EBADF)
              max_fds = fd;
#endif /*!__sun*/
          }
#endif /*HAVE_CLOSEFROM*/
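The first step of that gpgme code, finding where closefrom may start, can be isolated as a small self-contained function. The struct sketch_fd_item type is a hypothetical stand-in for gpgme's fd_list entries (an array terminated by fd == -1), and the function name is likewise hypothetical.

```c
/* Hypothetical stand-in for a gpgme fd_list entry. */
struct sketch_fd_item
{
  int fd;
};

/* Return one past the highest descriptor that must be kept, i.e. the
 * first descriptor closefrom() may close (0 if the list is empty). */
static int
sketch_closefrom_start (const struct sketch_fd_item *fd_list)
{
  int i, fd = -1;

  for (i = 0; fd_list[i].fd != -1; i++)
    if (fd_list[i].fd > fd)
      fd = fd_list[i].fd;
  return fd + 1;
}
```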

@werner I read the code of gpgme/src/posix-io.c. I understand the two points:

  • For correctness' sake, a possibly interrupted closefrom should be handled.
  • We can share the code between the closefrom and non-closefrom cases.
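The second point could be read as: both paths end with the same loop, and only its upper bound differs. A minimal sketch under that reading, with hypothetical names and a plain int array of kept descriptors instead of gpgme's fd_list:

```c
#include <unistd.h>

/* Return nonzero if fd is in the keep array (hypothetical helper). */
static int
sketch_is_kept (int fd, const int *keep, int nkeep)
{
  int i;
  for (i = 0; i < nkeep; i++)
    if (keep[i] == fd)
      return 1;
  return 0;
}

/* Shared part of both paths: close every fd below limit except the
 * kept ones.  After a successful closefrom, limit is one past the
 * highest kept fd (everything above is already closed); without
 * closefrom, limit is max_fds. */
static void
sketch_close_below (int limit, const int *keep, int nkeep)
{
  int fd;
  for (fd = 0; fd < limit; fd++)
    if (!sketch_is_kept (fd, keep, nkeep))
      close (fd);
}
```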

For the first point, I think we also need to handle a possibly interrupted close.

I'm going to improve the code for those points.