Page MenuHome GnuPG

segfault (NULL-pointer) when inspecting gpg Context after exception (python)
Closed, ResolvedPublic

Description

I'm using gpgme with python to encrypt/decrypt some data using pipes.
I experienced a segfault and started a hard search for the reason.
The final reason was not my application, but the python exception handler of abrt (abrt_exception_handler3).

It is doing some inspection of the objects which might have caused the exception, which also includes the gpg Context object.

A small snippet to reproduce the problem:

import gpg
import sys

def handle_exception(etype, value, tb):
    trace = tb
    while trace.tb_next:
        trace = trace.tb_next
    frame = trace.tb_frame
    for (key, val) in frame.f_locals.items():
        "{0}: {1}".format(key, repr(val))

sys.excepthook = lambda etype, value, tb: \
        handle_exception(etype, value, tb)

with gpg.Context() as ctx:
    raise Exception("error")

I'm running Fedora 36 with gpgme-1.15.1-6.fc36.x86_64 and python3-gpg-1.15.1-6.fc36.x86_64.

Details

Version
1.15.1

Related Objects

Event Timeline

werner triaged this task as Normal priority.Jul 26 2022, 8:59 PM
werner added projects: gpgme, Python.
gniibe added a project: Restricted Project.
gniibe added a subscriber: gniibe.

The SEGV was due to access to gpgme library after self.wrapped is set to None in the __del__ function.

Fixed for the invalid access in master.

Can confirm, we've been running into this as well, but never filed a bug report. Our solution is to have this in our codebase:

class GPGContext(gpg.Context):
    """Extended GPG context."""

    # [... some private functionality ...]

    # The gpg.Context exit function explicitly deletes the context data.
    # This may lead to segfaults when sentry tries to get context data for an error report.
    # Therefore, we define a new context exit function, that does not delete the data.
    def __exit__(self, type, value, tb):
        return False

Looks like these two solutions don't conflict; I'm just documenting this here to help people running into unexpected "None" values in their Sentry reports.

@jap Thank you.

I now understand the cause of SEGV. It's __exit__ method which calls self.__del__() in the class Context (and Data too). That is wrong, because it may run an exception handler with the object.

That part should be fixed.

I think the fix should be something like this:

diff --git a/lang/python/src/core.py b/lang/python/src/core.py
index 81f961d9..95fd0cba 100644
--- a/lang/python/src/core.py
+++ b/lang/python/src/core.py
@@ -1189,8 +1189,9 @@ class Context(GpgmeWrapper):
     def __enter__(self):
         return self
 
-    def __exit__(self, type, value, tb):
-        self.__del__()
+    def __exit__(self, exc_type, value, tb):
+        if not exc_type:
+            self.__del__()
 
     def op_keylist_all(self, *args, **kwargs):
         self.op_keylist_start(*args, **kwargs)
@@ -1527,8 +1528,9 @@ class Data(GpgmeWrapper):
     def __enter__(self):
         return self
 
-    def __exit__(self, type, value, tb):
-        self.__del__()
+    def __exit__(self, exc_type, value, tb):
+        if not exc_type:
+            self.__del__()
 
     def _free_datacbs(self):
         self._data_cbs = None

I'll push this, if no problem.

Not sure if that is the complete fix - if you do something like:

with gpg.Context(...) as context:
    ...
... cause an exception after the context has been closed ...

then context will still be a valid reference to the gpg.Context instance, and may cause segfaults when something tries to access things inside it (f.e. for serialisation).
I like your previous solution with the accessor checks, because that actually fixes the issue.
Stylistically, maybe __del__ should just be renamed to cleanup or free, and then make sure to call that function from both __exit__ and __del__.

Indeed, you are right. The object created by with can be valid even after the context (when referenced by another object).

Actually, I didn't/don't understand the reason why the original code has a call to __del__ in __exit__.
I assumed that there was some valid reason to do so, and I was conservative to keep the code calling __del__ in normal case.

But, this was not good. I should remove the call to __del__ in __exit__. If an application wants to reclaim the reference to GPGME aggressively, it can subclass Context and define its own __exit__ which calls __del__ by itself.

Note that the accessor checks are in master branch already.

werner removed a project: Restricted Project.