GpgOL: More fine grained discovery of content-id / embedded parts
Open, High, Public

Description

With rOd87848059727587be1f660283e0aeb3be16cc382 GpgOL ignores all Content-IDs on attachments. This differs a bit from Outlook, which somehow detects whether an image is referenced inline and, if so, hides it from the attachment list.

The reason for this is that as soon as we add a Content-ID to an image, Outlook hides it from the attachment list. Some MUAs and mail services add a Content-ID to every attachment, so attachments that were not referenced in the HTML mail ended up neither shown in the attachment list nor visible inline. To users this felt like data loss.

But of course this means GpgOL can now only handle embedded images if their Content-Disposition is inline and not attachment.

We need to parse the HTML, at least in a very simplistic way, to detect whether a Content-ID is referenced in the HTML body, and only then set it for Outlook.
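A minimal sketch of such a simplistic check, written in Python purely for illustration (GpgOL itself is C++, and none of these names exist in its code base). It assumes references appear as `src="cid:..."` attributes and ignores other places a `cid:` URL might occur, such as CSS or `background` attributes:

```python
import re

# Deliberately simplistic: matches src="cid:...", src='cid:...' or src=cid:...
# in any tag. The pattern and both function names are hypothetical.
_CID_REF = re.compile(r"""src\s*=\s*["']?cid:([^"'\s>]+)""", re.IGNORECASE)


def referenced_content_ids(html):
    """Return the set of Content-IDs referenced through cid: URLs in the HTML."""
    return {match.group(1) for match in _CID_REF.finditer(html)}


def is_referenced(content_id, html):
    """True if the Content-ID (with or without angle brackets) is used in the HTML."""
    return content_id.strip().strip("<>") in referenced_content_ids(html)
```

With a check like this, the Content-ID would only be passed on to Outlook when `is_referenced` returns true; every other attachment would stay in the attachment list.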

Event Timeline

aheinecke created this task.
werner raised the priority of this task from Normal to High. Jun 7 2022, 12:02 PM

Markus, I find this ticket important as it has a lot of user-visible impact. While the VS-NfD SecOps say you "should" not use HTML mail, most users, and basically all non-VS-NfD users, stick with the Outlook default anyway and use HTML.

In that case nearly all the big companies have embedded images in their e-mail signatures, or things like "follow us on twitter". This means we have a multipart/signed or encrypted message with a multipart/alternative. GpgOL supports multipart/related, and it was always a nice thing to show that encrypted mails have the same capabilities as non-encrypted mails. We as security experts know, of course, that you don't really want HTML mails, but for many users they are very important. Currently, when we have an embedded image, it is shown crossed out because it cannot be found. Getting the images displayed is not technically difficult to fix: if you look at the linked commit, you will see that I have just disabled this feature.

So we are talking about multipart/related mails structured in this way:

  • multipart/encrypted
    • multipart/alternative
      • text/html
      • text/plain
    • image/png

The image can be referenced in the text/html part by its "Content-ID", but as soon as you set a Content-ID, Outlook hides the attachment in the message view. The problem for us was that if an attachment was hidden because it had a Content-ID, but it was not referenced for display in the mail, the attachment was invisible to the user. To our users this appeared to be a data loss bug, e.g. "I sent the mail with 5 attachments but 2 are invisible" or something like that. The workaround then was to disable HTML display, which also shows attachments that have a Content-ID. So it was my deliberate decision to rather break multipart/related and always list all attachments than to have multipart/related working but hide attachments under some circumstances.

This happened because some mail generators or MUAs add a Content-ID to every attachment. Outlook itself appears to have some logic to decide whether the attachment should be shown in the attachment list or not. At least one example I had contained a PDF attachment with a Content-ID set. Outlook, with GpgOL disabled, showed it as a normal attachment, but with GpgOL active it was hidden. See: T4161: GpgOL: Attachments might be hidden in some cases

I have not touched this ticket since, as I was scared of regressions, but this is nearly as important for some as the infamous category/flag problem. :)

Goals:

  • Since this has regression risk, I would keep the current "do not set content-id on attachments" behavior as a workaround option in Gpg4win-tools gpgolconfig, but with default off. This way we can say in support that this option should be tried if an attachment is not listed.
  • I would find it totally acceptable if we always showed all attached images, like "twitter.png" etc., as attached files, even if Outlook would hide them. That means enabling the code to set the Content-ID again and then ensuring that the embedded file is still listed under attachments.
    • I have tried this in the past to no avail, but with such things in Outlook you can suddenly find a solution through OutlookSpy or a Stack Overflow hint.
  • Additionally, another idea I had was that after parsing I would "grep" for each Content-ID in the HTML source code and only set the Content-ID when it is actually referenced in the HTML. Of course this needs to handle all the unescaping, encoding, line breaking etc.
  • The test messages I had back then I cannot attach as they contained customer data. But a standard example would indeed be a mail with two attachments: one image that is referenced in an <img> tag, and one PDF that is not referenced but still has a Content-ID. Ideally, in that case the PDF should be displayed in the attachment list and the PNG should be hidden.
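Since the original test messages cannot be shared, the standard example from the last bullet can be approximated with a generated mail. The following is only a sketch in Python, not GpgOL code: the message contents, addresses, file names and the helper names `CidCollector`, `build_test_mail` and `classify` are all made up for illustration. It uses an HTML parser rather than a plain grep, so that character references (`&#64;` for `@`), percent-encoding and line breaks inside the tag are handled before comparing Content-IDs:

```python
from email.message import EmailMessage
from html.parser import HTMLParser
from urllib.parse import unquote


class CidCollector(HTMLParser):
    """Collects cid: references from the attribute values of any tag."""

    def __init__(self):
        super().__init__()
        self.cids = set()

    def handle_starttag(self, tag, attrs):
        for _name, value in attrs:
            # HTMLParser has already decoded character references here.
            if value and value.strip().lower().startswith("cid:"):
                self.cids.add(unquote(value.strip()[4:]))


def build_test_mail():
    """Hypothetical stand-in for the customer mails: PNG referenced, PDF not."""
    msg = EmailMessage()
    msg["Subject"] = "content-id test"
    msg.set_content("Plain text body")
    msg.add_alternative(
        '<html><body><p>Hello</p>\n'
        '<img alt="logo"\n src="cid:logo&#64;example.org"></body></html>',
        subtype="html",
    )
    # Placeholder bytes, not real image or PDF data.
    msg.get_payload()[1].add_related(
        b"\x89PNG placeholder", maintype="image", subtype="png",
        cid="<logo@example.org>", filename="twitter.png", disposition="inline",
    )
    msg.add_attachment(
        b"%PDF placeholder", maintype="application", subtype="pdf",
        cid="<report@example.org>", filename="report.pdf",
    )
    return msg


def classify(msg):
    """Map each file name to 'hidden' (referenced inline) or 'listed'."""
    collector = CidCollector()
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            collector.feed(part.get_content())
    collector.close()
    result = {}
    for part in msg.walk():
        filename = part.get_filename()
        if filename is None:
            continue
        cid = (part["Content-ID"] or "").strip().strip("<>")
        result[filename] = "hidden" if cid and cid in collector.cids else "listed"
    return result


print(classify(build_test_mail()))
```

In this sketch the PNG is classified as hidden and the PDF as listed, which is the behavior described as ideal above. The generated mail is unencrypted; wrapping it in multipart/encrypted is left out here.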

Maybe it's overthinking the problem of attachments with a Content-ID but no reference in the HTML (btw., if mails are shown as plain text, all attachments are listed regardless of their Content-ID). I guess code like: if (filename.endsWith(".png") || filename.endsWith(".jpg") || filename.endsWith(".jpeg")) ignore_cid = false; else ignore_cid = true; would do the right thing 99% of the time. Core reference: rOd87848059727587be1f660283e0aeb3be16cc382
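The file-extension heuristic could look roughly like the sketch below. It is written in Python for illustration only; `ignore_content_id` and `IMAGE_EXTENSIONS` are hypothetical names, and the case-insensitive comparison is an added assumption that the pseudocode above does not state:

```python
from pathlib import PurePath

# Extensions taken from the suggestion above; extend as needed.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def ignore_content_id(filename):
    """Return True if the Content-ID should be dropped so the file stays listed."""
    # Assumption: compare case-insensitively so "LOGO.PNG" counts as an image.
    return PurePath(filename).suffix.lower() not in IMAGE_EXTENSIONS
```

Only the final extension is considered, so a file such as `archive.png.zip` would keep being listed as a normal attachment.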

aheinecke added a subscriber: mmontkowski.

I misunderstood the problem. On the receiving side I had already fixed this in 2019 with 9a9fe4e7fcad92bfba49ade9a6c44373f170ccd2

But then on the sending side it was broken again by 9f81ed6561c5f41e50d1a51333c9586a33ed2ef6, where the Content-ID was lost when copying the attachment.

So the only fix required here is a7349189f3af05822eba4bd17b62482fa2b0747f from work/aheinecke/T5982

Just for the record, here is the fix which I _thought_ was required.