Non-ascii characters not shown if message is signed with a key that is not imported
Open, Low, Public

Description

I just noticed that if there are messages containing æøå (probably other non-ascii characters as well, but these are what I have observed), and they are signed with a key that I have not imported, then those characters are simply omitted.

So for example the string: 'på tørn ære' will be shown as 'p trn re'

However - once the corresponding key is imported, it shows correctly. Which is rather surprising, but leads me to believe that the fix is likely fairly easy.

Details

Version
gpg4win 3.1.3
kjellchr created this task. Sep 6 2018, 9:24 AM
aheinecke claimed this task. Sep 6 2018, 9:51 AM
aheinecke triaged this task as Normal priority.
aheinecke added a subscriber: aheinecke.

Thanks for the report. I was not aware of this, but indeed the fix should be easy. I think I already know the cause ;-)

On verification error we show the "unverified content", and I think we do not have encoding handling in that code path, as we probably assumed Outlook would handle that.

Mmh no, I can't reproduce this and my initial hunch was wrong. We do in fact handle encoding in that case.

While looking at the encoding handling I found a bug (fixed in rOa0671cc) where the last line of a mail was sometimes not properly decoded, but this is not your issue, as that would not eat the characters.

Here is how I tried to reproduce it:

The problem might be specific to the encoding / structure of the mail you receive. Could you possibly send me a mail where the encoding would be broken for you if you had not imported the key?

My address is aheinecke@intevation.de

interesting. email sent.

Received, thanks. So this is PGP Inline. Encoding handling in PGP Inline is always guesswork, as it is nowhere defined which encoding is used for the message.

If you forward it like you did, the header of the mail might claim a different encoding than what was in the original message.
I also had to snip the heading out of the mail, as GpgOL only handles Inline signed mails when they do not have leading text.

But if I do not have the key imported, it works. The header claims iso-8859-1 and the message can be decoded in that encoding. But maybe the original header said something different?

If I have the key imported and break the signature, I see wrong encoding:


I'll fix that now. This is caused by a "hail mary" approach to encoding handling where when we see a bad pgp inline signature we try to interpret it as UTF-8 and then try again. (This was introduced with: T3962 )

But this still does not look like your originally reported problem. Could you maybe save the mail as .msg in Outlook or .mbox in a different MUA and send it packaged in a .zip archive? (Attached .msg files are converted by Outlook when sending, so it's better to send a zip.)
That way I should see the exact same mail, headers et al.

header of mail forwarded. looks like it says utf-8. either way, it does work with the right key.

you also said:

"I also had to snip the heading out of the mail as GpgOL only handles Inline signed mails when they do not have leading text."

I think this might be a ticket in itself. If I send a PGP signed email to someone who then responds to me, there should ideally not be issues with it - although I think it would be important to separate which parts are signed and which are not.

"I think this might be a ticket in itself. If I send a PGP signed email to someone who then responds to me, there should ideally not be issues with it - although I think it would be important to separate which parts are signed and which are not."

That would get a won't fix at this point. PGP/Inline is a crutch and should not be used. In Outlook I also don't see a user friendly way (which is not easily fakeable by an HTML mail) to show which parts of a mail are signed by what key.

updated example sent.

aheinecke lowered the priority of this task from Normal to Low. Oct 16 2018, 11:39 AM

I finally got around to looking at your examples. Sorry for the delay. I can reproduce the issue and understand the problem.

In your example broken mail the header says:
Content-Type: text/plain; charset="utf-8"

But the Æ in the mail is encoded as the single byte "0xC6", which is a Latin encoding (e.g. iso-8859-1) and not UTF-8. In UTF-8 the Æ would be the two bytes 0xC3 0x86.
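
The byte-level mismatch can be checked with a short Python sketch. This is an illustration only and not GpgOL code; the helper name `is_valid_utf8` is made up for this example.

```python
# Illustration only: compare how "Æ" is encoded in Latin-1 and in UTF-8.
ae = "Æ"

latin1_bytes = ae.encode("iso-8859-1")  # one byte: 0xC6
utf8_bytes = ae.encode("utf-8")         # two bytes: 0xC3 0x86


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The Latin-1 byte on its own is not a valid UTF-8 sequence.
print(latin1_bytes.hex(), utf8_bytes.hex(), is_valid_utf8(latin1_bytes))
# prints: c6 c386 False
```

So a body that carries 0xC6 while the header declares charset="utf-8" cannot be decoded strictly with the declared charset.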

So GpgOL converts it to UTF-8 before passing it back to Outlook and the characters are messed up completely.
It works in the case where the key is available because we have some fallbacks for encoding when a mail was not properly verified. E.g. we pass it verbatim to GnuPG -> GnuPG gives us back the decoded data in local system encoding (which would work here) then we convert that back to utf-8 and it works.
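
To make the symptom concrete, here is a minimal Python sketch that reproduces the reported output from a Latin-1 body labelled as UTF-8. It is not how GpgOL is implemented: it assumes the lossy path behaves like a decoder that silently drops invalid bytes, and both function names and the iso-8859-1 fallback are assumptions chosen for illustration.

```python
def decode_lossy(raw: bytes, declared: str) -> str:
    """Trust the declared charset and drop bytes that are invalid in it."""
    return raw.decode(declared, errors="ignore")


def decode_with_fallback(raw: bytes, declared: str,
                         fallback: str = "iso-8859-1") -> str:
    """Try the declared charset strictly; on failure use a fallback charset.

    The fallback stands in for the "local system encoding" mentioned above
    and is an assumption, not a value taken from GpgOL.
    """
    try:
        return raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        return raw.decode(fallback)


# Body bytes as in the broken mail: Latin-1 content, header says utf-8.
body = "på tørn ære".encode("iso-8859-1")

print(decode_lossy(body, "utf-8"))          # p trn re
print(decode_with_fallback(body, "utf-8"))  # på tørn ære
```

The first call gives the same 'p trn re' as in the original report; the second shows why a fallback through a Latin encoding makes the text come out right. A fallback like this is still a guess, since any byte sequence decodes as iso-8859-1 without error.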

I'm not sure yet how to fix this properly. I'm giving it low priority, as the issue only affects broken mails and is not extremely bad.