Charset / codepage problems in GnuPG 2.0.26 on MS Windows
Closed, Resolved (Public)

Details

Version
2.1

Event Timeline

GnuPG 2.0.26 (part of Gpg4win 2.2.2-beta37, but the --version command
incorrectly reports Gpg4win 2.2.2-beta33) on MS Windows 7 64 bit SP1 seems to
have the usual annoying charset / codepage translation bug when localized in a
language other than English (Italian, in my case).

For instance, the character "è" 'LATIN SMALL LETTER E WITH GRAVE' [(U+00E8) |
UTF-8 0xC3 0xA8 (c3a8) | CP850 8a] is wrongly displayed as the character "Þ"
'LATIN CAPITAL LETTER THORN' [(U+00DE) | UTF-8 0xC3 0x9E (c39e) | CP850 e8].
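
One way to read these byte values: 0xE8 is the single-byte encoding of "è" in
Windows-1252, and a console set to CP850 displays byte 0xE8 as "Þ". The short
Python sketch below reproduces that mapping. The codec names stand in for
Windows codepage IDs, and cp1252 as the ANSI codepage is an assumption (the
report only states the console codepage, 850).

```python
import unicodedata

# Codec names stand in for Windows codepage IDs. cp850 is the console
# codepage from the report; cp1252 is an ASSUMED ANSI codepage.
CONSOLE_CP = "cp850"
ANSI_CP = "cp1252"  # assumption, not stated in the report


def describe(ch: str) -> str:
    """Summarise one character in the same terms the report uses."""
    return "U+%04X utf-8=%s cp850=%s %s" % (
        ord(ch),
        ch.encode("utf-8").hex(),
        ch.encode(CONSOLE_CP).hex(),
        unicodedata.name(ch),
    )


e_grave = "\u00e8"                      # the character "è"
ansi_bytes = e_grave.encode(ANSI_CP)    # a single byte, 0xE8
shown = ansi_bytes.decode(CONSOLE_CP)   # the console reads 0xE8 as its own code

print(describe(e_grave))
print(describe(shown))
```

The two printed lines carry the same code points and byte values quoted above
for "è" and "Þ".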

GnuPG 1.4.18 (binary from gnupg.org) does not have the above problem.

My system:
Windows 7 64 bit SP1
Localization: Italy - italian
Command prompt codepage: 850

This is known. See the other bug report.

Thanks Werner. I had already searched the issues for the word "charset", but Issue 1373 did not show up
in the results.

However, the bug is more problematic with passphrases: a secret key passphrase containing non-ASCII characters
that was set using GPG 1.4.18 is considered not valid when GPG 2.0.26 is used with the same key, and vice
versa.
I think this problem may cause a lot of trouble for users.
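
A possible explanation, sketched in Python: key derivation works on bytes, so
the same typed passphrase yields a different key if the two versions encode it
differently. This is only an illustration. The thread does not say which
encodings each version uses, the three encodings below are assumptions, and
`derive` is a hypothetical stand-in (a plain SHA-256) for the real
string-to-key function.

```python
import hashlib


def derive(passphrase: str, encoding: str) -> str:
    """Hypothetical stand-in for key derivation: hash the encoded bytes."""
    return hashlib.sha256(passphrase.encode(encoding)).hexdigest()


passphrase = "caff\u00e8"  # "caffè", a made-up passphrase with one non-ASCII letter

# Candidate encodings (assumed, not taken from the GnuPG sources).
digests = {enc: derive(passphrase, enc) for enc in ("utf-8", "cp850", "cp1252")}

for enc, digest in digests.items():
    print(enc, digest[:16])

# An ASCII-only passphrase is encoded identically by all three codecs.
ascii_digests = {derive("caffe", enc) for enc in ("utf-8", "cp850", "cp1252")}
print(len(ascii_digests))
```

The three digests for the non-ASCII passphrase differ, while the ASCII-only
one collapses to a single value, which matches the observation that only
passphrases with non-ASCII characters are affected.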

The bug also occurs with filenames containing characters outside the ASCII block, even when setting
LC_MESSAGES=C or LC_MESSAGES=en_US.UTF-8. For example,
gpg -vvvv --verify C:\Users\Andrea\Downloads\gpg4win-2.2.2-beta37_è_.exe.sig
prints "gpg: assuming signed data in `C:\\Users\\Andrea\\Downloads\\gpg4win-2.2.2-beta37_Þ_.exe'", which also
wrongly shows double backslashes instead of single ones. The same happens with gpg 1.4.18 (but without the
double backslashes). Luckily this does not prevent the signature verification. It's similar to T1409.

Some additional information:

the CHCP command executed at the Windows command prompt reports: "Tabella codici attiva: 850" (active codepage
850);

using the -vvvv option with the --verify command, gpg 2.0.26 and 1.4.18 report: "gpg: using character set
`CP850'".

andreaerdna renamed this task from Charset / codepage problem in GnuPG 2.0.26 localized in italian on MS Windows to Charset / codepage problems in GnuPG 2.0.26 on MS Windows. Aug 19 2014, 2:36 AM

I've taken a look at this. The problem is that the working conversion code in
jnlib/utf8conv.c is not used on Windows. Instead, jnlib/w32-gettext.c does
its own conversion to wchar and then back from wchar to the native codepage,
which is simpler and should work.

But the conversion back used the wrong codepage: CP_ACP instead of the codepage
returned by GetConsoleOutputCP. jnlib/utf8conv.c actually had a comment
explaining why it is necessary to use GetConsoleOutputCP.

With this changed (see attached patch) I get correct output and can verify /
sign files with non-ASCII filenames.
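
The effect of the change can be modelled in a few lines of Python. This is a
sketch of the two-step conversion described above, not the patch itself: the
real code works with Windows codepage IDs, while here Python codec names stand
in for them. cp1252 as the value behind CP_ACP is an assumption, cp850 is the
console codepage reported earlier, and the sample text is made up.

```python
# Codec names stand in for the codepage IDs Windows would supply.
ANSI_CP = "cp1252"    # ASSUMED value behind CP_ACP on the reporter's system
CONSOLE_CP = "cp850"  # console codepage reported by CHCP in this thread


def utf8_to_native(utf8_bytes: bytes, codepage: str) -> bytes:
    """Model of the conversion: UTF-8 -> wide chars -> native codepage."""
    wide = utf8_bytes.decode("utf-8")
    return wide.encode(codepage, errors="replace")


def console_render(native_bytes: bytes) -> str:
    """What a console set to CONSOLE_CP displays for the given bytes."""
    return native_bytes.decode(CONSOLE_CP)


original = "citt\u00e0 \u00e8"      # hypothetical translated text: "città è"
sample = original.encode("utf-8")   # translations are stored as UTF-8

before_patch = console_render(utf8_to_native(sample, ANSI_CP))     # "cittÓ Þ"
after_patch = console_render(utf8_to_native(sample, CONSOLE_CP))   # "città è"

print(before_patch == original)  # False: wrong codepage on the way back
print(after_patch == original)   # True: console codepage used
```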

I think gnupg master behaves differently, though, and I don't have a test setup
for it, so the patch is only against STABLE.

Werner, any objections to including this patch in GnuPG / Gpg4win?

Good analysis. Thanks.
Feel free to put it as a patch into gpg4win.
I need to look closer at it because we also have the gettext code in
libgpg-error. You should also send a DCO for GnuPG.

Thanks Andre for the patch!

I managed to build gpg4win with the patch added and I verified that it seems to
solve the problem reported by me and also in Issues 1373 and 1674!

But I'd like to summarize the problems related to the charset / codepage on MS
Windows of which I am aware, as a reminder:

  1. incorrect display of GPG 2 output translated into another language (also
     reported in Issue 1373 and Issue 1674): fixed by your patch;
  2. passphrases (both for secret keys and symmetric encryption) with non-ASCII
     characters set using GPG 1.4.18 are considered not valid using GPG 2.0.26 and
     vice versa;
  3. incorrect display of filenames with non-ASCII characters (also in Issue 1409);
  4. GPG 2.0.26 and 1.4.18 ignore or weirdly comply with the --utf8-strings,
     --no-utf8-strings or --charset options for utf-8 encoding of encrypted
     filenames (see Issue 1409);
  5. charset weirdness searching keyserver for some non-ASCII user IDs under
     non-UTF-8 locales (see Issue 1514; although it relates to Linux, it seems to
     occur also on Windows, in both the CLI and GPA but not Kleopatra).

Hope this will help to improve the great GnuPG :-)

I've committed the patch to gpg4win so it will be part of the 2.2.2 release.

Thanks for summing up the other problems. I've added a reference to this issue
to the "Improve encoding handling" point in the backlog:
http://wiki.gnupg.org/Gpg4win/Wishlist

T1624 is another issue related to this. GPGex / Kleopatra file / folder
encrypt does not work with non-ASCII characters.

Updated patch against libgpg-error, where this code now lives.

Please apply this patch or something similar.

The problem I can see is that, with this code now in libgpg-error, GUI
applications that want "GUI native" output may use it as well.

It is probably better to introduce a new function, "wchar_to_console", and use
it from GnuPG. Does GPA use that conversion function?

Now might be a good time for this, since gnupg master already depends on new
symbols in libgpg-error.

After some more discussion and testing in the development Jabber channel, Werner
agreed to include this patch. Pushed to libgpg-error with 823e858. So this will
hopefully be part of the first gnupg modern release that includes localization.

It sounds great!

So this patch, as the previous one, solves the "incorrect display of GPG 2
output translated into another language" (as reported here and previously also
in Issue 1373 and Issue 1674).

Does this patch also solve the "incorrect display of filenames with non ASCII
characters" (as reported here and previously also in Issue 1409)?

By the way, as I understand, this patch doesn't fix:

  • the more critical "passphrase with non ASCII characters" problem (as reported
    only here, see T1691 (andreaerdna on Aug 19 2014, 02:36 AM / Roundup)); does this
    bug need a dedicated new Issue to be addressed and solved?
  • the "utf-8 encoding of encrypted filenames" / "strange behaviour of
    --utf8-strings, --no-utf8-strings and --charset options" (as reported in Issue
    1409 and probably similar to Gpgtar Issue 1624 / Gpa Issue 2185)
  • the "charset weirdness searching keyserver for some non-ASCII user IDs under
    non-UTF-8 locales" (as reported in Issue 1514).

Thanks for helping keep track of all these issues.

Yes, this only fixes the problem that has already been fixed in the latest Gpg4win
versions, so that it will also be fixed in future gnupg-2.1 versions.

Still, to help us better separate the problems, I would like to close this, as for
me this bug was about "Wrong encoding in a localized version".

  • the more critical "passphrase with non ASCII characters" problem (as reported
    only here, see T1691 (andreaerdna on Aug 19 2014, 02:36 AM / Roundup)); does this
    bug need a dedicated new Issue to be addressed and solved?

I actually overlooked this in this issue. Can you please open another issue for
that and add me to the nosy list?

  • the "utf-8 encoding of encrypted filenames" / "strange behaviour of
    --utf8-strings, --no-utf8-strings and --charset options" (as reported in Issue
    1409 and probably similar to Gpgtar Issue 1624 / Gpa Issue 2185)

If this problem still exists with Gpg4win, then it is still an open problem.

  • the "charset weirdness searching keyserver for some non-ASCII user IDs under
    non-UTF-8 locales" (as reported in Issue 1514).

This appears not to be Windows specific. Also, I think this works except for
cases where the key in question is problematic. If I search on Windows for
emanuel@intevation.de I get the correct umlauts shown. It might be a problem, though,
for characters that are unrepresentable in the 8-bit codepage.
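
To illustrate the last point, the sketch below checks which characters of a
user ID cannot be encoded in a given 8-bit codepage. cp850 is used because it
is the console codepage reported in this task. The helper names and sample
names are hypothetical, and the "?" substitution is Python's replacement
behaviour, not necessarily what GnuPG prints.

```python
import unicodedata


def unrepresentable(text: str, codepage: str = "cp850") -> list:
    """Return the characters of text that have no encoding in codepage."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def console_safe(text: str, codepage: str = "cp850") -> str:
    """Round-trip text through codepage, substituting '?' where needed."""
    return text.encode(codepage, errors="replace").decode(codepage)


with_umlaut = "M\u00fcller"   # "Müller": u-umlaut exists in CP850
with_stroke = "\u0141ukasz"   # "Łukasz": L-with-stroke does not

print(unrepresentable(with_umlaut))
print([unicodedata.name(c) for c in unrepresentable(with_stroke)])
print(console_safe(with_stroke))
```

A user ID with umlauts survives the round trip, while one with a character
outside the codepage comes back altered.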