Wrong charset in console messages (Cyrillic, Windows)
Closed, Duplicate (Public)

Description

Russian messages are printed in the wrong character set.

Details

Version
2.0.17
kiav set Version to 2.0.17. Oct 10 2011, 12:18 PM
kiav added projects: gnupg, Bug Report.
kiav added a subscriber: kiav.

werner added a subscriber: werner. Oct 11 2011, 11:19 AM

We always output plain UTF-8 on Windows.

kiav added a comment. Oct 11 2011, 11:25 AM

But why is it unreadable? I see Cyrillic letters (gpgconsole.jpg) but this is
not Russian text!

kiav added a comment. Oct 11 2011, 11:50 AM

I wrote a small test program for .NET to see what the console charset is:

Console.WriteLine("Original console charset is 866")
Console.WriteLine("User readable name and codepage of charset: {0} {1}",
Console.OutputEncoding.EncodingName, Console.OutputEncoding.CodePage)
Console.WriteLine()

Console.WriteLine("Set console charset to ANSI (on Russian systems it is 1251)")
Console.OutputEncoding = System.Text.Encoding.Default
Console.WriteLine()

Console.WriteLine("User readable name and codepage of charset: {0} {1}",
Console.OutputEncoding.EncodingName, Console.OutputEncoding.CodePage)
Console.WriteLine("Pay attention to unreadable name of charset in this case.")

The output screenshot is attached.

As far as I can see, the right charset for Russian systems is 866 (CP866).

We use what the system tells us. See jnlib/utf8conv.c:set_native_charset. An
alias for CP866 might be missing. We don't switch the console charset but use
libiconv to translate between charsets.
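The translation step described above can be sketched with the standard iconv interface. This is a minimal illustration, not GnuPG's code: the helper name utf8_to_charset is hypothetical, and it assumes the iconv implementation in use knows the charset name passed in (for example "CP866").

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from GnuPG): convert the NUL-terminated
   UTF-8 string UTF8 to the charset named TOCODE, e.g. "CP866".
   Returns the number of bytes written to OUT, or (size_t)-1 on failure. */
size_t
utf8_to_charset (const char *tocode, const char *utf8,
                 char *out, size_t outsize)
{
  iconv_t cd;
  char *inptr = (char *) utf8;   /* iconv() takes a non-const pointer */
  char *outptr = out;
  size_t inleft = strlen (utf8);
  size_t outleft = outsize;
  size_t rc;

  cd = iconv_open (tocode, "UTF-8");
  if (cd == (iconv_t) -1)
    return (size_t) -1;          /* charset name not known to iconv */

  rc = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  iconv_close (cd);
  if (rc == (size_t) -1)
    return (size_t) -1;          /* invalid input or OUT too small */

  return outsize - outleft;
}
```

If the name passed as TOCODE is one the library does not know, iconv_open fails, which is the situation a missing alias would produce.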

kiav added a comment. Oct 11 2011, 2:14 PM

Ok, I wrote a small test program in Visual C:

unsigned int cpno;

cpno = GetConsoleOutputCP ();
printf ("CP%u", cpno );

The output is CP866, so GnuPG should recognize the console charset correctly.
See russianconsolecharset_vc.jpg

Perhaps the reason is that CP866 is missing from the aliases ...

kiav added a comment. Oct 11 2011, 2:49 PM

According to the libiconv documentation, it recognizes CP866 as a valid charset name.
http://www.gnu.org/s/libiconv/

So GnuPG does not need to convert an alias for it and

sprintf (codepage, "CP%u", cpno );
newset = codepage;

prepares the right value for newset in

int set_native_charset (const char *newset);
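The name-building step quoted above can be written as a small bounds-checked helper. This is only a sketch: codepage_to_charset_name is a hypothetical name, and the code page number is passed in as a parameter (on Windows it would come from GetConsoleOutputCP) so the example stays portable.

```c
#include <stdio.h>

/* Hypothetical helper, not GnuPG's actual code: format a Windows code
   page number as an iconv-style charset name such as "CP866".
   Returns 0 on success, -1 if BUF is too small for the name. */
int
codepage_to_charset_name (unsigned int cpno, char *buf, size_t bufsize)
{
  int n = snprintf (buf, bufsize, "CP%u", cpno);

  return (n < 0 || (size_t) n >= bufsize) ? -1 : 0;
}
```

Using snprintf instead of sprintf means a buffer that is too short is reported as an error rather than overrun.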

kiav added a comment. Oct 11 2011, 3:18 PM

Rather strange error:

1.) CP866 is supported by libiconv and successfully recognized by GnuPG,
2.) Russian messages are correct Russian texts and in UTF-8 (po/ru.po opened in
Notepad++ as "ANSI as UTF-8")

but GnuPG prints unreadable text!

kiav added a comment. Oct 11 2011, 4:23 PM

I copied some text output from the console to Notepad.exe (standard Windows program).
See the first line in gpg-consoleoutput-unicode.txt

It looks exactly like the screenshot (see the line after 'Home:
C:/Users/akir/AppData/Roaming/gnupg' in gpgconsole.jpg).

I translated this output on http://2cyr.com/decode/ (see the second line in
gpg-consoleoutput-unicode.txt). This site detects windows-1251 as the source
encoding and ibm866 as 'displayed as'. I do not know what that means.

windows-1251 is ANSI charset on Russian Windows
ibm866 is an alias for CP866 (OEM charset on Russian Windows)
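
The pair reported by the decoder is what one would expect if bytes encoded as windows-1251 were drawn by a console that interprets them as CP866. A minimal sketch of that effect, covering Russian letters only and assuming the standard layouts of both code pages; all function names here are hypothetical and exist only for illustration:

```c
/* Byte that encodes the Russian letter with Unicode code point CP in
   windows-1251, or 0 if CP is not a Russian letter. */
unsigned char
cp1251_from_unicode (unsigned int cp)
{
  if (cp >= 0x0410 && cp <= 0x044F)       /* A..ya, one contiguous run */
    return (unsigned char) (0xC0 + (cp - 0x0410));
  if (cp == 0x0401) return 0xA8;          /* capital Yo */
  if (cp == 0x0451) return 0xB8;          /* small yo */
  return 0;
}

/* Same lookup for CP866, where the letters are split into two runs. */
unsigned char
cp866_from_unicode (unsigned int cp)
{
  if (cp >= 0x0410 && cp <= 0x043F)       /* capital A .. small pe */
    return (unsigned char) (0x80 + (cp - 0x0410));
  if (cp >= 0x0440 && cp <= 0x044F)       /* small er .. small ya */
    return (unsigned char) (0xE0 + (cp - 0x0440));
  if (cp == 0x0401) return 0xF0;
  if (cp == 0x0451) return 0xF1;
  return 0;
}

/* Code point a CP866 console draws for byte B, or 0 if B is not a
   Russian letter there (for example a box-drawing character). */
unsigned int
cp866_to_unicode (unsigned char b)
{
  if (b >= 0x80 && b <= 0xAF) return 0x0410 + (b - 0x80);
  if (b >= 0xE0 && b <= 0xEF) return 0x0440 + (b - 0xE0);
  if (b == 0xF0) return 0x0401;
  if (b == 0xF1) return 0x0451;
  return 0;
}

/* What a CP866 console shows when it is handed the windows-1251 byte
   for the letter CP. */
unsigned int
shown_on_cp866_console (unsigned int cp)
{
  return cp866_to_unicode (cp1251_from_unicode (cp));
}
```

Under these assumptions a small "а" (U+0430) is shown as "р" (U+0440), while capital letters land in the box-drawing range of CP866, so the output contains Cyrillic letters without being readable Russian.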

kiav added a comment. Oct 16 2011, 10:13 AM

A test program compiled with MinGW under Windows gives the same result - CP866.
I used just 'gcc console.c' and ran a.exe in MinGW Shell.

#include <windows.h>
#include <stdio.h>

int main()
{
  unsigned int cpno;

  cpno = GetConsoleOutputCP ();
  printf ("CP%u", cpno );

  return 0;
}

I can't compile the whole of GnuPG under Windows to verify it. I suppose nobody can,
because GnuPG for Windows can only be built by cross-compiling.

A fix for this has been included in gpg4win 2.2.2.

GnuPG already converted the output, but to CP_ACP (the ANSI code page) instead of
the code page returned by GetConsoleOutputCP, which was wrong.

Does this now also work for you? I've only tested it with the Codepage for Germany.

kiav added a comment. Nov 22 2014, 12:07 AM

Does this now also work for you?

Yes. Thank you.

aheinecke closed this task as Resolved.
aheinecke claimed this task.