Encoding problem: gpg truncates multibyte characters in interactive prompts on Windows
Open, Normal, Public

Description

On Windows, when entering multibyte characters in an interactive prompt, gpg(2) only takes the first byte of each character.

This creates problems when generating new keys with Chinese names, for example.
Chinese characters are usually encoded with 2 bytes in the appropriate Windows code page (e.g. "cp936", aka "GBK"). Taking only the first byte of each character makes it impossible (or at least very difficult) to create keys with correct Chinese names using gpg.

Example command and output:

gpg -vvv --generate-key
gpg (GnuPG) 2.2.13; Copyright (C) 2019 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

gpg: using character set 'GBK'
Note: Use "gpg --full-generate-key" for a full featured key generation dialog.

GnuPG needs to construct a user ID to identify your key.

Real name: 一二三四五六
Email address:
You are using the 'GBK' character set.
You selected this USER-ID:
    "叶人瘟"

Change (N)ame, (E)mail, or (O)kay/(Q)uit? q
gpg: Key generation canceled.

Python commands verifying that gpg really is taking the first byte of each encoded character:

>>> for c in '一二三四五六':
...     print(c, c.encode('gbk'))
...
一 b'\xd2\xbb'
二 b'\xb6\xfe'
三 b'\xc8\xfd'
四 b'\xcb\xc4'
五 b'\xce\xe5'
六 b'\xc1\xf9'

>>> print(b'\xd2\xb6\xc8\xcb\xce\xc1'.decode('gbk'))
叶人瘟
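
The two checks above can be folded into one step. This is only a minimal sketch of the suspected behaviour, not gpg's actual code; `first_bytes_only` is a hypothetical helper name introduced here for illustration.

```python
def first_bytes_only(text, encoding="gbk"):
    """Keep only the first byte of each encoded character, then decode.

    Hypothetical helper: it imitates the truncation observed in the
    interactive prompt and is not taken from the gpg sources.
    """
    # Encode each character separately and keep byte 0 of the result.
    kept = bytes(ch.encode(encoding)[0] for ch in text)
    # Decode the surviving bytes with the same code page the console uses.
    return kept.decode(encoding, errors="replace")


print(first_bytes_only("一二三四五六"))
```

Feeding it the name typed at the "Real name:" prompt should give the same three characters that gpg shows in the USER-ID, while pure ASCII input passes through unchanged.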

Note:

  • This problem only exists for some interactive prompts (AFAIK), not for command-line arguments or for PGP messages typed on stdin.
  • I'm using the "simple installer" for gpg on Windows, not gpg4win, although I believe the "gpg.exe" binary is exactly the same in both.

Details

Version
2.2.13
walgare created this task. Feb 13 2019, 7:43 PM
walgare created this object in space S1 Public.
walgare renamed this task from "Encoding problem: gpg truncates multibyte characters in stdin on Windows" to "Encoding problem: gpg truncates multibyte characters in interactive prompts on Windows". Feb 14 2019, 12:55 AM
walgare updated the task description.
werner added a subscriber: werner. Feb 14 2019, 7:38 AM

Please try "gpg --quick-gen-key" which takes the user-id on the command line - that uses a different code path.

In general, gpg was changed long ago to use only UTF-8, and it is unlikely that we will add support for other multi-byte code pages. Either use a UTF-8 enabled shell or a GUI tool. GUI toolkits are much better suited for other input methods.

walgare added a comment. Edited Feb 16 2019, 10:24 PM

I don't think the code page is the problem per se, though.

I can configure cmd.exe to use UTF-8, and the error remains.
I can also use gpg2 on Cygwin, using either UTF-8 or GBK, and there's no error at all.

It seems like a general problem with how gpg interacts with the Windows console, not a code-page-specific problem.

werner triaged this task as Normal priority. Mar 8 2019, 8:18 AM
werner added projects: gnupg (gpg23), Windows.

I reviewed the multibyte handling in GnuPG and you are right, there is a general problem because we use ReadConsoleA and basically GetCommandLineA, so there is no way to enter multibyte input unless a parameter file is used. Output is also broken, but that is easier to fix once the input case has been fixed.

Fixing this won't happen in 2.2 but is planned for 2.3. See T4398 for the plan.
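
For the parameter-file route mentioned above, a rough sketch of preparing such a file in UTF-8 follows. It assumes the unattended key generation format described in the GnuPG manual ("Key-Type", "Name-Real", "Name-Email", "%commit"); the helper names, the file name, and the example address are hypothetical, and the result has not been checked against 2.2.13 on Windows.

```python
def build_keygen_params(real_name, email):
    """Return the text of a parameter file for unattended key generation.

    Hypothetical helper; the keywords follow the GnuPG manual's
    description of unattended key generation.
    """
    lines = [
        "Key-Type: default",
        "Subkey-Type: default",
        "Name-Real: " + real_name,
        "Name-Email: " + email,
        "Expire-Date: 0",
        "%commit",
    ]
    return "\n".join(lines) + "\n"


def write_params(path, params):
    """Write the parameter text as UTF-8, without newline translation."""
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(params)


# Example address is a placeholder, not taken from the report.
print(build_keygen_params("一二三四五六", "user@example.org"))
```

After saving the text with `write_params("keygen-params.txt", ...)`, the file would be passed to gpg as "gpg --batch --generate-key keygen-params.txt", so the name never goes through the console input path.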