Encoding problem: gpg truncates multibyte characters in interactive prompts on Windows
Closed, Resolved (Public)

Description

On Windows, when entering multibyte characters in an interactive prompt, gpg(2) only takes the first byte of each character.

This creates problems when generating new keys with Chinese names, for example.
Chinese characters are usually encoded with 2 bytes in the appropriate Windows code page (e.g. "cp936" aka "GBK"). Taking only the first byte of each character makes it impossible (or at least very difficult) to create keys with correct Chinese names using gpg.

Example command and output:

gpg -vvv --generate-key
gpg (GnuPG) 2.2.13; Copyright (C) 2019 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

gpg: using character set 'GBK'
Note: Use "gpg --full-generate-key" for a full featured key generation dialog.

GnuPG needs to construct a user ID to identify your key.

Real name: 一二三四五六
Email address:
You are using the 'GBK' character set.
You selected this USER-ID:
    "叶人瘟"

Change (N)ame, (E)mail, or (O)kay/(Q)uit? q
gpg: Key generation canceled.

Python commands verifying that gpg really is taking the first byte of each encoded character:

>>> for c in '一二三四五六':
...     print(c, c.encode('gbk'))
...
一 b'\xd2\xbb'
二 b'\xb6\xfe'
三 b'\xc8\xfd'
四 b'\xcb\xc4'
五 b'\xce\xe5'
六 b'\xc1\xf9'

>>> print(b'\xd2\xb6\xc8\xcb\xce\xc1'.decode('gbk'))
叶人瘟
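
The same check can be wrapped in a small helper so it can be tried on other names. This is only a sketch of the symptom as observed here; the function name `first_bytes_only` is made up for illustration and says nothing about how gpg handles input internally.

```python
def first_bytes_only(name: str, codec: str = "gbk") -> str:
    """Keep only the first encoded byte of each character, then decode again.

    Mimics the truncation seen in the prompt; it is not gpg's actual code.
    """
    kept = bytes(ch.encode(codec)[0] for ch in name)
    # errors="replace" avoids an exception if the leftover bytes
    # do not form valid sequences in the chosen codec.
    return kept.decode(codec, errors="replace")


print(first_bytes_only("一二三四五六"))  # 叶人瘟, matching the USER-ID gpg displayed
```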

Note:

  • This problem only exists for some interactive prompts (AFAIK), not for command line arguments and not for PGP messages typed into stdin.
  • I'm using the "simple installer" for gpg on Windows, not gpg4win. Although, I believe the "gpg.exe" binary is exactly the same in both versions.

Details

Version
2.2.13

Event Timeline

walgare created this object in space S1 Public.
walgare renamed this task from Encoding problem: gpg truncates multibyte characters in stdin on Windows to Encoding problem: gpg truncates multibyte characters in interactive prompts on Windows. (Feb 14 2019, 12:55 AM)
walgare updated the task description.

Please try "gpg --quick-gen-key" which takes the user-id on the command line - that uses a different code path.

In general, gpg was changed long ago to use only UTF-8, and it is unlikely that we will add support for other multi-byte code pages. Either use a UTF-8 enabled shell or a GUI tool. GUI toolkits are much better suited for other input methods.

I don't think code page is the problem per se though.

I can configure cmd.exe to use UTF-8, and the error remains.
I can also use gpg2 on Cygwin, using either UTF-8 or GBK, and there's no error at all.

It seems like a general problem with how gpg interacts with the Windows console, not a code page specific problem.

werner triaged this task as Normal priority. (Mar 8 2019, 8:18 AM)
werner added projects: gnupg (gpg23), Windows.

I reviewed the multibyte handling in GnuPG and you are right, there is a general problem because we use ReadConsoleA and basically GetCommandLineA, so there is no way for multibyte input unless a parameter file is used. Output is also broken, but that is easier to fix iff the input case has been fixed.
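
As a workaround sketch for the parameter-file route mentioned above: the script below builds a file for gpg's unattended key generation and writes it as UTF-8. The helper names, the file name `params.txt` and the email address are placeholders, and it is an assumption here that gpg reads the name from the file as UTF-8.

```python
from pathlib import Path


def build_keygen_params(real_name: str, email: str) -> str:
    """Return the text of a parameter file for unattended key generation."""
    lines = [
        "Key-Type: default",
        "Subkey-Type: default",
        f"Name-Real: {real_name}",
        f"Name-Email: {email}",
        "Expire-Date: 0",
        "%commit",
    ]
    return "\n".join(lines) + "\n"


def write_keygen_params(path: str, real_name: str, email: str) -> None:
    """Write the parameter file as UTF-8 so the name never passes through the console."""
    Path(path).write_text(build_keygen_params(real_name, email), encoding="utf-8")


# Hypothetical usage (placeholder email and file name):
#   write_keygen_params("params.txt", "一二三四五六", "user@example.org")
# and then, on the command line:
#   gpg --batch --generate-key params.txt
```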

Fixing this won't happen in 2.2 but is planned for 2.3. See T4398 for the plan.
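
The plan in T4398 is not reproduced here. Purely as an illustration, and assuming the fix reads wide characters from the console (for example via the Windows API ReadConsoleW, the UTF-16 counterpart of ReadConsoleA) and converts them to UTF-8, the conversion step would look roughly like this. The console read is simulated with an encoded string, and `console_wide_to_utf8` is a made-up name.

```python
def console_wide_to_utf8(wide_buffer: bytes) -> bytes:
    """Convert a UTF-16LE buffer, as a wide-character console read would
    return it, into the UTF-8 bytes that gpg works with internally."""
    return wide_buffer.decode("utf-16-le").encode("utf-8")


typed = "一二三四五六"
wide = typed.encode("utf-16-le")   # stand-in for what a wide console read yields
utf8 = console_wide_to_utf8(wide)

# Every character survives the round trip; nothing is cut to a single byte.
assert utf8.decode("utf-8") == typed
```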

werner claimed this task.

That's it. Things work nicely for me. This won't be backported to 2.2 because it introduces minor changes in behaviour.

There are problems with input from stdin on the console: For example running

> gpg --enarmor
[Enter umlauts etc]
^Z

The umlauts end up as 0x00 bytes after dearmoring, while other characters are okay. Given that this is only used for debugging, I don't consider it a real-world problem. I am a bit curious why this happens, but don't want to spend more time on it.