Encoding problem: gpg truncates multibyte characters in interactive prompts on Windows
Closed, Resolved, Public

Authored By

	walgare
	Feb 13 2019, 7:43 PM

Description

On Windows, when entering multibyte characters in an interactive prompt, gpg(2) only takes the first byte of each character.

This creates problems when generating new keys with Chinese names, for example.
Chinese characters are usually encoded with 2 bytes in the appropriate Windows code page (e.g. "cp936", aka "GBK"). Taking only the first byte makes it impossible (or at least very difficult) to create keys with correct Chinese names using gpg.

Example command and output:

gpg -vvv --generate-key
gpg (GnuPG) 2.2.13; Copyright (C) 2019 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

gpg: using character set 'GBK'
Note: Use "gpg --full-generate-key" for a full featured key generation dialog.

GnuPG needs to construct a user ID to identify your key.

Real name: 一二三四五六
Email address:
You are using the 'GBK' character set.
You selected this USER-ID:
    "叶人瘟"

Change (N)ame, (E)mail, or (O)kay/(Q)uit? q
gpg: Key generation canceled.

Python commands verifying that gpg really is taking the first byte of each encoded character:

>>> for c in '一二三四五六':
...     print(c, c.encode('gbk'))
...
一 b'\xd2\xbb'
二 b'\xb6\xfe'
三 b'\xc8\xfd'
四 b'\xcb\xc4'
五 b'\xce\xe5'
六 b'\xc1\xf9'

>>> print(b'\xd2\xb6\xc8\xcb\xce\xc1'.decode('gbk'))
叶人瘟
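The observed output can also be reproduced without gpg. This Python sketch keeps only the first byte of each character's encoding, which is the behaviour the reporter describes, and then decodes the surviving bytes in the same code page. The function name is hypothetical, and the sketch models the symptom only. It says nothing about where inside gpg or Windows the bytes are actually lost.

```python
def simulate_first_byte_truncation(text: str, encoding: str) -> str:
    """Reproduce the reported symptom: only the first byte of each character survives.

    Raises UnicodeEncodeError if a character is not representable in `encoding`.
    """
    kept = bytearray()
    for ch in text:
        # Encode the character and drop everything after its first byte.
        kept.append(ch.encode(encoding)[0])
    # The surviving lead bytes are decoded in the same code page, so pairs of
    # them combine into unrelated characters; undecodable leftovers become U+FFFD.
    return kept.decode(encoding, errors="replace")


if __name__ == "__main__":
    print(simulate_first_byte_truncation("一二三四五六", "gbk"))  # 叶人瘟
    print(simulate_first_byte_truncation("abc", "gbk"))          # abc
```

With 一二三四五六 and GBK it yields 叶人瘟, the same user ID gpg displayed. Pure ASCII input is unaffected because every ASCII character is a single byte in GBK.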

Note:

This problem only exists for some interactive prompts (AFAIK). It does not affect command-line arguments or PGP messages typed into stdin.
I'm using the "simple installer" for gpg on Windows, not gpg4win, although I believe the "gpg.exe" binary is exactly the same in both.

Details

Version: 2.2.13

Revisions and Commits

rG GnuPG
	rGb912f07cdf00 w32: Always use Unicode for console input and output.
	rG90aadf69f730 common,w32: Allow Unicode input and output with the console.
	rG8c41b8aac3ef w32: Always use Unicode for console input and output.
	rGf165c8a737cc common,w32: Allow Unicode input and output with the console.

Related Objects

Status	Assigned	Task
Resolved	werner	T4417 Work needed for gnupg 2.3
Resolved	werner	T4398 Rework Console and command line handling on Windows
Resolved	werner	T4365 Encoding problem: gpg truncates multibyte characters in interactive prompts on Windows
Resolved	werner	T6741 gpg 2.3+ may display garbled characters for date and time in non-English Windows

Event Timeline

walgare created this task. Feb 13 2019, 7:43 PM

walgare added a project: Bug Report. Feb 13 2019, 9:15 PM

walgare renamed this task from "Encoding problem: gpg truncates multibyte characters in stdin on Windows" to "Encoding problem: gpg truncates multibyte characters in interactive prompts on Windows". Feb 14 2019, 12:55 AM

walgare updated the task description.

Please try "gpg --quick-gen-key", which takes the user ID on the command line; that uses a different code path.

In general, gpg was changed long ago to use only UTF-8, and it is unlikely that we will add support for other multibyte code pages. Either use a UTF-8 enabled shell or a GUI tool; GUI toolkits are much better suited for other input methods.
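The workaround suggested in this comment (passing the user ID on the command line with gpg --quick-gen-key instead of typing it at the prompt) can be sketched in Python. The helper names and the example address are hypothetical, and the "Name <email>" layout is an assumption that mirrors what the interactive dialog builds. Whether the characters then arrive intact still depends on how the installed gpg reads its command line on Windows.

```python
import subprocess


def quick_gen_key_argv(real_name: str, email: str) -> list[str]:
    """Build the argument list for `gpg --quick-gen-key USER-ID` (hypothetical helper)."""
    # Assumed user ID layout: "Name <email>", or just the name if no email is given.
    user_id = f"{real_name} <{email}>" if email else real_name
    # The whole user ID is a single argument, so the space inside it is preserved.
    return ["gpg", "--quick-gen-key", user_id]


def run_quick_gen_key(real_name: str, email: str) -> int:
    """Run gpg with the user ID on the command line; gpg may still prompt for a passphrase."""
    # On Windows, subprocess hands the arguments to the OS as a Unicode command line.
    return subprocess.run(quick_gen_key_argv(real_name, email)).returncode
```

For example, quick_gen_key_argv("一二三四五六", "user@example.com") returns ["gpg", "--quick-gen-key", "一二三四五六 <user@example.com>"]. Building the list separately from running it makes the argument handling easy to check without touching a real keyring.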

I don't think the code page is the problem per se, though.

I can configure cmd.exe to use UTF-8, and the error remains.
I can also use gpg2 on Cygwin, using either UTF-8 or GBK, and there's no error at all.

It seems like a general problem with how gpg interacts with the Windows console, not a code-page-specific problem.

I reviewed the multibyte handling in GnuPG and you are right: there is a general problem because we use ReadConsoleA and, basically, GetCommandLineA, so there is no way to enter multibyte input unless a parameter file is used. Output is also broken, but that is easier to fix once the input case has been fixed.

Fixing this won't happen in 2.2 but is planned for 2.3. See T4398 for the plan.
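The difference between the ANSI and wide-character Windows APIs can be modeled in Python without calling Windows at all. This is an illustrative model, not GnuPG code, and both function names are hypothetical. ansi_path assumes text is squeezed through a legacy code page, as the ...A functions do, with unrepresentable characters replaced by '?' (loosely mimicking Windows' default-character substitution). wide_path assumes the console delivers UTF-16 code units, as ReadConsoleW does, which are then converted directly to UTF-8, gpg's internal encoding. The model only shows the code-page bottleneck; it does not reproduce the first-byte truncation reported in this task.

```python
def ansi_path(text: str, codepage: str) -> bytes:
    """Model of an ANSI (...A) API: text passes through a legacy code page first."""
    # Characters the code page cannot represent are replaced with '?'.
    native = text.encode(codepage, errors="replace")
    # gpg would then convert from the native charset to UTF-8.
    return native.decode(codepage).encode("utf-8")


def wide_path(utf16le: bytes) -> bytes:
    """Model of a wide (...W) API: UTF-16 code units go straight to UTF-8."""
    return utf16le.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    name = "一二三四五六"
    print(ansi_path(name, "cp1252"))            # b'??????': lost before gpg sees it
    print(wide_path(name.encode("utf-16-le")))  # the UTF-8 bytes of the original name
    assert wide_path(name.encode("utf-16-le")).decode("utf-8") == name
```

The wide path never consults a code page, so any character the console can hold survives the conversion, which is the motivation for reading Unicode from the console directly.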

werner added a parent task: T4398: Rework Console and command line handling on Windows. Feb 10 2021, 2:59 PM

werner added a commit: rGf165c8a737cc: common,w32: Allow Unicode input and output with the console. Mar 5 2021, 10:50 AM

werner added a commit: rG8c41b8aac3ef: w32: Always use Unicode for console input and output. Mar 5 2021, 3:38 PM

That's it. Things work nicely for me. This won't be backported to 2.2 because it introduces minor changes in behaviour.

There are problems with input from stdin on the console. For example, running

> gpg --enarmor
[Enter umlauts etc]
^Z

The umlauts etc. end up as 0x00 bytes after dearmoring, while other characters are okay. Given that this is only used for debugging, I don't consider it a real-world problem. I am a bit curious why this happens, but I don't want to spend more time on it.
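To check that the armoring step on its own is not what turns the umlauts into 0x00 bytes, a lossless round trip can be sketched in Python. Plain base64 from the standard library stands in for OpenPGP armor here (real armor also adds header lines and a CRC-24 checksum), and armor_round_trip is a hypothetical name. A lossless result is consistent with the fault lying in how the console delivers stdin, as described above, but it is not a diagnosis.

```python
import base64


def armor_round_trip(data: bytes) -> bytes:
    """Encode to base64 text and decode it again, loosely like --enarmor then --dearmor."""
    armored = base64.encodebytes(data)  # base64 text, wrapped into lines of at most 76 chars
    return base64.decodebytes(armored)


if __name__ == "__main__":
    typed = "Grüße, Ärger, Öl\n".encode("utf-8")
    recovered = armor_round_trip(typed)
    assert recovered == typed
    assert b"\x00" not in recovered
    print("round trip preserved", len(typed), "bytes")
```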

werner added a commit: rG90aadf69f730: common,w32: Allow Unicode input and output with the console. Jun 8 2021, 11:12 AM

werner added a commit: rGb912f07cdf00: w32: Always use Unicode for console input and output.

werner mentioned this in T5960: Kleopatra: Encoding problems with GnuPG output on Windows. Sep 13 2023, 2:06 PM

werner added a subtask: T6741: gpg 2.3+ may display garbled characters for date and time in non-English Windows. Oct 2 2023, 2:51 PM

werner changed the status of subtask T6741: gpg 2.3+ may display garbled characters for date and time in non-English Windows from Open to Testing. Oct 27 2023, 2:23 PM

werner closed subtask T6741: gpg 2.3+ may display garbled characters for date and time in non-English Windows as Resolved. Jan 24 2024, 2:42 PM