
internationalization (support UNICODE/UTF-8 character set)
Closed, Invalid (Public)

Description

I'm filing this as a feature request because I do not know whether some standard requires GPG keys to use only the 7-bit ASCII character set.

When I create a key for my e-mail address with "gpg --generate-key", I'm asked for my name. In my shell and throughout my GUI I use the UTF-8 character set, so I write my name with German umlauts. This is not reflected in the key, which uses ASCII only.

Event Timeline

OpenPGP specifies the use of UTF-8 for all metadata (i.e. everything except the signed/encrypted data). GnuPG has always supported this. I don't know which OS you are on, but some don't have UTF-8 support on the command line or tty, so you need to tweak your environment first.
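Here is a concrete case of the UTF-8 rule. A terminal running in ISO-8859-1 (Latin-1) sends "ü" as the single byte 0xFC, but a UTF-8 user ID must contain the two bytes 0xC3 0xBC. The following is a minimal sketch of that conversion, assuming Latin-1 input only. latin1_to_utf8() is a hypothetical helper written for illustration and is not part of GnuPG.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 (Latin-1)
 * string to UTF-8.  Latin-1 bytes equal Unicode code points
 * U+0000..U+00FF, so each byte >= 0x80 becomes exactly two UTF-8 bytes.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *latin1_to_utf8(const char *in)
{
    char *out, *p;

    assert(in != NULL);
    out = malloc(2 * strlen(in) + 1);   /* worst case: every byte doubles */
    if (out == NULL)
        return NULL;
    p = out;
    for (const unsigned char *s = (const unsigned char *)in; *s; s++) {
        if (*s < 0x80) {
            *p++ = (char)*s;                    /* ASCII passes through */
        } else {
            *p++ = (char)(0xC0 | (*s >> 6));    /* lead byte 110000xx */
            *p++ = (char)(0x80 | (*s & 0x3F));  /* continuation 10xxxxxx */
        }
    }
    *p = '\0';
    return out;
}
```

Latin-1 needs no lookup table because each of its bytes equals the Unicode code point of the same value. Other legacy charsets do need one, which is why a general converter goes through iconv.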

Example using the tty in interactive mode:

GnuPG needs to construct a user ID to identify your key.

Real name: Otto Müller
Email address: otto.mueller@example.org
You are using the 'utf-8' character set.
You selected this USER-ID:
    "Otto Müller <otto.mueller@example.org>"
Change (N)ame, (E)mail, or (O)kay/(Q)uit? o
[...]
public and secret key created and signed.
pub   rsa2048 2020-06-28 [SC] [expires: 2022-06-28]
      467644BA5484C72C717324F17904A1D825589D97
uid                      Otto Müller <otto.mueller@example.org>
sub   rsa2048 2020-06-28 [E] [expires: 2022-06-28]

Example in command line mode:

$ gpg --quick-gen-key 'Anna Möller <anna.moeller@example.org>'
About to create a key for:
   "Anna Möller <anna.moeller@example.org>"
Continue? (Y/n) y
[...]
public and secret key created and signed.
pub   rsa2048 2020-06-28 [SC] [expires: 2022-06-28]
      CD75FC038AF390A2019686E4F4CAE8FF60168A75
uid                      Anna Möller <anna.moeller@example.org>
sub   rsa2048 2020-06-28 [E]

Of course, I am using an xterm with the en_US.UTF-8 locale, which is the default on all reasonably modern Unix systems. On Windows, you had better use a GUI frontend to avoid encoding problems.

Hello Werner,

thx for your very quick answer! As I wrote in the initial report, I'm using UTF-8 throughout the system for human users. My system is FreeBSD-12.1-RELEASE-p6.
As you can see in the attached screenshot, the gpg CLI handles the current character set (UTF-8) inconsistently:

  • the first translated message does not use it (marked: "Schl?ssel" should be "Schlüssel")
  • then it is used, but gpg claims I'm using US-ASCII, which is in fact not the case
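The "?" in "Schl?ssel" is the typical symptom of text being converted to a character set that cannot represent "ü". The sketch below only illustrates that symptom, producing one "?" per character rather than per byte. It makes no claim about which library produced the output on this system, and utf8_to_ascii_lossy() is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: down-convert UTF-8 to 7-bit ASCII, replacing every
 * non-ASCII character (not every byte) with a single '?'.  Continuation
 * bytes (10xxxxxx) are skipped, so "Schlüssel" becomes "Schl?ssel".
 * The input is not validated.  Returns a malloc'd string or NULL. */
char *utf8_to_ascii_lossy(const char *in)
{
    char *out, *p;

    assert(in != NULL);
    out = malloc(strlen(in) + 1);   /* output is never longer than input */
    if (out == NULL)
        return NULL;
    p = out;
    for (const unsigned char *s = (const unsigned char *)in; *s; s++) {
        if (*s < 0x80)
            *p++ = (char)*s;        /* plain ASCII byte */
        else if ((*s & 0xC0) != 0x80)
            *p++ = '?';             /* lead byte of a multibyte character */
        /* else: continuation byte, already covered by the '?' above */
    }
    *p = '\0';
    return out;
}
```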

So either there is a minor bug in the upstream gpg source, or I have to file this as a bug against the gpg port on FreeBSD. I have not yet installed a full Linux to compare; my /compat/linux (a basic CentOS 7.7) has no gpg (and no port to install it).

Greetings, "Walter"

In your test, which I guess you ran on Linux, 'utf-8' is written in lowercase. On my system it is written in uppercase, 'UTF-8', which matches what I find elsewhere (e.g. Wikipedia and RFC 3629). I do not know, though, whether there is a recommended spelling. So the bug might be that gpg compares against the Linuxism 'utf-8' in lowercase rather than the RFC spelling in uppercase. The correct fix would then be to compare against uppercase UTF-8 only, and let Linux fix their system to use uppercase throughout... ;)
Second, I know that FreeBSD has some issues with internationalization. It does not support charsets in the POSIX sense, but emulates them by combining all available locales and matching CODESETs. Usually this is not a problem, and most translations and UTF-8 handling work as expected. Maybe this has some subtle effect that causes this issue.

My FreeBSD box is currently not up, so I can't test right now. You may want to look at set_native_charset() in gnupg/common/utf8conv.c. For historical reasons we start off with Latin-1, but then switch to the selected charset and initialize iconv accordingly. In the case of an error, we sometimes fall back to utf-8. You may want to add some debug code (log_debug ("foo bar string=%s\n", some_string);).
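The pattern described here can be sketched as follows: switch to the selected charset, initialize iconv, and fall back to utf-8 on error. This is not the code of set_native_charset(). open_native_to_utf8() is a hypothetical helper, and it prints to stderr where the suggested debug code would call log_debug().

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: open an iconv descriptor converting CHARSET to
 * UTF-8.  If CHARSET is unknown, print a debug line and open a UTF-8 to
 * UTF-8 descriptor instead.  *FELL_BACK reports which path was taken.
 * Returns (iconv_t)-1 only if even the fallback cannot be opened. */
iconv_t open_native_to_utf8(const char *charset, bool *fell_back)
{
    iconv_t cd;

    assert(charset != NULL && fell_back != NULL);
    *fell_back = false;
    cd = iconv_open("UTF-8", charset);      /* iconv_open(tocode, fromcode) */
    if (cd != (iconv_t)-1)
        return cd;

    fprintf(stderr, "DBG: iconv_open(\"UTF-8\", \"%s\") failed (errno=%d);"
            " falling back to utf-8\n", charset, errno);
    *fell_back = true;
    return iconv_open("UTF-8", "UTF-8");
}
```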

It seems that nl_langinfo(CODESET) returns US-ASCII on your system.

Do you happen to set another environment variable, LC_CTYPE or LC_ALL, to C or POSIX?

I mean, when LC_CTYPE=C, US-ASCII is selected.

On my machine (Debian GNU/Linux), it's like:

$ env LANG=de_DE.UTF-8 LC_CTYPE=C locale charmap
ANSI_X3.4-1968

With only LANG set, it works:

$ env LANG=de_DE.UTF-8 locale charmap
UTF-8
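POSIX resolves each locale category in this order: LC_ALL first, then the category's own variable (here LC_CTYPE), then LANG. That is why LC_CTYPE=C overrides a UTF-8 LANG. The same effect can be reproduced inside a C program. codeset_for() is a hypothetical test helper, and it changes the process environment and its LC_CTYPE setting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: set LANG and LC_CTYPE (NULL means "unset"), clear
 * LC_ALL, re-read the environment for the LC_CTYPE category, and return
 * the resulting codeset as a malloc'd string (or NULL on failure).
 * Side effects: modifies the environment and the process locale. */
char *codeset_for(const char *lang, const char *lc_ctype)
{
    unsetenv("LC_ALL");                 /* LC_ALL would override both */
    if (lang != NULL)
        setenv("LANG", lang, 1);
    else
        unsetenv("LANG");
    if (lc_ctype != NULL)
        setenv("LC_CTYPE", lc_ctype, 1);
    else
        unsetenv("LC_CTYPE");

    /* Returns NULL if the resolved locale is not installed; the previous
     * LC_CTYPE setting then stays in effect. */
    setlocale(LC_CTYPE, "");
    /* nl_langinfo's buffer may be overwritten later, so copy it. */
    return strdup(nl_langinfo(CODESET));
}
```

With LANG=de_DE.UTF-8 and LC_CTYPE=C it yields an ASCII codeset. On glibc this is spelled ANSI_X3.4-1968, as in the Debian output above; on FreeBSD it is US-ASCII.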

Hello Mr. Niibe,

thank you for your suggestion. There seems to be some incomplete handling of locales. I set LANG and MM_CHARSET in my .login_conf, and LC_NUMERIC in KDE to get the modern number format with a space as the thousands separator ("3 456 789,12"). The rest I leave to the magic behind the scenes. FreeBSD's manual page for locale(1) states that it conforms to POSIX.1.
I do not know whether locale(1) is supposed to set the LC_* variables according to the value of LANG, or whether I have to set them all explicitly.
So as a workaround, I will set the LC_* variables in my .login_conf and ask in the FreeBSD forum.

Thx for your time!

paul@t450s:~ % env|egrep '(LC|LANG|CHAR)'
LANG=de_DE.UTF-8
LANGUAGE=de:en_US
LC_NUMERIC=ksh_DE.UTF-8
MM_CHARSET=UTF-8
paul@t450s:~ % locale charmap
US-ASCII
paul@t450s:~ % locale
LANG=de_DE.UTF-8
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
paul@t450s:~ % cat .login_conf 
# $FreeBSD: releng/12.1/share/skel/dot.login_conf 77995 2001-06-10 17:08:53Z ache $
#
# see login.conf(5)
#
me:\
        :charset=UTF-8:\
        :lang=de_DE.UTF-8:
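The locale output above shows LANG=de_DE.UTF-8 while every category resolves to C. A small diagnostic can help pin down where such a mismatch comes from: print each category as the process itself sees it, and flag those that differ from LANG. This is a hypothetical sketch and not a replacement for locale(1).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical diagnostic: print NAME=VALUE for each locale category as
 * currently set in this process, and flag values that differ from $LANG
 * (plain string comparison; locale aliases are not resolved).  Call
 * setlocale(LC_ALL, "") first to load the settings from the environment. */
void dump_locale(FILE *out)
{
    static const struct { int cat; const char *name; } cats[] = {
        { LC_CTYPE, "LC_CTYPE" },       { LC_COLLATE, "LC_COLLATE" },
        { LC_TIME, "LC_TIME" },         { LC_NUMERIC, "LC_NUMERIC" },
        { LC_MONETARY, "LC_MONETARY" }, { LC_MESSAGES, "LC_MESSAGES" },
    };
    const char *lang = getenv("LANG");

    assert(out != NULL);
    for (size_t i = 0; i < sizeof cats / sizeof cats[0]; i++) {
        const char *cur = setlocale(cats[i].cat, NULL);   /* query only */
        if (cur == NULL)
            cur = "(unknown)";
        fprintf(out, "%s=%s%s\n", cats[i].name, cur,
                (lang != NULL && *lang != '\0' && strcmp(cur, lang) != 0)
                    ? "  # differs from LANG" : "");
    }
}
```

Running setlocale(LC_ALL, "") followed by dump_locale(stdout) in a program started from the Konsole shell shows what that program actually inherits.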

Sorry for the distraction. Everything above applies to a terminal window (KDE's Konsole) in my KDE desktop. On the bare FreeBSD console, everything is fine, so this is a bug in some KDE library or in Konsole. I'm sorry I did not think of testing on the bare console right away. I'll close this bug.

Thx again.

You're welcome.

I am so used to xterm/uxterm that I virtually never use Konsole - even if I have to fix some things at another workplace ;-)