charset weirdness with non-ascii User IDs under non-UTF-8 locales
Closed, Resolved (Public)

Description

As originally reported on
http://lists.gnupg.org/pipermail/gnupg-users/2012-July/045050.html, I'm having
difficulties searching for some non-ASCII user IDs when LANG=C, but not others:

0 dkg@alice:/tmp/cdtemp.fre2o5$ LANG=C gpg --keyserver keys.mayfirst.org
--search '=Andrew Lee (李健秋) <ajqlee@debian.org>'
gpg: searching for "=Andrew Lee (æå¥ç§) <ajqlee@debian.org>" from hkp server
keys.mayfirst.org
(1) Andrew Lee <andrew@linux.org.tw>
Andrew Lee (\xe6\x9d\x8e\xe5\x81\xa5\xe7\xa7\x8b) <ajqlee@debian.org>
Andrew Lee (\xe6\x9d\x8e\xe5\x81\xa5\xe7\xa7\x8b) <andrew@debian.org.t
Andrew Lee (§?î) <andrew@linux.org.tw>

	  1024 bit DSA key 0xB6250985, created: 2004-11-02

Keys 1-1 of 1 for "=Andrew Lee (李健秋) <ajqlee@debian.org>". Enter number(s),
N)ext, or Q)uit > q
0 dkg@alice:/tmp/cdtemp.fre2o5$ LANG=C gpg --keyserver keys.mayfirst.org
--search ='Antoine Beaupré <anarcat@debian.org>'
gpg: searching for "=Antoine Beaupré <anarcat@debian.org>" from hkp server
keys.mayfirst.org
gpg: key "=Antoine Beaupré <anarcat@debian.org>" not found on keyserver
0 dkg@alice:/tmp/cdtemp.fre2o5$

Event Timeline

I think that Andrew's name in Chinese, encoded in UTF-8, cannot be interpreted as
ISO-8859-1, so it is passed through as is. Usually, though, non-ASCII input is
interpreted as ISO-8859-1 as a fallback.
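
The difference between the two names can be illustrated outside of gpg. The following Python sketch is not gpg's code. It assumes that the fallback reads the input bytes as ISO-8859-1 and converts them to UTF-8 again, and that bytes in the range 0x80-0x9F (control codes in ISO-8859-1) are what makes the Chinese name look implausible as ISO-8859-1. The helper names `reencode_as_latin1` and `c1_bytes` are made up for this illustration.

```python
def reencode_as_latin1(utf8_bytes: bytes) -> bytes:
    """Misread UTF-8 bytes as ISO-8859-1 and convert them to UTF-8 again."""
    return utf8_bytes.decode("latin-1").encode("utf-8")


def c1_bytes(data: bytes) -> list:
    """Return the bytes in 0x80-0x9F, which are control codes in ISO-8859-1."""
    return [b for b in data if 0x80 <= b <= 0x9F]


french = "Beaupré".encode("utf-8")
chinese = "李健秋".encode("utf-8")

print(french)                               # b'Beaupr\xc3\xa9'
print(reencode_as_latin1(french))           # b'Beaupr\xc3\x83\xc2\xa9'
print([hex(b) for b in c1_bytes(french)])   # []
print([hex(b) for b in c1_bytes(chinese)])  # ['0x9d', '0x8e', '0x81', '0x8b']
```

Under these assumptions, the doubly encoded "Beaupré" no longer matches the user ID stored on the keyserver, which would be consistent with the "not found" result, while the Chinese name contains bytes that are not printable ISO-8859-1 characters.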

If you are sure the input is UTF-8 encoded, you can use the --utf8-strings option.

FWIW, recall that PGP is broken as it did not (does not?) use UTF-8 but assumes
everything is ASCII. Thus we sometimes find weird charset conversions. GPA has
a heuristic to at least detect Latin-1.
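
GPA's actual heuristic is not shown in this ticket. As a rough idea of what such a check can look like, here is a minimal Python sketch of one common approach: try strict UTF-8 decoding first and treat the bytes as Latin-1 only if that fails. The function name `decode_user_id` is hypothetical and the logic is an assumption, not GPA's implementation.

```python
def decode_user_id(raw: bytes) -> str:
    """Decode a user ID, preferring UTF-8 and falling back to Latin-1."""
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte value maps to a character in Latin-1, so this cannot fail.
        return raw.decode("latin-1")


print(decode_user_id(b"Beaupr\xc3\xa9"))  # UTF-8 input    -> Beaupré
print(decode_user_id(b"Beaupr\xe9"))      # Latin-1 input  -> Beaupré
```

The check is only a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 would be decoded as UTF-8.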

werner claimed this task.

We changed the fallback to UTF-8 in 2.2 and 2.3, so this bug can be closed. On Windows there is still a problem with the command line, but that is better tracked in T5038 and its related tasks.