charset weirdness with non-ascii User IDs under non-UTF-8 locales
Closed, Resolved | Public


As originally reported back on, I'm having
difficulties searching for some non-ASCII user IDs when LANG=C, but not others:

0 dkg@alice:/tmp/cdtemp.fre2o5$ LANG=C gpg --keyserver
--search '=Andrew Lee (李健秋) <>'
gpg: searching for "=Andrew Lee (æå¥ç§) <>" from hkp server
(1) Andrew Lee <>
Andrew Lee (\xe6\x9d\x8e\xe5\x81\xa5\xe7\xa7\x8b) <>
Andrew Lee (\xe6\x9d\x8e\xe5\x81\xa5\xe7\xa7\x8b) <
Andrew Lee (§?î) <>

	  1024 bit DSA key 0xB6250985, created: 2004-11-02

Keys 1-1 of 1 for "=Andrew Lee (李健秋) <>". Enter number(s),
N)ext, or Q)uit > q
0 dkg@alice:/tmp/cdtemp.fre2o5$ LANG=C gpg --keyserver
--search ='Antoine Beaupré <>'
gpg: searching for "=Antoine Beaupré <>" from hkp server
gpg: key "=Antoine Beaupré <>" not found on keyserver
0 dkg@alice:/tmp/cdtemp.fre2o5$

dkg added a subscriber: dkg.
gniibe added a subscriber: gniibe. Jul 13 2013, 1:02 PM

I think Andrew's Chinese name, encoded in UTF-8, cannot be interpreted as ISO 8859-1,
so it is passed through as is. Usually, though, a non-ASCII string is interpreted as
ISO 8859-1 as a fallback.

If you are sure the string is UTF-8 encoded, you can use the --utf8-strings option.
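
To illustrate why a Latin-1 fallback can make the "Antoine Beaupré" search miss, here is a minimal Python sketch. It assumes the fallback behaves like a plain "decode as ISO 8859-1, re-encode as UTF-8" step; that is an assumption made for illustration, not a description of gpg's actual code path, and the helper name is hypothetical.

```python
# Illustration only: a hypothetical stand-in for a Latin-1 fallback conversion.
# This is NOT gpg's implementation, just the general effect being discussed.

def latin1_fallback_to_utf8(raw: bytes) -> bytes:
    """Treat raw bytes as ISO 8859-1 and re-encode them as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

name = "Antoine Beaupré"
typed = name.encode("utf-8")                # bytes a UTF-8 terminal passes in
converted = latin1_fallback_to_utf8(typed)  # bytes after the assumed fallback

print(typed)               # b'Antoine Beaupr\xc3\xa9'
print(converted)           # b'Antoine Beaupr\xc3\x83\xc2\xa9'
print(converted == typed)  # False: the search string no longer matches the key
```

Under this assumption the two-byte UTF-8 sequence for "é" is expanded into four bytes, so an exact-match search would look for a user ID that does not exist on the keyserver.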

werner added a subscriber: werner. Jul 16 2013, 1:04 PM

FWIW, recall that PGP is broken in that it did not (does not?) use UTF-8 but assumes
everything is ASCII. Thus we sometimes find weird charset conversions. GPA has
a heuristic to at least detect Latin-1.
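
One common way to build such a heuristic is to try strict UTF-8 decoding first and fall back to Latin-1 only when that fails. The Python sketch below shows that general idea; it is not GPA's actual heuristic, and the function name `guess_decode` is made up for this example.

```python
from typing import Tuple

def guess_decode(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for a user ID of unknown charset.

    Hypothetical helper: valid UTF-8 wins; anything else is assumed Latin-1.
    """
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never fails.
        return raw.decode("latin-1"), "latin-1"

print(guess_decode("Beaupré".encode("latin-1")))  # ('Beaupré', 'latin-1')
print(guess_decode("李健秋".encode("utf-8")))      # ('李健秋', 'utf-8')
```

The approach works reasonably well because most Latin-1 text containing accented characters is not valid UTF-8, but it cannot distinguish other 8-bit charsets from Latin-1.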

werner closed this task as Resolved. Fri, Nov 27, 6:30 PM
werner claimed this task.

We changed the fallback to UTF-8 in 2.2 and 2.3, so this bug can be closed. On Windows there is still a problem with the command line; however, that is better tracked in T5038 and its related tasks.