hm, common/utf8conv.c says this:
I wonder if this is still the best choice. In my experience, far more machines
have text in some UTF-8 encoding today than in Latin-1. this is especially true
for systems that deal with OpenPGP User IDs, where UTF-8 is the canonical
representation.
If the user's environment claims that it's plain ASCII and we're seeing 8-bit
characters, gpg does have to make a decision about what to do. i see four options:
a) report an error and fail.
b) pretend that the 8-bit characters are Latin-1 (this is "OK" because any
bytestring is a valid Latin-1 string)
c) pretend that the 8-bit characters are UTF-8
d) do some sort of autodetection on the bytestring (e.g. if it is a valid UTF-8
byte sequence then treat as UTF-8, otherwise treat as Latin-1)
option (a) is annoying and likely a cause of spurious complaints, as the comment
notes. GnuPG is currently going with option (b). Option (c) seems more
reasonable to me because of OpenPGP's relationship with UTF-8, but introduces
some error cases (what do we do where the bytestring is not valid UTF-8?).
Option (d) avoids error cases but might be a bit more delicate to implement.
What do you think?