
Special characters encoding issue with LDAP keyserver.
Closed, Wontfix · Public

Description

Hello,

I set up an LDAP keyserver using OpenLDAP and am exporting my keys using the
--send-keys option. When sending PGP keys whose uid contains accented characters
(French accents such as éèàù), I get the following error:

gpg -vvv --charset utf-8 --keyserver ldap://ldapserver --send-keys **
gpg: using character set `utf-8'
gpgkeys: error adding key
***** to keyserver: Invalid syntax

OpenLDAP log :
slapd[2256]: conn=10 op=3 RESULT tag=103 err=21 text=pgpUserID: value #0 invalid
per syntax

I checked the TCP packet and the "é" character is encoded as "e9", which appears
to be ISO 8859-1 (Latin-1) encoding. As far as I know, OpenLDAP only accepts the
UTF-8 charset.

Is there any way to get GnuPG to encode special characters in UTF-8?

Details

Version
1.4.9

Event Timeline

What OS are you using? In general there is no need to use --charset; however
it is not related to the problem.

David, I guess the problem is simply that the user ID has a bad encoding (i.e.
created by PGP or by GnuPG with a wrong charset set). Do we need to add a
heuristic as done by several frontends to fix such bad encodings?

I'm using Debian GNU/Linux 5.0 and most of the PGP Keys have been created with
Enigmail (the Mozilla Thunderbird extension).

Not sure if it could help, but when I display the key details it shows up this way:

gpg --list-keys **
uid Ren\xe9
<**@**.***>

With '\xe9' being the Latin-1 byte value for 'é' (in UTF-8 it would be the two
bytes 0xC3 0xA9).

Could you attach a copy of the public key you're having a problem with to this
bug? If you don't want to reveal that key for whatever reason, could you
generate another one with the 'é' character that shows the same problem?

Ah, never mind. I found a key (ACCFFAE2) that nicely duplicates the problem.

Yes, it's an encoding problem. The key isn't UTF-8, and so the LDAP server is
rejecting it. I wonder if the easiest fix here would be to always do a
utf8_to_native and then native_to_utf8 call on the contents of the UID. Things
that are already UTF-8 will not be harmed, and things that aren't UTF-8 will be
properly escaped (with the \xe9 sort of syntax).

Does LDAP find these keys if they are C-style escaped? I guess that depends on
how they have been put into the LDAP DB.

FWIW, In GPA we use this code which fixes the encoding problem:

/* Return the user ID, making sure it is properly UTF-8 encoded.
   Allocates a new string, which must be freed with g_free ().  */
static gchar *
string_to_utf8 (const gchar *string)
{
  const char *s;

  if (!string)
    return NULL;

  /* Due to a bug in old and not so old PGP versions user IDs have
     been copied verbatim into the key.  Thus many users with Umlauts
     et al. in their name will see their names garbled.  Although this
     is not an issue for me (;-)), I have a couple of friends with
     Umlauts in their name, so let's try to make their life easier by
     detecting invalid encodings and convert that to Latin-1.  We use
     this even for X.509 because it may make things even better given
     all the invalid encodings often found in X.509 certificates.  */
  for (s = string; *s && !(*s & 0x80); s++)
    ;
  if (*s && ((s[1] & 0xc0) == 0x80) && ( ((*s & 0xe0) == 0xc0)
                                         || ((*s & 0xf0) == 0xe0)
                                         || ((*s & 0xf8) == 0xf0)
                                         || ((*s & 0xfc) == 0xf8)
                                         || ((*s & 0xfe) == 0xfc)) )
    {
      /* Possible utf-8 character followed by continuation byte.
         Although this might still be Latin-1 we better assume that it
         is valid utf-8. */
      return g_strdup (string);
    }
  else if (*s && !strchr (string, 0xc3))
    {
      /* No 0xC3 character in the string; assume that it is Latin-1.  */
      return g_convert (string, -1, "UTF-8", "ISO-8859-1", NULL, NULL, NULL);
    }
  else
    {
      /* Everything else is assumed to be UTF-8.  We do this even
         though we know the encoding may not be valid.  However as we
         only test the first non-ascii character, valid encodings
         might follow.  */
      return g_strdup (string);
    }
}

It uses glib functions but it should be easy to change it to the GnuPG functions.

Good idea. I think an adaptation of that code will do nicely. What is needed
here is a pass through that code, which almost always returns UTF-8, then a pass
through utf8_to_native and then native_to_utf8. This is a lot of manipulation,
but string_to_utf8 may not return UTF-8 if the user ID is coded very badly, and
the LDAP server will reject anything that isn't UTF-8.
For true UTF-8, this whole process should be lossless. For the common Latin-1
mis-encoding, it will become UTF-8. For other encodings, it will become UTF-8
but with unknown characters quoted. This preserves the notion that anything in
the stream to the keyserver handler program is UTF-8.

Did 1.4. Will do 2.0 shortly, after a bit of testing.

werner set Due Date to Sep 30 2011, 2:00 AM. (Jul 1 2011, 12:26 PM)

Still missing in 2.0.17; should we fix it at all in 2.0, or do it only in 2.1
(which has a lot of changes in the keyserver area)?

We won't do it for 2.0 but should consider doing it in 2.1.
The 1.4 commit id is 00310b1aa868cc06cf486fcda6852e9750aa3564.

We won't do that. Those with badly encoded user IDs should create new keys, or will meanwhile have done so. The whole back-and-forth charset encoding adds a lot of complexity for the sake of some legacy applications. Frankly, I would like to get rid of all code conversions and stick to UTF-8.