
libgpg-error:w32: Support setting an environment block encoded as UTF-8
Open, Normal, Public

Description

On Windows, it would be good if the environment block passed to gpgrt_spawn_actions_set_envvars could be a UTF-8 string terminated by two zero bytes.
The internal function _gpgrt_process_spawn would then convert it to a wchar_t string terminated by four zero bytes (a Unicode environment block)
and specify CREATE_UNICODE_ENVIRONMENT in cr_flags when calling CreateProcessW.
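A minimal sketch of that conversion, assuming the input block is UTF-8 (malformed input is rejected). It uses uint16_t as a portable stand-in for the 16-bit wchar_t of Windows so it compiles anywhere; utf8_decode and utf8_envblock_to_utf16 are hypothetical names, not libgpg-error functions, and a Windows implementation might instead use MultiByteToWideChar with CP_UTF8.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence at S into *CP.  Return the number of bytes
   consumed, or 0 for malformed input (bad continuation byte, overlong
   form, surrogate, or out of range).  Never reads past a NUL, because
   NUL is not a valid continuation byte.  */
static size_t
utf8_decode (const unsigned char *s, uint32_t *cp)
{
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  if ((s[0] & 0xE0) == 0xC0)
    {
      if ((s[1] & 0xC0) != 0x80)
        return 0;
      *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
      return *cp >= 0x80 ? 2 : 0;
    }
  if ((s[0] & 0xF0) == 0xE0)
    {
      if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
        return 0;
      *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
      return (*cp >= 0x800 && (*cp < 0xD800 || *cp > 0xDFFF)) ? 3 : 0;
    }
  if ((s[0] & 0xF8) == 0xF0)
    {
      if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80
          || (s[3] & 0xC0) != 0x80)
        return 0;
      *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
            | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
      return (*cp >= 0x10000 && *cp <= 0x10FFFF) ? 4 : 0;
    }
  return 0;
}

/* Convert a UTF-8 environment block ("A=1\0B=2\0\0") into a malloc'ed
   UTF-16 block ending with two zero code units (four zero bytes).
   Returns NULL on malformed UTF-8 or allocation failure.  */
static uint16_t *
utf8_envblock_to_utf16 (const char *block)
{
  const unsigned char *p = (const unsigned char *)block;
  size_t nbytes = 0, i, n;
  uint16_t *out, *q;
  uint32_t cp;

  /* Length of all "name=value\0" strings, excluding the final NUL.  */
  while (p[nbytes])
    nbytes += strlen ((const char *)p + nbytes) + 1;

  /* A UTF-8 sequence never yields more UTF-16 units than it has bytes
     (4 bytes give a surrogate pair), so NBYTES units plus two
     terminators always suffice.  */
  out = malloc ((nbytes + 2) * sizeof *out);
  if (!out)
    return NULL;

  for (i = 0, q = out; i < nbytes; i += n)
    {
      if (!p[i])
        {
          *q++ = 0;          /* End of one name=value string.  */
          n = 1;
          continue;
        }
      n = utf8_decode (p + i, &cp);
      if (!n)
        {
          free (out);
          return NULL;
        }
      if (cp >= 0x10000)     /* Encode as a surrogate pair.  */
        {
          cp -= 0x10000;
          *q++ = (uint16_t)(0xD800 | (cp >> 10));
          *q++ = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
      else
        *q++ = (uint16_t)cp;
    }
  if (q == out)              /* Empty block: still emit two zero units.  */
    *q++ = 0;
  *q = 0;                    /* Terminates the block.  */
  return out;
}
```

On Windows, the returned block would then be passed as lpEnvironment to CreateProcessW with CREATE_UNICODE_ENVIRONMENT set in dwCreationFlags.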

If a developer passes a Unicode block to gpgrt_spawn_actions_set_envvars, it does not work, because the process is currently created without CREATE_UNICODE_ENVIRONMENT. In this regard, it is a bug.

Event Timeline

gniibe triaged this task as Normal priority. Sep 6 2024, 4:06 AM
gniibe created this task.

Here is my attempt:

The problem might be that we use getenv all over the place and don't specify the encoding of the values. Frankly, it is not 100% clear to me whether the value of an envvar needs to be a string or can be arbitrary data sans nul. However, I can't remember ever writing code that did not assume ASCII or UTF-8 for the value.

See also the comments in gnupg_setenv() and a cpp warning somewhere saying that we should move to using only the W32 API environment functions instead of putenv/getenv.

String values are stored as UTF-16, but might not even contain a terminating double zero, since they can be any binary data. Note that on Windows the registry can be used to set environment variables. There, "Edit Binary Data" in regedit shows exactly what is stored in the registry key. So if you use regedit's string editing functions, you can see that the values are converted from Latin-1 to UTF-16.

I did a quick test and set a UTF-16 Homedir. It currently does not work:

So there seems to be little risk of regression either way.

My guess is that it currently expects UTF-8, but I could not reproduce it with chcp or through the registry, even when I entered the bytes correctly (as in the image below). So please test any solution with a value that is not representable in the 8-bit encoding.

(The UTF-8 encoding of the snowman, U+2603, is 0xE2 0x98 0x83.)
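To check such test values by hand, a three-byte UTF-8 sequence has the form 1110xxxx 10yyyyyy 10zzzzzz, and the code point is the concatenation of the x, y and z bits. A tiny sketch of that arithmetic; the helper name utf8_decode3 is hypothetical and it does no validation:

```c
#include <stdint.h>

/* Decode a three-byte UTF-8 sequence 1110xxxx 10yyyyyy 10zzzzzz into the
   code point xxxxyyyyyyzzzzzz.  No validation is performed.  */
static uint32_t
utf8_decode3 (unsigned char b0, unsigned char b1, unsigned char b2)
{
  return ((uint32_t)(b0 & 0x0F) << 12)   /* xxxx   */
         | ((uint32_t)(b1 & 0x3F) << 6)  /* yyyyyy */
         | (uint32_t)(b2 & 0x3F);        /* zzzzzz */
}
```

For 0xE2 0x98 0x83 this gives 0x2000 + 0x600 + 0x03 = 0x2603. Since that is below U+10000, its UTF-16 form is the single code unit 0x2603 (bytes 03 26 in little-endian order), which is what a UTF-16 value in the registry should show.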

Please note that gpgrt_spawn_actions_set_envvars is a W32-specific API in libgpg-error. Currently, only the behavior with ASCII strings is defined.
The patch is meant for the future, in case we want to extend the semantics to support UTF-8.

So far, the only use case in existing code is scd-event, where we supply GNUPGHOME.

The environment is a property of the C runtime and is well defined as a block of concatenated C strings terminated by a zero-length C string. In the case of wmain, the C strings use wchar_t instead of char.
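To illustrate that format, here is a sketch of a getenv-like lookup over such a block. The name envblock_lookup is hypothetical. It compares names case-sensitively, whereas Windows treats environment variable names case-insensitively; a wchar_t version would look the same using wcslen and wcsncmp.

```c
#include <stddef.h>
#include <string.h>

/* Look up NAME in an environment block: a sequence of "name=value"
   C strings terminated by a zero-length string.  Returns a pointer to
   the value inside the block, or NULL if NAME is not present.  */
static const char *
envblock_lookup (const char *block, const char *name)
{
  size_t namelen = strlen (name);
  const char *p;

  /* Step from one string to the next until the empty string.  */
  for (p = block; *p; p += strlen (p) + 1)
    if (!strncmp (p, name, namelen) && p[namelen] == '=')
      return p + namelen + 1;
  return NULL;
}
```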

I'm talking about CreateProcessW and how a user of the gpgrt spawn API can specify lpEnvironment (when needed).

[in, optional] lpEnvironment

A pointer to the environment block for the new process. If this parameter is NULL, the new process uses the environment of the calling process.

An environment block consists of a null-terminated block of null-terminated strings. Each string is in the following form:

name=value\0

Because the equal sign is used as a separator, it must not be used in the name of an environment variable.

An environment block can contain either Unicode or ANSI characters. If the environment block pointed to by lpEnvironment contains Unicode characters, be sure that dwCreationFlags includes CREATE_UNICODE_ENVIRONMENT.

The ANSI version of this function, CreateProcessA fails if the total size of the environment block for the process exceeds 32,767 characters.

Note that an ANSI environment block is terminated by two zero bytes: one for the last string, one more to terminate the block. A Unicode environment block is terminated by four zero bytes: two for the last string, two more to terminate the block.
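A sketch of building an ANSI (or UTF-8) block in this format from name/value pairs, rejecting empty names and names that contain '=' as required above. The helper build_envblock and its signature are hypothetical, not an existing gpgrt function.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Build a "name=value\0...name=value\0\0" block from N name/value
   pairs.  Returns a malloc'ed block, or NULL on an invalid name or
   allocation failure.  */
static char *
build_envblock (const char *const *names, const char *const *values, size_t n)
{
  size_t total = 1, i;   /* 1 for the zero byte ending the block.  */
  char *block, *p;

  for (i = 0; i < n; i++)
    {
      /* '=' separates name from value, so it may not appear in a name.  */
      if (!*names[i] || strchr (names[i], '='))
        return NULL;
      total += strlen (names[i]) + 1 + strlen (values[i]) + 1;
    }
  if (n == 0)
    total++;             /* Always end with two zero bytes.  */

  block = malloc (total);
  if (!block)
    return NULL;
  p = block;
  for (i = 0; i < n; i++)
    {
      size_t nl = strlen (names[i]), vl = strlen (values[i]);

      memcpy (p, names[i], nl);
      p += nl;
      *p++ = '=';
      memcpy (p, values[i], vl);
      p += vl;
      *p++ = 0;          /* Terminates this name=value string.  */
    }
  if (n == 0)
    *p++ = 0;
  *p = 0;                /* Terminates the block.  */
  return block;
}
```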

Since CreateProcessW accepts two kinds of lpEnvironment (an ANSI environment block or a Unicode environment block), if we want to support both for users of the gpgrt spawn API, we could offer either:

  • Replacing the argument of gpgrt_spawn_actions_set_envvars with two arguments: int is_ansi plus void *envblock

or

  • Extending the semantics to a multibyte string (UTF-8, or defined by the codepage) and converting it internally before calling CreateProcessW
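With the first option, the implementation would need is_ansi at least to find the end of the block, because the terminator differs (two zero bytes versus four). A hypothetical sketch of that size computation, again using uint16_t as a stand-in for the 16-bit Windows wchar_t; envblock_size is not an existing gpgrt function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the size in bytes of ENVBLOCK, including its terminator.
   An ANSI block ends with two zero bytes; a Unicode (UTF-16) block ends
   with two zero code units, i.e. four zero bytes.  A Unicode block is
   read as 16-bit units, so it must be 2-byte aligned.  */
static size_t
envblock_size (const void *envblock, int is_ansi)
{
  if (is_ansi)
    {
      const char *p = envblock;

      while (*p)
        p += strlen (p) + 1;       /* Skip one "name=value\0".  */
      return (size_t)(p - (const char *)envblock) + 1;
    }
  else
    {
      const uint16_t *w = envblock;

      while (*w)
        {
          while (*w)
            w++;
          w++;                     /* Skip this string's zero unit.  */
        }
      return ((size_t)(w - (const uint16_t *)envblock) + 1) * sizeof *w;
    }
}
```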

I'd vote for the second (UTF-8), which is more aligned with our other APIs.