Some background to this research:
We are using the filesystem encoding for the environment variables on Python 2 to retain existing behaviour. Python 3 does not need this, since its os.environ already holds text.
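As a rough illustration of what "using the filesystem encoding" amounts to, here is a minimal sketch. The helper names (env_to_text, text_to_env, fits_encoding) are made up for this note and are not taken from our code, and cp1252 in the usage lines is only a stand-in for whatever sys.getfilesystemencoding() reports on a given machine.

```python
import sys


def env_to_text(raw, encoding=None):
    """Decode a byte-string environment value to text; text passes through."""
    if isinstance(raw, bytes):
        return raw.decode(encoding or sys.getfilesystemencoding())
    return raw


def text_to_env(text, encoding=None):
    """Encode text to the byte string that Python 2's os.environ stores."""
    return text.encode(encoding or sys.getfilesystemencoding())


def fits_encoding(text, encoding=None):
    """Return True if text can be encoded without losing characters."""
    try:
        text_to_env(text, encoding)
        return True
    except UnicodeEncodeError:
        return False


# Typical paths fit in a Western code page; CJK characters do not.
ordinary_ok = fits_encoding(u"C:\\Program Files", "cp1252")
cjk_ok = fits_encoding(u"C:\\\u4e2d\u6587", "cp1252")
```

With cp1252 as the stand-in, ordinary_ok is True and cjk_ok is False.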
However, this doesn't actually work properly for characters that can't be encoded with the filesystem encoding. We don't tend to have any of these: our PATHs are generally Program Files or environment directories. Python 3 handles this better by using the Windows Unicode APIs (suffixed with W) for environment and subprocess calls. Python 2 does this for general path manipulation if you use the text / unicode type. If it's necessary to emulate Python 3 behaviour, it can be done with ctypes (see http://stackoverflow.com/a/2608368):
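    import ctypes

    kernel32 = ctypes.windll.kernel32
    # Without these, ctypes assumes a C int return and truncates 64-bit pointers.
    kernel32.GetEnvironmentStringsW.restype = ctypes.c_void_p
    kernel32.FreeEnvironmentStringsW.argtypes = [ctypes.c_void_p]

    def get_env(name):
        # First call asks for the required size, in characters including the NUL.
        n = kernel32.GetEnvironmentVariableW(name, None, 0)
        if n == 0:
            return None  # variable not found
        buf = ctypes.create_unicode_buffer(n)
        kernel32.GetEnvironmentVariableW(name, buf, n)
        return buf.value

    def set_env(name, value):
        # name and value must be unicode; returns non-zero on success.
        return kernel32.SetEnvironmentVariableW(name, value)

    def get_env_dict():
        block = kernel32.GetEnvironmentStringsW()
        environ = {}
        try:
            address = block
            while True:
                env_def = ctypes.wstring_at(address)
                if env_def == u'':
                    break  # an empty string terminates the block
                name, value = env_def.split(u'=', 1)
                environ[name] = value
                # Skip this entry and its NUL; wchar_t is 2 bytes on Windows.
                address += (len(env_def) + 1) * 2
        finally:
            kernel32.FreeEnvironmentStringsW(block)
        return environ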
However, this doesn't help with child processes, because Python 2.7's subprocess module calls the non-Unicode CreateProcessA, which can't handle Unicode in its env argument, and there is no encoding that works for every character. See http://stackoverflow.com/a/10360838. If something really needs to be done, the above functions can be used to set in-process environment variables, which are passed to the subprocess by default (or we could use a native Windows Popen like the one in processfamily).
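A minimal sketch of the "set it in-process and let the child inherit it" workaround, assuming the child is started without an explicit env argument. The function names and the variable name DEMO_INHERITED_VAR are hypothetical. The non-Windows branch is only a stand-in so the sketch runs anywhere; the Windows branch is the part that matters here.

```python
import os
import subprocess
import sys


def set_process_env(name, value):
    """Set a variable in this process's own environment block."""
    if sys.platform == "win32":
        import ctypes
        # Wide-character API: the value is not squeezed through a code page.
        if not ctypes.windll.kernel32.SetEnvironmentVariableW(name, value):
            raise ctypes.WinError()
    else:
        # Stand-in for other platforms so the sketch is runnable there too.
        os.environ[name] = value


def read_in_child(name):
    """Start a child with no explicit env and return what it sees for name."""
    code = "import os, sys; sys.stdout.write(os.environ.get(%r, ''))" % name
    # env is left as None, so the child inherits the parent's environment.
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode("ascii")


set_process_env(u"DEMO_INHERITED_VAR", u"hello")
seen_by_child = read_in_child(u"DEMO_INHERITED_VAR")
```

The demo value is plain ASCII on purpose: a Python 2 child that reads os.environ still goes through the code page on its side, so the inherited value is only fully usable by children that read the environment with the wide APIs.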
Other related info: Windows batch files depend on the code page; chcp 65001 sets it to UTF-8 for the current console session, and cmd /U has an effect too (it makes the output of internal commands to a pipe or file Unicode). Java has a command-line argument, -Dfile.encoding=UTF-8, that plays the same role for the JVM. With Python 3, or Python 2 and the above ctypes approach, they can all interact sanely.
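To show why the code page matters when these tools exchange text, here is a small sketch of the same bytes read under two different encodings. The function name reinterpret is made up for this note, and cp1252 stands in for a typical legacy code page.

```python
def reinterpret(text, written_as, read_as):
    """Encode text with one encoding and decode the bytes with another."""
    return text.encode(written_as).decode(read_as, "replace")


path = u"C:\\caf\u00e9"

# Writer and reader agree on UTF-8: the text survives.
same = reinterpret(path, "utf-8", "utf-8")

# Writer used UTF-8 but the reader assumes cp1252: the accent is garbled.
garbled = reinterpret(path, "utf-8", "cp1252")
```

Here same equals path, while garbled comes out as C:\cafÃ©. That mismatch is what chcp 65001 or -Dfile.encoding=UTF-8 avoids by making both sides agree on UTF-8.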