@geowurster
Last active November 13, 2017 01:52
Improving Rasterio's Management of the GDAL Environment

rasterio.Env.options does not accurately represent the current state of the environment:

import rasterio as rio

with rio.Env(option='whatever') as env:
    print("env.options", env.options, )
    print("")
    print("rio.env._env.options", rio.env._env.options)
    print("")
    print("rio.env.getenv()", rio.env.getenv())
    print("")
    print("Calling setenv()")
    rio.env.setenv(another='???')
    print("")
    print("env.options", env.options)
    print("")
    print("rio.env._env.options", rio.env._env.options)
    print("")
    # No parentheses here, so this prints the function object, not its result.
    print("rio.env.getenv()", rio.env.getenv)

# Output

env.options {'option': 'whatever'}

rio.env._env.options {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'option': 'whatever'}

rio.env.getenv() {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'option': 'whatever'}

Calling setenv()

env.options {'option': 'whatever'}

rio.env._env.options {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'option': 'whatever', 'another': '???'}

rio.env.getenv() <function getenv at 0x108255e18>

The defenv(), setenv(), delenv(), getenv(), Env(), GDALEnv(), and ConfigEnv() combination is a leaky abstraction that can be simplified. I think the best solution is to require users to interact with the GDAL environment through the front door, rasterio.Env(), and eliminate the side doors. ConfigEnv() doesn't add much on its own and can be merged into GDALEnv().

import rasterio as rio

def func_set():
    print("In func_set()")
    print("rio.env.setenv(option='value')")
    print("")
    rio.env.setenv(option='value')

def func_del():
    print("In func_del()")
    print("rio.env.delenv()")
    rio.env.delenv()
    print("")

with rio.Env(env_option=True) as env:
    print("")
    print("In Env()")
    print("rio.env.getenv():", rio.env.getenv())
    func_set()
    print("Back in Env()")
    print("rio.env.getenv():", rio.env.getenv())
    print("")
    func_del()
    print("Back in Env()")
    print("rio.env.getenv():", rio.env.getenv())

# Output

In Env()
rio.env.getenv(): {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'env_option': True}
In func_set()
rio.env.setenv(option='value')

Back in Env()
rio.env.getenv(): {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'env_option': True, 'option': 'value'}

In func_del()
rio.env.delenv()

Back in Env()
rio.env.getenv(): {}

The mystery surrounding the GDAL environment raises another question: should users be allowed to edit the current environment directly? It's more GDAL-y, but there are plenty of opportunities for weirdness and gotchas. Maybe users should be required to instantiate another rasterio.Env() if they want to modify the environment? This could produce a lot of nested with blocks, but how deep do they really need to go? Along the same lines, should it be possible to instantiate rasterio.Env() outside of the normal context manager use case? The with statement is what really makes rasterio.Env() do something; without it, no environment is set up. For example:

import rasterio as rio

print(rio.Env().drivers())


# Output

Traceback (most recent call last):
  File "buh.py", line 4, in <module>
    print(rio.Env().drivers())
  File "/Users/kevin.wurster/code/rasterio/rasterio/env.py", line 141, in drivers
    return _env.drivers()
AttributeError: 'NoneType' object has no attribute 'drivers'
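
To make the "instantiate another Env() to modify the environment" idea concrete, here is a minimal sketch of nested blocks that each restore the previous state on exit. It does not use Rasterio or GDAL at all: the _config dict and the env() helper are hypothetical stand-ins for the GDAL config store and rasterio.Env(), so this only illustrates the shape of the behavior, not how Rasterio implements it.

```python
import contextlib

# Hypothetical stand-in for GDAL's global config store.
_config = {}
# Sentinel marking "this key was not set before we touched it".
_MISSING = object()


@contextlib.contextmanager
def env(**options):
    """Apply options for the duration of a with block, then restore."""
    # Remember the prior value of every key this block is about to change.
    previous = {key: _config.get(key, _MISSING) for key in options}
    _config.update(options)
    try:
        yield dict(_config)  # a snapshot, not the live store
    finally:
        for key, value in previous.items():
            if value is _MISSING:
                _config.pop(key, None)
            else:
                _config[key] = value


with env(option='value'):
    with env(another='???', option='inner'):
        inner = dict(_config)
    outer = dict(_config)
after = dict(_config)
```

With this shape, two levels of nesting are enough to override an option temporarily, and nothing leaks out of the outermost block.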

The default options are always present. The only way to disable them is to pass the inverse of each one:

import rasterio as rio

with rio.Env(**{k: not v for k, v in rio.env.default_options.items()}) as env:
    pass

I propose an explicit alternate constructor for opting in to the defaults:

import rasterio as rio

with rio.Env.from_defaults() as env:
    pass
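
A rough sketch of how such a constructor could behave, assuming a plain Env() would then start with no defaults at all. SketchEnv and DEFAULT_OPTIONS are hypothetical names for illustration only; the option values are copied from the rio.env._env.options output shown earlier, and none of this is Rasterio's actual code.

```python
# Values copied from the rio.env._env.options output shown earlier.
DEFAULT_OPTIONS = {
    'CHECK_WITH_INVERT_PROJ': True,
    'GTIFF_IMPLICIT_JPEG_OVR': False,
    "I'M_ON_RASTERIO": True,
}


class SketchEnv:
    """Hypothetical stand-in for rasterio.Env; not Rasterio's code."""

    def __init__(self, **options):
        # Assumption: the plain constructor applies only what was passed in.
        self.options = dict(options)

    @classmethod
    def from_defaults(cls, **options):
        # Start from the defaults, then let explicit options override them.
        merged = dict(DEFAULT_OPTIONS)
        merged.update(options)
        return cls(**merged)


bare = SketchEnv(option='whatever')
full = SketchEnv.from_defaults(option='whatever', CHECK_WITH_INVERT_PROJ=False)
```

Under this design the awkward "invert every default" dict comprehension is no longer needed, because not wanting the defaults is simply the plain constructor.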

Rasterio + osgeo.gdal

<shudder> Here's the grim reality: anyone migrating a large osgeo.gdal code base to Rasterio can't do it all at once, so the two must be able to coexist for the benefit of our power users. Neither project should encourage this, but with some documentation and a few code changes on our end we can make it easier. I learned a lot about the potential environment pitfalls while investigating this issue and I have some code snippets illustrating various gotchas that will be useful. I will add them in another PR.

This should help a lot: https://trac.osgeo.org/gdal/changeset/37273.

There are a lot of rasterio.Env() instances in the tests that are no longer needed after introducing @ensure_env.

Remove Env() from rasterio.open() and wrap it with @ensure_env instead.

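import rasterio as rio
from rasterio.env import get_gdal_config

with rio.Env(option='value'):
    with rio.Env() as env:
        print(get_gdal_config('option'))  # inside the nested Env()
    print(get_gdal_config('option'))  # back in the outer Env()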

Can we assume that None will forever be a safe way to unset an environment variable?

Rasterio's tests should only instantiate a rasterio.Env() when testing environment-related behavior. Rely on @ensure_env otherwise.

Tests can easily leave variables in the GDAL environment by accident. Can we use a pytest fixture to scrub the environment every time? Is this even possible prior to the CPL API addition? Should we even be worried about it?

GDAL automatically handles casting config option keys from lower to upper case. How does the CPL API handle this?

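from osgeo import gdal

# Set with a lowercase key, then read it back with an uppercase one.
gdal.SetConfigOption('foo', 'bar')
# No default is passed, so None is printed if the uppercase lookup misses.
print(gdal.GetConfigOption('FOO'))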

Rasterio should be casting ON and TRUE to Python's True and OFF and FALSE to Python's False. Check the docs to see what else it does.
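
A minimal sketch of that casting, covering only the four strings named above and matching them case-insensitively (the case-insensitivity is an assumption). cast_config_value() is a hypothetical helper, not an existing Rasterio function, and any other spellings GDAL accepts are deliberately left out until the docs have been checked.

```python
# Only the four spellings named in the notes; anything else passes through.
_TRUE = {'ON', 'TRUE'}
_FALSE = {'OFF', 'FALSE'}


def cast_config_value(value):
    """Map GDAL-style boolean strings to Python bools; pass others through."""
    if isinstance(value, str):
        upper = value.upper()
        if upper in _TRUE:
            return True
        if upper in _FALSE:
            return False
    return value


results = [cast_config_value(v) for v in ('ON', 'true', 'OFF', 'False', 'whatever')]
```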

What happens with threads and processes?


The rest of this is re-organizing the above as a GitHub ticket.

The GDAL environment is a mysterious vortex, but Rasterio makes it easier to use. The get_gdal_config(), set_gdal_config(), and del_gdal_config() functions let users bypass rasterio.Env() and interact directly with the GDAL environment, and have the same mystifying effect as a hidden osgeo.gdal.SetConfigOption(). All of Rasterio depends on rasterio.Env() accurately representing the current state of the GDAL environment.

-=-=-=-=-=-=-=-=-=-=-=-=-=-

It's currently much better than osgeo.gdal, but could use a bit of improvement.

Side doors

The side effects of exposing get_gdal_config(), set_gdal_config(), and del_gdal_config() in the public API are identical to those of calling osgeo.gdal.SetConfigOption() in the same namespace or, when using only the osgeo bindings, calling osgeo.gdal.SetConfigOption() deep in the call stack.

Obviously these functions are required for interacting with GDAL, but I don't think they should be exposed in the public API.

import rasterio as rio
from rasterio.env import del_gdal_config, get_gdal_config, set_gdal_config

print("")
print("Setting")
print("rio.env._env: ", rio.env._env)
set_gdal_config('option', 'whatever')
print("rio.env._env: ", rio.env._env)

print("")
print("Getting")
print(get_gdal_config('option'))

print("")
print("Deleting")
del_gdal_config('option')
print("rio.env._env:", rio.env._env)

# Output

Setting
rio.env._env: None
rio.env._env: None

Getting
whatever

Deleting
rio.env._env: None

Accurately representing the environment

rasterio.Env.options does not accurately represent the current state of the environment:

import rasterio as rio

with rio.Env(option='whatever') as env:
    print("env.options", env.options, )
    print("")
    print("rio.env._env.options", rio.env._env.options)
    print("")
    print("rio.env.getenv()", rio.env.getenv())
    print("")
    print("Calling setenv()")
    rio.env.setenv(another='???')
    print("")
    print("env.options", env.options)
    print("")
    print("rio.env._env.options", rio.env._env.options)
    print("")
    # No parentheses here, so this prints the function object, not its result.
    print("rio.env.getenv()", rio.env.getenv)

# Output

env.options {'option': 'whatever'}

rio.env._env.options {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'option': 'whatever'}

rio.env.getenv() {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'option': 'whatever'}

Calling setenv()

env.options {'option': 'whatever'}

rio.env._env.options {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'option': 'whatever', 'another': '???'}

rio.env.getenv() <function getenv at 0x108255e18>

The defenv(), setenv(), delenv(), getenv(), Env(), GDALEnv(), and ConfigEnv() combination is a leaky abstraction that can be simplified. I think the best solution is to require users to interact with the GDAL environment through the front door, rasterio.Env(), and eliminate the side doors. ConfigEnv() doesn't add much on its own and can be merged into GDALEnv().

import rasterio as rio

def func_set():
    print("In func_set()")
    print("rio.env.setenv(option='value')")
    print("")
    rio.env.setenv(option='value')

def func_del():
    print("In func_del()")
    print("rio.env.delenv()")
    rio.env.delenv()
    print("")

with rio.Env(env_option=True) as env:
    print("")
    print("In Env()")
    print("rio.env.getenv():", rio.env.getenv())
    func_set()
    print("Back in Env()")
    print("rio.env.getenv():", rio.env.getenv())
    print("")
    func_del()
    print("Back in Env()")
    print("rio.env.getenv():", rio.env.getenv())

# Output

In Env()
rio.env.getenv(): {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'env_option': True}
In func_set()
rio.env.setenv(option='value')

Back in Env()
rio.env.getenv(): {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'env_option': True, 'option': 'value'}

In func_del()
rio.env.delenv()

Back in Env()
rio.env.getenv(): {}

The mystery surrounding the GDAL environment raises another question: should users be allowed to edit the current environment directly? It's more GDAL-y, but there are plenty of opportunities for weirdness and gotchas. Maybe users should be required to instantiate another rasterio.Env() if they want to modify the environment? This could produce a lot of nested with blocks, but how deep do they really need to go? Along the same lines, should it be possible to instantiate rasterio.Env() outside of the normal context manager use case? The with statement is what really makes rasterio.Env() do something; without it, no environment is set up. For example:

import rasterio as rio

print(rio.Env().drivers())


# Output

Traceback (most recent call last):
  File "buh.py", line 4, in <module>
    print(rio.Env().drivers())
  File "/Users/kevin.wurster/code/rasterio/rasterio/env.py", line 141, in drivers
    return _env.drivers()
AttributeError: 'NoneType' object has no attribute 'drivers'

The default options are always present. The only way to disable them is to pass the inverse of each one:

import rasterio as rio

with rio.Env(**{k: not v for k, v in rio.env.default_options.items()}) as env:
    pass

I propose an explicit alternate constructor for opting in to the defaults:

import rasterio as rio

with rio.Env.from_defaults() as env:
    pass

Rasterio + osgeo.gdal

<shudder> Here's the grim reality: anyone migrating a large osgeo.gdal code base to Rasterio can't do it all at once, so the two must be able to coexist for the benefit of our power users. Neither project should encourage this, but with some documentation and a few code changes on our end we can make it easier. I learned a lot about the potential environment pitfalls while investigating this issue and I have some code snippets illustrating various gotchas that will be useful. I will add them in another PR.

This should help a lot: https://trac.osgeo.org/gdal/changeset/37273.
