rasterio.Env.options does not accurately represent the current state of the environment:
import rasterio as rio

with rio.Env(option='whatever') as env:
    print("env.options", env.options)
    print("")
    print("rio.env._env.options", rio.env._env.options)
    print("")
    print("rio.env.getenv()", rio.env.getenv())
    print("")
    print("Calling setenv()")
    rio.env.setenv(another='???')
    print("")
    print("env.options", env.options)
    print("")
    print("rio.env._env.options", rio.env._env.options)
    print("")
    # getenv is not called here, so the output shows the function object.
    print("rio.env.getenv()", rio.env.getenv)
# Output
env.options {'option': 'whatever'}
rio.env._env.options {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'option': 'whatever'}
rio.env.getenv() {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'option': 'whatever'}
Calling setenv()
env.options {'option': 'whatever'}
rio.env._env.options {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'option': 'whatever', 'another': '???'}
rio.env.getenv() <function getenv at 0x108255e18>
The defenv(), setenv(), delenv(), getenv(), Env(), GDALEnv(), and ConfigEnv() combo is a leaky abstraction that can be simplified. I think the best solution is to require users to interact with the GDAL environment through the front door that is rasterio.Env() and eliminate the side doors. ConfigEnv() doesn't add a lot on its own and can be merged with GDALEnv().
import rasterio as rio

def func_set():
    print("In func_set()")
    print("rio.env.setenv(option='value')")
    print("")
    rio.env.setenv(option='value')

def func_del():
    print("In func_del()")
    print("rio.env.delenv()")
    rio.env.delenv()
    print("")

with rio.Env(env_option=True) as env:
    print("")
    print("In Env()")
    print("rio.env.getenv():", rio.env.getenv())
    func_set()
    print("Back in Env()")
    print("rio.env.getenv():", rio.env.getenv())
    print("")
    func_del()
    print("Back in Env()")
    print("rio.env.getenv():", rio.env.getenv())
# Output
In Env()
rio.env.getenv(): {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'env_option': True}
In func_set()
rio.env.setenv(option='value')
Back in Env()
rio.env.getenv(): {'CHECK_WITH_INVERT_PROJ': True, 'GTIFF_IMPLICIT_JPEG_OVR': False, "I'M_ON_RASTERIO": True, 'env_option': True, 'option': 'value'}
In func_del()
rio.env.delenv()
Back in Env()
rio.env.getenv(): {}
The mysteriousness surrounding the GDAL environment raises another question: should users be allowed to edit the current environment directly? It's more GDAL-y, but there are plenty of opportunities for weirdness and gotchas. Maybe users should be required to instantiate another rasterio.Env() if they want to modify the environment? This could produce a lot of nested with blocks, but how deep do they really need to go? Along the same lines, should it be possible to instantiate rasterio.Env() outside of the normal context manager use case? The with statement is what really makes rasterio.Env() do something. For example:
import rasterio
print(rasterio.Env().drivers())
# Output
Traceback (most recent call last):
  File "buh.py", line 4, in <module>
    print(rio.Env().drivers())
  File "/Users/kevin.wurster/code/rasterio/rasterio/env.py", line 141, in drivers
    return _env.drivers()
AttributeError: 'NoneType' object has no attribute 'drivers'
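One possible way to make this failure easier to diagnose, sketched with a toy class rather than Rasterio's real implementation. GuardedEnv, EnvNotActiveError, and the placeholder driver mapping are all hypothetical. The idea is that methods needing an active environment check for one and raise a descriptive error instead of failing on None.

```python
class EnvNotActiveError(RuntimeError):
    """Hypothetical error for methods called outside a with block."""


class GuardedEnv:
    def __init__(self):
        self._active = False

    def __enter__(self):
        self._active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self._active = False
        return False

    def drivers(self):
        if not self._active:
            raise EnvNotActiveError(
                "drivers() requires an active environment; "
                "use 'with GuardedEnv() as env:' first")
        # Placeholder data; a real implementation would ask GDAL.
        return {'GTiff': 'GeoTIFF'}


# Outside a with block the call fails with a clear message.
try:
    GuardedEnv().drivers()
except EnvNotActiveError as err:
    message = str(err)

# Inside a with block the call succeeds.
with GuardedEnv() as env:
    inside = env.drivers()
```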
The defaults are always present. The only way to disable them is:
import rasterio as rio

with rio.Env(**{k: not v for k, v in rio.env.default_options.items()}) as env:
    pass
I propose:
import rasterio as rio

with rio.Env.from_defaults() as env:
    pass
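One reading of this proposal, sketched with a toy class: the bare constructor applies only what the caller passes, and an alternate constructor opts in to the defaults. DefaultsEnv and DEFAULT_OPTIONS are hypothetical stand-ins, and the default values are copied from the getenv() output shown earlier.

```python
# Values copied from the getenv() output earlier; the names are hypothetical.
DEFAULT_OPTIONS = {
    'CHECK_WITH_INVERT_PROJ': True,
    'GTIFF_IMPLICIT_JPEG_OVR': False,
    "I'M_ON_RASTERIO": True,
}


class DefaultsEnv:
    def __init__(self, **options):
        # No defaults are mixed in here.
        self.options = dict(options)

    @classmethod
    def from_defaults(cls, **options):
        # Start from the defaults and let explicit options override them.
        merged = dict(DEFAULT_OPTIONS)
        merged.update(options)
        return cls(**merged)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


bare = DefaultsEnv()
with_defaults = DefaultsEnv.from_defaults(CHECK_WITH_INVERT_PROJ=False)
```

With this split, a caller who wants a clean environment writes DefaultsEnv() and nobody has to invert the defaults by hand.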
<shudder>
Here's the grim reality: anyone migrating a large osgeo.gdal code base to Rasterio can't do it all at once, so the two must be able to coexist for the benefit of our power users. Neither project should encourage this, but with some documentation and a few code changes on our end we can make it easier. I learned a lot about the potential environment pitfalls while investigating this issue and I have some code snippets illustrating various gotchas that will be useful. I will add them in another PR.
This should help a lot: https://trac.osgeo.org/gdal/changeset/37273.
There are a lot of rasterio.Env() instances in the tests that are no longer needed after introducing @ensure_env.
Remove Env() from rasterio.open() and wrap it with ensure_env() instead.
import rasterio as rio
from rasterio.env import get_gdal_config

with rio.Env(option='value'):
    with rio.Env() as env:
        print(get_gdal_config('option'))
    print(get_gdal_config('option'))
Can we assume that None will forever be a safe way to unset an environment variable?
Rasterio's tests should only instantiate a rasterio.Env() when testing environment-related behavior. Rely on @ensure_env otherwise.
Tests can easily leave variables in the GDAL environment by accident. Can we use a pytest fixture to scrub the environment every time? Is this even possible prior to the CPL API addition? Should we even be worried about it?
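If snapshotting turns out to be possible, the scrubbing could look roughly like the sketch below, which uses a plain dict as a stand-in for the GDAL config store. FAKE_CONFIG and scrubbed_config are hypothetical; whether GDAL exposes enough to snapshot the real store is exactly the open question above.

```python
import contextlib

# Stand-in for the GDAL config store; hypothetical.
FAKE_CONFIG = {'CHECK_WITH_INVERT_PROJ': True}


@contextlib.contextmanager
def scrubbed_config(store):
    """Snapshot store on entry and restore it on exit, even on error."""
    snapshot = dict(store)
    try:
        yield store
    finally:
        store.clear()
        store.update(snapshot)


with scrubbed_config(FAKE_CONFIG) as config:
    # Simulates a test that forgets to clean up after itself.
    config['leaked_by_test'] = 'oops'
    during = dict(config)
```

The same snapshot and restore body could live in a pytest fixture declared with autouse=True so that every test gets it without asking.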
GDAL automatically handles casting config option keys from lower to upper case. How does the CPL API handle this?
from osgeo import gdal

gdal.SetConfigOption('foo', 'bar')
# No default is passed, so the result shows whether 'FOO' matches 'foo'.
print(gdal.GetConfigOption('FOO'))
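If Rasterio keeps its own record of options, normalizing keys on the Python side would keep that record consistent with the case handling described above. A small sketch, where normalize_options is a hypothetical helper and rejecting case-only duplicates is an assumption rather than anything GDAL does:

```python
def normalize_options(options):
    """Return a copy of options with upper-cased keys.

    Raises ValueError when two keys differ only by case, because it would
    be ambiguous which value should win.
    """
    normalized = {}
    for key, value in options.items():
        upper = key.upper()
        if upper in normalized:
            raise ValueError("duplicate option after upper-casing: " + upper)
        normalized[upper] = value
    return normalized


result = normalize_options({'foo': 'bar', 'GTIFF_IMPLICIT_JPEG_OVR': False})
```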
Rasterio should be casting ON and TRUE to Python's True and OFF and FALSE to Python's False. Check the docs to see what else it does.
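The casting described here could be sketched as a pair of small helpers. The names from_gdal_value and to_gdal_value are hypothetical, and matching the strings case-insensitively is an assumption. Only the four strings named above are handled; anything else passes through unchanged.

```python
_TRUE_STRINGS = {'ON', 'TRUE'}
_FALSE_STRINGS = {'OFF', 'FALSE'}


def from_gdal_value(value):
    """Map GDAL-style boolean strings to bool; pass other values through."""
    if isinstance(value, str):
        upper = value.upper()
        if upper in _TRUE_STRINGS:
            return True
        if upper in _FALSE_STRINGS:
            return False
    return value


def to_gdal_value(value):
    """Map Python booleans to 'ON'/'OFF'; stringify everything else."""
    if isinstance(value, bool):
        return 'ON' if value else 'OFF'
    return str(value)
```

For example, from_gdal_value('ON') gives True and to_gdal_value(False) gives 'OFF', while a value such as 'whatever' is returned as-is.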
What happens with threads and processes?
The GDAL environment is a mysterious vortex but Rasterio makes it easier to use. These three functions let users bypass rasterio.Env() and interact directly with the GDAL environment, and have the same mystifying effect as a hidden osgeo.gdal.SetConfigOption().
All of Rasterio depends on rasterio.Env() accurately representing the current state of the GDAL environment.
-=-=-=-=-=-=-=-=-=-=-=-=-=-
It's currently much better than osgeo.gdal, but could use a bit of improvement.
The side effects of exposing get_gdal_config(), set_gdal_config(), and del_gdal_config() in the public API are identical to those of calling osgeo.gdal.SetConfigOption() in the same namespace or, when using only the osgeo bindings, calling osgeo.gdal.SetConfigOption() deep in the call stack.
Obviously these functions are required for interacting with GDAL, but I don't think they should be exposed in the public API.
import rasterio as rio
from rasterio.env import del_gdal_config, get_gdal_config, set_gdal_config
print("")
print("Setting")
print("rio.env._env: ", rio.env._env)
set_gdal_config('option', 'whatever')
print("rio.env._env: ", rio.env._env)
print("")
print("Getting")
print(get_gdal_config('option'))
print("")
print("Deleting")
del_gdal_config('option')
print("rio.env._env:", rio.env._env)
# Output
Setting
rio.env._env: None
rio.env._env: None
Getting
whatever
Deleting
rio.env._env: None