Rasterio has different ways to access datasets located on disk or at network addresses and datasets located in memory buffers. This document explains the former once again and then introduces the latter for the first time.
To access datasets on disk, give a filesystem path to rasterio.open().
import rasterio

# Open a dataset located in a local file.
with rasterio.open('data/RGB.byte.tif') as dataset:
    print(dataset.profile)
Equivalently, use a file:// URL.
with rasterio.open('file://data/RGB.byte.tif') as dataset:
    print(dataset.profile)
To access a dataset located in a local zip file, pass a zip:// URL (Apache VFS style) to rasterio.open().
with rasterio.open('zip://data/files.zip!RGB.byte.tif') as dataset:
    print(dataset.profile)
Datasets at http://, https://, or s3:// (AWS CLI style) network locations can be accessed by passing these locators to rasterio.open(). See #942 for details.
The difference from GDAL
If you're a GDAL user, you may be used to passing strings like /vsizip/foo.zip to call for zip file handling and strings like /vsicurl/https://example.com/foo.tif to call for HTTP protocol handling. Rasterio registers handlers by URL scheme instead. It uses GDAL's special strings internally, but they are not part of the Rasterio API.
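To make the contrast concrete, here is a hypothetical helper, to_gdal_path(), that maps the URL schemes above onto GDAL's /vsizip/, /vsicurl/, and /vsis3/ prefixes. It is a simplified sketch that assumes only these schemes; it is not Rasterio's internal translation, which handles more cases.

```python
def to_gdal_path(url: str) -> str:
    """Translate a Rasterio-style locator into a GDAL virtual filesystem path.

    Hypothetical helper for illustration only; not Rasterio's internal code.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No scheme: a plain filesystem path passes through unchanged.
        return url
    if scheme == "file":
        return rest
    if scheme == "zip":
        # Apache VFS style: archive path, then '!', then the member name.
        archive, _, member = rest.partition("!")
        return f"/vsizip/{archive}/{member}" if member else f"/vsizip/{archive}"
    if scheme in ("http", "https"):
        # GDAL's curl handler takes the complete URL after its prefix.
        return f"/vsicurl/{url}"
    if scheme == "s3":
        return f"/vsis3/{rest}"
    raise ValueError(f"unsupported scheme: {scheme!r}")


zip_path = to_gdal_path("zip://data/files.zip!RGB.byte.tif")
http_path = to_gdal_path("https://example.com/foo.tif")
```

With this sketch, zip_path is '/vsizip/data/files.zip/RGB.byte.tif' and http_path is '/vsicurl/https://example.com/foo.tif'. The point is only that the caller names a scheme and the library picks the handler.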
Rasterio can access datasets located in the buffers of Python objects without writing the buffers to disk. To see how, open and read any GeoTIFF file.
with open('data/RGB.byte.tif', 'rb') as f:
    data = f.read()
The bytes object data now holds that entire GeoTIFF. To make it available to Rasterio (and GDAL), give data to a MemoryFile and then open the dataset using MemoryFile.open().
from rasterio.io import MemoryFile

with MemoryFile(data) as memfile:
    with memfile.open() as dataset:
        print(dataset.profile)
As there is only one dataset per MemoryFile, MemoryFile.open() needs no filename or path argument. In many cases the usage can be condensed to the following.
with MemoryFile(data).open() as dataset:
    print(dataset.profile)
MemoryFile is like Python's BytesIO class but has an additional special feature: the bytes buffer is mapped to a virtual file for use by GDAL. The virtual file is deleted when the MemoryFile closes.
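That lifecycle can be modeled with the standard library alone. The sketch below is a toy, not Rasterio's implementation: a hypothetical ToyMemoryFile subclasses io.BytesIO, registers itself under a made-up /vsimem/-style path in a plain dict (VIRTUAL_FS, standing in for GDAL's in-memory virtual filesystem), and removes that entry when it closes.

```python
import io
import uuid

# Hypothetical registry standing in for GDAL's in-memory virtual filesystem.
VIRTUAL_FS = {}


class ToyMemoryFile(io.BytesIO):
    """Toy model of the MemoryFile lifecycle; not Rasterio's implementation."""

    def __init__(self, data=b""):
        super().__init__(data)
        # Register this buffer under a unique, made-up virtual path.
        self.name = f"/vsimem/{uuid.uuid4().hex}.tif"
        VIRTUAL_FS[self.name] = self

    def close(self):
        # Delete the virtual file entry, then close the underlying buffer.
        VIRTUAL_FS.pop(getattr(self, "name", None), None)
        super().close()


def read_virtual(path):
    """Return the bytes a reader of the virtual path would see."""
    return VIRTUAL_FS[path].getvalue()


with ToyMemoryFile(b"GeoTIFF bytes") as mem:
    path = mem.name
    before = read_virtual(path)
    mem.seek(0, io.SEEK_END)
    mem.write(b"!")  # writes to the buffer show up in the virtual file
    after = read_virtual(path)

still_registered = path in VIRTUAL_FS  # False once the with-block has closed mem
```

Because the registry holds the buffer object itself rather than a copy, anything written to the buffer is visible through the virtual path, and closing the buffer makes the path disappear.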
You can also pass a file-like object opened in binary mode to MemoryFile(). This is for convenience only: the bytes of the file are read immediately into a bytes object.
with open('data/RGB.byte.tif', 'rb') as fp:
    with MemoryFile(fp).open() as dataset:
        print(dataset.profile)
        rgb_profile = dataset.profile
        rgb_data = dataset.read()
Note that the profile and band data of that dataset have been captured for use in other examples below.
Recognize the above as a more memory-intensive way of getting the same results as the very first example in this document. Generally speaking, raster data formats are optimized for random access, so GDAL format drivers need a dataset to be written entirely to disk, or held entirely in memory and mapped to a virtual file. Using MemoryFile to hold a large GeoTIFF doesn't require a hard disk (which is good for serverless applications) but loads the entire GeoTIFF into RAM.
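The whole file lands in RAM because a file-like object passed to MemoryFile() is read immediately into a bytes object. A hypothetical to_bytes() helper, not taken from Rasterio, sketches that behavior: once the read has happened, the returned bytes hold the complete file and the source can be closed.

```python
import io


def to_bytes(source) -> bytes:
    """Hypothetical helper, not Rasterio's code.

    Bytes-like input is copied as-is; a binary file-like object is read
    to the end right away, so the caller may close it afterwards.
    """
    if hasattr(source, "read"):
        return source.read()
    return bytes(source)


source_file = io.BytesIO(b"GeoTIFF bytes")  # stands in for open(path, 'rb')
data = to_bytes(source_file)
source_file.close()  # the copy in `data` no longer depends on the source
```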
A MemoryFile can also be written. You can create a GeoTIFF (for example) in memory and then stream its bytes elsewhere without writing to disk. In this case you must bind the MemoryFile to a name so it can be referenced later.
with MemoryFile() as memfile:
    with memfile.open(**rgb_profile) as dataset:
        dataset.write(rgb_data)
    memfile.seek(0)
    print(memfile.read(1000))
Writing band data to the opened dataset modifies the virtual file and consequently the MemoryFile buffer.
Be kind: rewind
Note well: after the dataset closes, the memfile position is left at its end, so call memfile.seek(0) before reading its bytes, as in the example above.
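Since MemoryFile is like Python's BytesIO in this respect, the standard library class is enough to show why the rewind matters. In this sketch io.BytesIO stands in for a MemoryFile, and the four bytes written are a placeholder (a little-endian TIFF header), not a complete dataset.

```python
import io

# io.BytesIO stands in for a MemoryFile here; the bytes are placeholders.
buf = io.BytesIO()
buf.write(b"II*\x00")  # a little-endian TIFF header, as an example payload
position_after_write = buf.tell()  # the position is now at the end of the buffer
unread = buf.read()  # so reading from here returns b""
buf.seek(0)  # be kind: rewind
header = buf.read()
```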
Zip files in a buffer
The ZipMemoryFile class is mostly the same, but is for use with a buffer that contains a zip archive.
from rasterio.io import ZipMemoryFile

with open('data/files.zip', 'rb') as fp:
    with ZipMemoryFile(fp) as zipmem:
        with zipmem.open('RGB.byte.tif') as dataset:
            print(dataset.profile)
This is much the same interface as that of zipfile.ZipFile.
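For comparison, this is how the standard library's zipfile.ZipFile reads a named member out of an archive held in an in-memory buffer. The member contents are a placeholder byte string, not a real GeoTIFF.

```python
import io
import zipfile

payload = b"placeholder, not a real GeoTIFF"  # hypothetical member contents

# Build a zip archive entirely in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("RGB.byte.tif", payload)

# Reopen the same buffer and read one member by name,
# much as ZipMemoryFile.open() takes a member name.
buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    with zf.open("RGB.byte.tif") as member:
        contents = member.read()
```

ZipFile leaves a file object it was given open when it closes, which is why the same buffer can be rewound and reused.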
Writing to a ZipMemoryFile is not currently supported, but it is possible to do so using Python's zipfile library and Rasterio's MemoryFile together.
from io import BytesIO
import zipfile

with BytesIO() as bytes_buffer:
    with zipfile.ZipFile(bytes_buffer, 'w') as zf:
        with MemoryFile() as memfile:
            with memfile.open(**rgb_profile) as dataset:
                dataset.write(rgb_data)
            memfile.seek(0)
            zf.writestr('foo.tif', memfile.read())

    bytes_buffer.seek(0)
    with ZipMemoryFile(bytes_buffer).open('foo.tif') as dataset:
        print(dataset.profile)
By popular request, rasterio.open() can also take a file object opened in binary modes 'rb' or 'wb' as its first argument.
with open('data/RGB.byte.tif', 'rb') as f:
    with rasterio.open(f) as dataset:
        print(dataset.profile)
A MemoryFile is created internally to hold the bytes read from the input file object. This is therefore not the best way to read or write datasets already on disk and addressable by name.
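Passing a text-mode file is an easy mistake, since Python's open() defaults to mode 'r'. A hypothetical guard like the one below, which is not part of Rasterio, shows one way to tell a binary file object from a text one before handing it over.

```python
import io


def is_binary_file(fobj) -> bool:
    """Hypothetical guard, not part of Rasterio: True for binary file objects."""
    if isinstance(fobj, io.TextIOBase):
        # open(path) and open(path, 'w') both return text-mode objects.
        return False
    # Buffered/raw io objects are binary; otherwise fall back to a 'mode' string.
    return isinstance(fobj, (io.BufferedIOBase, io.RawIOBase)) or "b" in getattr(fobj, "mode", "")
```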
Every profile printed in the examples above is the same:
{'tiled': False,
 'transform': Affine(300.0379266750948, 0.0, 101985.0,
                     0.0, -300.041782729805, 2826915.0),
 'width': 791,
 'dtype': 'uint8',
 'interleave': 'pixel',
 'driver': 'GTiff',
 'crs': CRS({'init': 'epsg:32618'}),
 'count': 3,
 'height': 718,
 'nodata': 0.0}
These features are acquired from GDAL, but Rasterio's abstractions are different and more Pythonic.