@unusorin
Forked from palmerj/rgb_bigtiff_cogs_notes.md
Created March 30, 2022 14:57
Creating BigTiff COGS for raster RGB photos from a tile mosaic directory using GDAL

Creating Cloud Optimised Geotiffs (COGs) for raster photo imagery

This document outlines the process for creating Cloud Optimised Geotiffs (COGs) suitable for hosting in services such as AWS S3. COGs enable more efficient workflows, such as fast access from Functions as a Service (e.g. AWS Lambda) or consumption by desktop GIS clients (e.g. QGIS). For more details on COGs, see https://www.cogeo.org/in-depth.html

1. Create a mosaic

First create the virtual mosaic from the directory of tiles, ensuring that an alpha band is created in the VRT to set transparency where there is no source raster.

gdalbuildvrt -addalpha mosaic.vrt *.tif
gdal_translate -of VRT -b 1 -b 2 -b 3 -mask 4 mosaic.vrt rgbmask.vrt

2. Create a BigTiff

Create a BigTiff with lossless compression to avoid quality loss. Use all available CPU cores (the DEFLATE compression method supports multi-threading). The GeoTiff has an internal 1-bit mask band that provides transparency for the parts of the mosaic extent that contain no source data.

gdal_translate \
  -b 1 -b 2 -b 3 -mask 4 \
  -of GTiff \
  -co BIGTIFF=YES \
  -co TILED=YES \
  -co COMPRESS=DEFLATE \
  -co PREDICTOR=2 \
  -co NUM_THREADS=ALL_CPUS \
  --config GDAL_CACHEMAX 4096 \
  -co ALPHA=YES \
  --config GDAL_TIFF_INTERNAL_MASK YES \
  mosaic.vrt output.tif
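
As a rough illustration of why BIGTIFF=YES is set here, the sketch below estimates the uncompressed size of an 8-bit mosaic and compares it with the 4 GiB limit of classic TIFF. The function name `needs_bigtiff` and the example dimensions are hypothetical, and the estimate ignores compression, so treat it as an upper bound rather than a prediction of the final file size.

```shell
# Sketch (hypothetical helper): does an uncompressed 8-bit raster exceed
# the 4 GiB limit of classic TIFF?
# Usage: needs_bigtiff XSIZE YSIZE [BANDS]
needs_bigtiff() {
  local xsize=$1
  local ysize=$2
  local bands=${3:-3}
  # One byte per sample for 8-bit imagery
  local bytes=$(( xsize * ysize * bands ))
  local limit=$(( 4 * 1024 * 1024 * 1024 ))
  if [ "$bytes" -gt "$limit" ]; then
    echo YES
  else
    echo NO
  fi
}

# Assumed example sizes; read the real ones from gdalinfo mosaic.vrt
needs_bigtiff 100000 100000   # prints YES
needs_bigtiff 10000 10000     # prints NO
```

For mosaics that might or might not cross the limit, GDAL's GTiff driver also accepts BIGTIFF=IF_NEEDED and BIGTIFF=IF_SAFER.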

3. Create Overviews

Create overviews for the mosaic.

Note: gdaladdo has a known issue where generating multiple overviews in the same TIFF file is slow and causes TIFF directory thrashing. The libtiff library has to go back and forth between multiple TIFF internal images, and load/unload the TIFF indexes each time. For a huge file, this involves a lot of I/O. The workaround, which is especially fine for the COG case, is to generate each overview level in its own file by cascading calls to gdaladdo. See https://trac.osgeo.org/gdal/ticket/5067#comment:2 for more info.


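OVERVIEW=output.tif
# Each pass adds one factor-2 overview to the file produced by the previous
# pass, so nine passes give the cumulative levels 2, 4, 8, ... 512.
for LEVEL in 2 4 8 16 32 64 128 256 512
do
  gdaladdo \
    --config GDAL_CACHEMAX 4096 \
    --config COMPRESS_OVERVIEW DEFLATE \
    -ro \
    -r average \
    "$OVERVIEW" 2
  # -ro writes the overview to an external .ovr file; use it as the next input
  OVERVIEW="${OVERVIEW}.ovr"
done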
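
The loop above hard-codes nine passes. As a minimal sketch, the number of factor-2 passes could instead be derived from the raster size, assuming the common convention of stopping once the smallest overview fits inside a single tile. The helper name `overview_levels`, the 256-pixel default, and the example dimensions are assumptions for illustration, not part of the original workflow.

```shell
# Sketch (hypothetical helper): count the factor-2 overview passes needed
# until both dimensions fit inside one tile.
# Usage: overview_levels XSIZE YSIZE [TILESIZE]
overview_levels() {
  local xsize=$1
  local ysize=$2
  local tile=${3:-256}
  local levels=0
  while [ "$xsize" -gt "$tile" ] || [ "$ysize" -gt "$tile" ]; do
    # Each pass halves both dimensions, rounding up
    xsize=$(( (xsize + 1) / 2 ))
    ysize=$(( (ysize + 1) / 2 ))
    levels=$(( levels + 1 ))
  done
  echo "$levels"
}

# Assumed example: a 100000 x 100000 pixel mosaic
overview_levels 100000 100000       # prints 9
overview_levels 100000 100000 128   # prints 10
```

For a mosaic of that assumed size, the result of 9 matches the nine passes in the loop.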

4. Create Cloud Optimised Geotiff (COGS)

Create the COG, applying the final JPEG compression and copying the previously generated overviews, so that the IFD (Image File Directory) indexes sit in the header of the file where they can be fetched efficiently via cloud web APIs. The GeoTiff is created with internal tiles of 256x256 for the main resolution and 128x128 for the overviews.

NOTES:

  • When compressing with JPEG, multi-threading cannot be used.
  • Increasing the block size can reduce the size of the IFD, but larger blocks can cause more bytes to be pulled for random access if the compression rate is not high. Going from the default of 256 to 512 will reduce the index by a factor of 4. The size of the TIFF index arrays, for each pyramid level, is: 2 * ceil(xsize / blockxsize) * ceil(ysize / blockysize) * 8 bytes. Because we use an internal mask, this value has to be multiplied by 2.

gdal_translate \
  -of GTiff \
  -co BIGTIFF=YES \
  -co TILED=YES \
  -co BLOCKXSIZE=256 \
  -co BLOCKYSIZE=256 \
  -co COMPRESS=JPEG \
  -co JPEG_QUALITY=85 \
  -co PHOTOMETRIC=YCBCR \
  -co COPY_SRC_OVERVIEWS=YES \
  -co ALPHA=YES \
  --config GDAL_TIFF_INTERNAL_MASK YES \
  --config GDAL_TIFF_OVR_BLOCKSIZE 128 \
  --config GDAL_CACHEMAX 4096 \
  output.tif output_cogs.tif
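
The index-size formula from the notes in this step can be evaluated with plain shell arithmetic. The sketch below is illustrative only: the function name `tiff_index_bytes` and the 100000 x 100000 pixel raster are assumptions, and square blocks are assumed for brevity.

```shell
# Sketch (hypothetical helper): TIFF index size in bytes for one pyramid
# level, including the internal mask.
# Usage: tiff_index_bytes XSIZE YSIZE BLOCKSIZE
tiff_index_bytes() {
  local xsize=$1
  local ysize=$2
  local block=$3
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  local tiles_x=$(( (xsize + block - 1) / block ))
  local tiles_y=$(( (ysize + block - 1) / block ))
  # 2 index arrays * 8 bytes per tile, then * 2 for the internal mask
  echo $(( 2 * tiles_x * tiles_y * 8 * 2 ))
}

# Assumed example: a 100000 x 100000 pixel raster
tiff_index_bytes 100000 100000 256   # prints 4892192
tiff_index_bytes 100000 100000 512   # prints 1229312
```

For this assumed raster the index shrinks from about 4.9 MB to about 1.2 MB, close to the factor of 4 mentioned in the notes. The ratio is not exactly 4 because the tile counts are rounded up.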

5. Validate the COGs Geotiff

Check that the following script reports no errors or warnings:

python validate_cloud_optimized_geotiff.py output_cogs.tif