@kgjenkins
Last active October 27, 2021 18:44
Working with GFED data in QGIS

GFED (Global Fire Emissions Database) data is distributed in HDF5 format, which is commonly used for multidimensional datasets, but can sometimes be a challenge to use in QGIS for a variety of reasons. First, it doesn't have an explicitly-defined CRS that QGIS can simply use to display the data in the correct location. Second, there are many different subdatasets packed into a single GFED .hdf5 file, representing different variables, units, and time periods.

Add georeferencing info and convert HDF5 to GeoTIFF

First, we'll add georeferencing information by assigning a proper CRS definition to the data, and in the process save each subdataset to a separate file. This seems to be necessary because GDAL (which is part of the QGIS infrastructure) can read HDF5 but not write it, so it is not possible to simply assign the CRS info to the original HDF5 file. As it turns out, having separate files also makes it possible to use certain QGIS processing tools later on.

Although QGIS uses GDAL behind the scenes, some things are easier to do on the command line, especially when a GDAL option is not directly exposed in QGIS. If you have QGIS installed, then you should already have a working GDAL on your system. On Windows, open the Command Prompt (or if that doesn't work, look in your Start menu for the QGIS program folder > OSGeo4W Shell). On Mac, use the Terminal.

The following gdal_translate command will convert each subdataset in the HDF5 file to a separate, properly-georeferenced GeoTIFF file:

gdal_translate -sds -a_srs "EPSG:4326" -a_ullr -180 90 180 -90 GFED4.1s_2016.hdf5 gfed2016.tif
  • -sds = copy all subdatasets to individual output files
  • -a_srs = assign the output CRS
  • -a_ullr = assign the upper left and lower right corners of the data extent

The output file names will be like gfed2016_123.tif, where '123' is the subdataset number.
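If you have several years of GFED files to convert, the same command can be generated from a short script. The following is only a sketch: the function name `translate_command` is hypothetical, the example years are arbitrary, and it assumes that other years follow the same file naming pattern as the 2016 example above.

```python
import shlex

def translate_command(year):
    """Return the gdal_translate arguments for one year's GFED file."""
    return [
        "gdal_translate",
        "-sds",
        "-a_srs", "EPSG:4326",
        "-a_ullr", "-180", "90", "180", "-90",
        f"GFED4.1s_{year}.hdf5",  # assumed input naming, as in the 2016 example
        f"gfed{year}.tif",
    ]

for year in (2015, 2016):  # example years only
    # Print the command so it can be checked before running anything.
    print(shlex.join(translate_command(year)))
    # To actually run the conversion (requires GDAL on the PATH):
    # import subprocess; subprocess.run(translate_command(year), check=True)
```

Passing the arguments as a list avoids shell quoting differences between the Windows Command Prompt and the Mac Terminal.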

Finding out which subdatasets you need

The GFED .hdf5 files contain hundreds of subdatasets, combining different variables and data types, and not always in a logical order. For example, day numbers are sorted alphabetically (1, 10, 11, ... 18, 19, 2, 20, 21).

WARNING: the number and order of subdatasets may vary from year to year!

We can use the gdalinfo command to get a list of all the subdatasets in a .hdf5 file and save to a text file:

gdalinfo GFED4.1s_2016.hdf5 > info2016.txt

The output has two major sections. The first section, "Metadata", gives definitions and units of the different subdataset names. In this example, "01" refers to monthly data for January:

  biosphere_01_BB_long_name=Biomass burning carbon emissions based on the CASA-GFED4s framework
  biosphere_01_BB_units=g C / m^2 / month

Starting halfway through the file is the second section, "Subdatasets", which are listed in numeric order, and look something like this:

  SUBDATASET_3_NAME=HDF5:"GFED4.1s_2016.hdf5"://biosphere/01/BB
  SUBDATASET_3_DESC=[720x1440] //biosphere/01/BB (32-bit floating-point)
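Rather than reading through the list by eye, the subdataset numbers can be pulled out of the gdalinfo text with a few lines of Python. This is a minimal sketch that assumes the lines look like the example above; the function names `list_subdatasets` and `select_subdatasets` are made up for illustration.

```python
import re

# Matches lines such as:
#   SUBDATASET_3_NAME=HDF5:"GFED4.1s_2016.hdf5"://biosphere/01/BB
SUBDATASET_NAME = re.compile(r'^\s*SUBDATASET_(\d+)_NAME=.*?//(\S+)\s*$')

def list_subdatasets(info_text):
    """Return {subdataset number: internal path} parsed from gdalinfo output."""
    found = {}
    for line in info_text.splitlines():
        match = SUBDATASET_NAME.match(line)
        if match:
            found[int(match.group(1))] = match.group(2)
    return found

def select_subdatasets(subdatasets, pattern):
    """Return the sorted subdataset numbers whose path fully matches `pattern`."""
    regex = re.compile(pattern)
    return sorted(n for n, path in subdatasets.items() if regex.fullmatch(path))

sample = '''
  SUBDATASET_3_NAME=HDF5:"GFED4.1s_2016.hdf5"://biosphere/01/BB
  SUBDATASET_3_DESC=[720x1440] //biosphere/01/BB (32-bit floating-point)
'''
subdatasets = list_subdatasets(sample)
print(subdatasets)                                            # {3: 'biosphere/01/BB'}
print(select_subdatasets(subdatasets, r'biosphere/\d{2}/BB'))  # [3]
```

To use it on a real file, replace `sample` with the contents of the saved text file, for example `open("info2016.txt").read()`. Because the number and order of subdatasets may vary from year to year, it is worth repeating this lookup for each year's file.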

It may be helpful to do some computations on just a set of selected subdatasets, and save the result into a new GeoTIFF. For example, to calculate the annual sum of monthly Biomass burning carbon emissions (subdatasets 3, 6, 9, 12, 15, ... 33, 36), we can use the QGIS processing tool "Cell statistics":

  • Input Layers: click the '...' and "Add Files" and select gfed2016_003.tif, gfed2016_006.tif, gfed2016_009.tif, ... gfed2016_036.tif, then click "OK" (not "Run"!)
  • Statistic: Sum (unless you want some other statistic)
  • Reference Layer: select any of your input layers (doesn't matter which, since they all have the same cell size and spatial extent)
  • Output Layer: click the '...' to specify where to save the output. Call it something useful, like 'BB_sum_2016.tif'

It's probably a good idea to double-check your input layers, as one wrong selection could throw off the entire calculation. (This is where a scripted approach would be better... writing code rather than relying on a GUI.)
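As a rough idea of what a scripted version could look like, here is a sketch in Python. It assumes the GDAL Python bindings (`osgeo`) and NumPy are available, as they are in the QGIS Python console, and that the GeoTIFFs use the three-digit numbering shown above. The function names `monthly_files` and `sum_rasters` are hypothetical, and this is not the "Cell statistics" tool itself, just a plain cell-by-cell sum.

```python
def monthly_files(prefix, numbers, width=3):
    """Build GeoTIFF names such as gfed2016_003.tif from subdataset numbers."""
    return [f"{prefix}_{n:0{width}d}.tif" for n in numbers]

def sum_rasters(paths, out_path):
    """Sum band 1 of each input raster cell by cell and write a GeoTIFF."""
    # Imported here so the file-name helper works without GDAL installed.
    import numpy as np
    from osgeo import gdal

    gdal.UseExceptions()
    total = None
    for path in paths:
        ds = gdal.Open(path)  # keep the dataset open while reading its band
        data = ds.GetRasterBand(1).ReadAsArray().astype(np.float64)
        total = data if total is None else total + data
        ds = None

    # Copy the first input so the output inherits its CRS, extent and cell size.
    first = gdal.Open(paths[0])
    out = gdal.GetDriverByName("GTiff").CreateCopy(out_path, first)
    out.GetRasterBand(1).WriteArray(total)
    out.FlushCache()
    out = None  # closing the dataset finishes writing it to disk

bb_numbers = range(3, 37, 3)  # 3, 6, 9, ... 36, as in the example above
inputs = monthly_files("gfed2016", bb_numbers)
print(inputs[0], inputs[-1], len(inputs))
# sum_rasters(inputs, "BB_sum_2016.tif")  # uncomment once the input files exist
```

Printing the first and last file names and the count gives a quick check that exactly twelve monthly layers were selected. The sketch does not treat NoData values specially, so check whether that matters for the variable you are summing.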
