Skip to content

Instantly share code, notes, and snippets.

@sterlingwes
Last active July 1, 2023 14:14
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save sterlingwes/c074857ba8320c7c20673945d605e757 to your computer and use it in GitHub Desktop.
Save sterlingwes/c074857ba8320c7c20673945d605e757 to your computer and use it in GitHub Desktop.
Reading specific data from a NetCDF file

My Goal

I have a NetCDF I downloaded from firesmoke.ca (Bluesky Canadian Wildfire Smoke Forecasting System) and I wanted to retrieve the PM 2.5 value fo a given geo coordinate or nearest lat / lon.

Exploration Notes

The NetCDF file can't be opened as a textfile as it's a binary format but there's a few tools I used to inspect the data model inside:

ncdump -c myfile.nc gave me the following:

netcdf dispersion {
dimensions:
	TSTEP = UNLIMITED ; // (51 currently)
	DATE-TIME = 2 ;
	LAY = 1 ;
	VAR = 1 ;
	ROW = 381 ;
	COL = 1081 ;
variables:
	int TFLAG(TSTEP, VAR, DATE-TIME) ;
		TFLAG:units = "<YYYYDDD,HHMMSS>" ;
		TFLAG:long_name = "TFLAG           " ;
		TFLAG:var_desc = "Timestep-valid flags:  (1) YYYYDDD or (2) HHMMSS                                " ;
	float PM25(TSTEP, LAY, ROW, COL) ;
		PM25:long_name = "PM25            " ;
		PM25:units = "ug/m^3          " ;
		PM25:var_desc = "PM25                                                                            " ;

// global attributes:
		:IOAPI_VERSION = "$Id: @(#) ioapi library version 3.0 $                                           " ;
		:EXEC_ID = "????????????????                                                                " ;
		:FTYPE = 1 ;
		:CDATE = 2023182 ;
		:CTIME = 45032 ;
		:WDATE = 2023182 ;
		:WTIME = 45032 ;
		:SDATE = 2023182 ;
		:STIME = 30000 ;
		:TSTEP = 10000 ;
		:NTHIK = 1 ;
		:NCOLS = 1081 ;
		:NROWS = 381 ;
		:NLAYS = 1 ;
		:NVARS = 1 ;
		:GDTYP = 1 ;
		:P_ALP = 0. ;
		:P_BET = 0. ;
		:P_GAM = 0. ;
		:XCENT = -106. ;
		:YCENT = 51. ;
		:XORIG = -160. ;
		:YORIG = 32. ;
		:XCELL = 0.100000001490116 ;
		:YCELL = 0.100000001490116 ;
		:VGTYP = 5 ;
		:VGTOP = -9999.f ;
		:VGLVLS = 10.f, 0.f ;
		:GDNAM = "HYSPLIT CONC    " ;
		:UPNAM = "hysplit2netCDF  " ;
		:VAR-LIST = "PM25            " ;
		:FILEDESC = "Hysplit Concentration Model Output                                              lat-lon coordinate system...";

The float PM25(TSTEP, LAY, ROW, COL) ; part seemed immediately interesting to me, and its function-style notation made me think I could invoke it with the parameters.

Opening Panoply and trying to create a plot for this variable showed a box on the map roughly the area of the data domain I already knew this file covered. Switching to the Array tab of the plot gave more useful indication of the structure & format of the data in the PM25 "variable".

I noted that ncdump didn't have a way to "invoke" a variable with various arguments, but some searching led me to ncks which does:

ncks -v PM25 -d TSTEP,0,50 -d LAY,0 -d ROW,150 -d COL,500 myfile.nc

Based on the ncdump output I could see the TSTEP dimension had 51 values, and ROW had 381 and COL had 1081 so the command above prints out all of the PM2.5 values for coord 150,500 on the geo grid for all time steps on the only "layer".

Next step was figuring out how to pick the right ROW / COL for the lat / lon I'm interested in.

I noticed in Panoply when clicking the lat variable that it indicates it is a "virtual variable" with the description "synthesized coordinate from YORIG YCELL global attributes". I remember seeing YORIG in the ncdump output above.

Quick math shows that YORIG + YCELL * MAX(ROW) = MAX(LAT) or 32 + 0.10 * 381 = 70.1 (verified the max lat in the Panoply plot for the lat variable in the array tab). So I can pick the closest row or column based on this math.

While I can also probably determine the date / time for the given TSTEP using this method and the corresponding attributes (Panoply notes that the time virtual variable is computed as "synthesized time coordinate from SDATE, STIME, STEP global attributes"), it seems the TFLAG variable is easier to use for this:

ncks -v TFLAG -d TSTEP,0,50 -d LAY,0 -d ROW,150 -d COL,500 myfile.nc gives all the corresponding time steps:

netcdf dispersion {
  dimensions:
    DATE-TIME = 2 ;
    TSTEP = UNLIMITED ; // (51 currently)
    VAR = 1 ;

  variables:
    int TFLAG(TSTEP,VAR,DATE-TIME) ;
      TFLAG:units = "<YYYYDDD,HHMMSS>" ;
      TFLAG:long_name = "TFLAG           " ;
      TFLAG:var_desc = "Timestep-valid flags:  (1) YYYYDDD or (2) HHMMSS                                " ;

  data:
    TFLAG =
    2023182, 30000,
    2023182, 40000,
...

so the first value would correspond to the 182nd day in 2023 at 3 hours into the day (UTC).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment