I have a NetCDF I downloaded from firesmoke.ca (Bluesky Canadian Wildfire Smoke Forecasting System) and I wanted to retrieve the PM 2.5 value fo a given geo coordinate or nearest lat / lon.
The NetCDF file can't be opened as a textfile as it's a binary format but there's a few tools I used to inspect the data model inside:
ncdump -c myfile.nc
gave me the following:
netcdf dispersion {
dimensions:
TSTEP = UNLIMITED ; // (51 currently)
DATE-TIME = 2 ;
LAY = 1 ;
VAR = 1 ;
ROW = 381 ;
COL = 1081 ;
variables:
int TFLAG(TSTEP, VAR, DATE-TIME) ;
TFLAG:units = "<YYYYDDD,HHMMSS>" ;
TFLAG:long_name = "TFLAG " ;
TFLAG:var_desc = "Timestep-valid flags: (1) YYYYDDD or (2) HHMMSS " ;
float PM25(TSTEP, LAY, ROW, COL) ;
PM25:long_name = "PM25 " ;
PM25:units = "ug/m^3 " ;
PM25:var_desc = "PM25 " ;
// global attributes:
:IOAPI_VERSION = "$Id: @(#) ioapi library version 3.0 $ " ;
:EXEC_ID = "???????????????? " ;
:FTYPE = 1 ;
:CDATE = 2023182 ;
:CTIME = 45032 ;
:WDATE = 2023182 ;
:WTIME = 45032 ;
:SDATE = 2023182 ;
:STIME = 30000 ;
:TSTEP = 10000 ;
:NTHIK = 1 ;
:NCOLS = 1081 ;
:NROWS = 381 ;
:NLAYS = 1 ;
:NVARS = 1 ;
:GDTYP = 1 ;
:P_ALP = 0. ;
:P_BET = 0. ;
:P_GAM = 0. ;
:XCENT = -106. ;
:YCENT = 51. ;
:XORIG = -160. ;
:YORIG = 32. ;
:XCELL = 0.100000001490116 ;
:YCELL = 0.100000001490116 ;
:VGTYP = 5 ;
:VGTOP = -9999.f ;
:VGLVLS = 10.f, 0.f ;
:GDNAM = "HYSPLIT CONC " ;
:UPNAM = "hysplit2netCDF " ;
:VAR-LIST = "PM25 " ;
:FILEDESC = "Hysplit Concentration Model Output lat-lon coordinate system...";
The float PM25(TSTEP, LAY, ROW, COL) ;
part seemed immediately interesting to me, and its function-style notation made me think I could invoke it with the parameters.
Opening Panoply and trying to create a plot for this variable showed a box on the map roughly the area of the data domain I already knew this file covered. Switching to the Array
tab of the plot gave more useful indication of the structure & format of the data in the PM25 "variable".
I noted that ncdump
didn't have a way to "invoke" a variable with various arguments, but some searching led me to ncks which does:
ncks -v PM25 -d TSTEP,0,50 -d LAY,0 -d ROW,150 -d COL,500 myfile.nc
Based on the ncdump output I could see the TSTEP
dimension had 51 values, and ROW
had 381 and COL
had 1081
so the command above prints out all of the PM2.5 values for coord 150,500 on the geo grid for all time steps on the only "layer".
Next step was figuring out how to pick the right ROW / COL for the lat / lon I'm interested in.
I noticed in Panoply when clicking the lat
variable that it indicates it is a "virtual variable" with the description "synthesized coordinate from YORIG YCELL global attributes". I remember seeing YORIG
in the ncdump output above.
Quick math shows that YORIG + YCELL * MAX(ROW) = MAX(LAT)
or 32 + 0.10 * 381 = 70.1
(verified the max lat in the Panoply plot for the lat variable in the array tab). So I can pick the closest row or column based on this math.
While I can also probably determine the date / time for the given TSTEP using this method and the corresponding attributes (Panoply notes that the time
virtual variable is computed as "synthesized time coordinate from SDATE, STIME, STEP global attributes"), it seems the TFLAG
variable is easier to use for this:
ncks -v TFLAG -d TSTEP,0,50 -d LAY,0 -d ROW,150 -d COL,500 myfile.nc
gives all the corresponding time steps:
netcdf dispersion {
dimensions:
DATE-TIME = 2 ;
TSTEP = UNLIMITED ; // (51 currently)
VAR = 1 ;
variables:
int TFLAG(TSTEP,VAR,DATE-TIME) ;
TFLAG:units = "<YYYYDDD,HHMMSS>" ;
TFLAG:long_name = "TFLAG " ;
TFLAG:var_desc = "Timestep-valid flags: (1) YYYYDDD or (2) HHMMSS " ;
data:
TFLAG =
2023182, 30000,
2023182, 40000,
...
so the first value would correspond to the 182nd day in 2023 at 3 hours into the day (UTC).