{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial for creating CF and ACDD compliant NetCDF file\n",
"This tutoral will focus on creating NetCDF files compliant with the Climate and Forecast (CF) convention and the Attribute Convention for Data Discovery (ACDD). \n",
"\n",
"# 1. Introduction\n",
"[NetCDF](https://www.unidata.ucar.edu/software/netcdf/) is a very convenient and powerful file format in terms of storing data. However, having metadata describing the content of the file is crucial in order to generate a machine readable self-describing product compliant with widely used international standards. If you do this, you are contributing to follow the FAIR (Findable, Accessible, Interoperable, Reuseable) guiding principles of data management. In order to be precise when talking about metadata, we split types of metadata into two categories: discovery metadata and use metadata. \n",
"\n",
"Discovery metadata describes e.g. the who, what, where and when about the products as well as the interfaces and access points to the data. Examples of discovery metadata standards are the GCMD DIF and ISO19115. If a NetCDF file follows ACDD, the file is compliant to the above mentioned standards which thus can be extracted from the file.\n",
"\n",
"Use metadata provides a definitive description of what each variable in the dataset represents. Use metadata serves the purpose of describing the actual content of the data themselves allowing users to understand and correctly use the datasets. Examples of use metadata are units, missing values and spatio-temporal properties of the data. If a NetCDF file follows the CF convention, enough information is in place to make the file self-describing.\n",
"\n",
"## 1.1 Requirements to run the following jupyter-notebook\n",
"In this tutorial, we will use an already existing NetCDF/CF file to create a test file and stepwise make it compliant with the above mentioned standards. The data for this file is fetched by means of OPeNDAP (ie. streaming of data) and __no download prior to doing the excersize is needed__. This is one of the great features in CF compliant datasets.\n",
"\n",
"Before running this tutorial, you also need some python packages. You can create a conda environment to get this. On linux, you can download and install conda as:\n",
"\n",
"*wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh*\n",
"\n",
"*bash Anaconda3-5.3.1-Linux-x86_64.sh*\n",
"\n",
"in .bashrc export PATH=~/anaconda3/bin:$PATH\n",
"\n",
"To create a conda environment with all the necessary packages, use the following command:\n",
"\n",
"*conda create -n nc_cf_acdd python=3.7 netCDF4=1.4.0 numpy*\n",
"\n",
"In your terminal, activate the environment and run jupyter-notebook:\n",
"\n",
"*conda activate nc_cf_acdd && jupyter-notebook*"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# importing packages\n",
"import netCDF4\n",
"from netCDF4 import Dataset\n",
"import numpy as np\n",
"import datetime"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Create a NetCDF file with a structure but minimal metadata"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# OPeNDAP URL to Copernicus Sentienl-2 data in NetCDF/CF from satellittdata.no\n",
"url = \"http://nbstds.met.no/thredds/dodsC/NBS/S2A/2019/08/23/S2A_MSIL1C_20190823T125711_N0208_R038_T37XEL_20190823T131411.nc\"\n",
"\n",
"# Specifying spatial subset\n",
"x0,xn = 0,99\n",
"y0,yn = 0,99\n",
"\n",
"# Reading the data\n",
"ncin = Dataset(url, 'r')\n",
"time = ncin['time'][:]\n",
"x_coord = ncin['x'][x0:xn]\n",
"y_coord = ncin['y'][y0:yn]\n",
"b2 = ncin['B2'][0,y0:yn,x0:xn]\n",
"lat = ncin['lat'][y0:yn,x0:xn]\n",
"lon = ncin['lon'][y0:yn,x0:xn]\n",
"ncin.close()\n",
"\n",
"# Creating output\n",
"test_fname = 'test_netCDF.nc'\n",
"\n",
"with (netCDF4.Dataset(test_fname, 'w', format='NETCDF4')) as ncout:\n",
" dim_time = ncout.createDimension('T',1)\n",
" dim_x = ncout.createDimension('X',xn)\n",
" dim_y = ncout.createDimension('Y',yn)\n",
"\n",
" nctime = ncout.createVariable('time','i4',('T',))\n",
" nctime[:] = time[0]\n",
" \n",
" nclat = ncout.createVariable('lat','f4',('Y','X',))\n",
" nclon = ncout.createVariable('lon','f4',('Y','X',))\n",
" nclat[:,:]=lat\n",
" nclon[:,:]=lon\n",
"\n",
" # add variable\n",
" varout = ncout.createVariable('B2',np.int16, ('T', 'Y', 'X'))\n",
" varout[:] = b2"
]
},
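{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick, optional check, we can list the variables and attributes of the file we just wrote (running *ncdump -h test_netCDF.nc* in a terminal gives a similar overview):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the bare file: global attributes, plus each variable with its dimensions and attributes\n",
"with Dataset(test_fname, 'r') as nc:\n",
"    print('Global attributes:', nc.ncattrs())\n",
"    for name, var in nc.variables.items():\n",
"        print(name, var.dimensions, var.ncattrs())"
]
},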
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is evident that the file is not well described (you can check this by means of using software like ncdump). \n",
"\n",
"# 3. CF convention\n",
"Let's start with the [CF convention](http://cfconventions.org/). CF is designed to *promote the processing and sharing of files created with the NetCDF API*. It is very useful to read through the [documentation](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html). This can, however, become a bit cumbersome so we will try to cover the most important bits for our particular dataset which is:\n",
"\n",
"- use metadata for variables,\n",
"- coordinate systems,\n",
"- global attributes.\n",
"\n",
"We will go through these stepwise in the following sections.\n",
"\n",
"## 3.1 Use metadata for variables\n",
"Use metadata can involve a number of things like flags, units, valid range of data, and scale factors depening on you product. We will, however, restric this to a minimum according to product we are dealing with which will be units, standard_name and long_name. \n",
"\n",
"To add standard_name, we should use the [*CF standard name table*](http://cfconventions.org/Data/cf-standard-names/69/build/cf-standard-name-table.html). For some specific variables, you may not find an entry in this table. Then you could contact the CF community for advice and fill the other attributes as best as you can.\n",
"\n",
"In order to add these attributes in the various variables, we do the following:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"ncout = Dataset(test_fname, mode='r+') # r+ for append mode\n",
"\n",
"nctime = ncout.variables['time']\n",
"nctime.long_name = 'reference time of satellite image'\n",
"nctime.units = 'seconds since 1981-01-01 00:00:00'\n",
"nctime.calendar = 'gregorian'\n",
"\n",
"\n",
"nclat = ncout.variables['lat']\n",
"nclat.standard_name = 'latitude'\n",
"nclat.units = 'degrees_north'\n",
"nclat.long_name = 'latitude'\n",
"\n",
"nclon = ncout.variables['lon']\n",
"nclon.long_name = 'longitude'\n",
"nclon.units = 'degrees_east'\n",
"nclon.standard_name = 'longitude'\n",
"\n",
"b2 = ncout.variables['B2']\n",
"b2.units = \"1\"\n",
"b2.standard_name = 'toa_bidirectional_reflectance'\n",
"b2.long_name = 'Reflectance in band B2'\n",
"\n",
"# in order to explain the variable a bit more, we add the following\n",
"b2.bandwidth = '65'\n",
"b2.bandwidth_unit = 'nm'\n",
"b2.wavelength = '490'\n",
"b2.wavelength_unit = 'nm'\n",
"b2.solar_irradiance = '1959.72'\n",
"b2.solar_irradiance_unit= 'W/m2/um'\n",
"\n",
"#ncout.variables\n",
"ncout.close()"
]
},
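{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small, optional check, the *units* and *calendar* attributes we just added are enough for netCDF4's *num2date* to decode the time value into a calendar date:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Decode the CF time value into a calendar date using the units and calendar attributes\n",
"with Dataset(test_fname, 'r') as nc:\n",
"    nctime = nc.variables['time']\n",
"    print(netCDF4.num2date(nctime[:], units=nctime.units, calendar=nctime.calendar))"
]
},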
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.2 Coordinate system\n",
"The product must be georeferenced in some coordinate system. You can read more about this [here](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#coordinate-system) in the CF convention document. Our product is in a specific map projection and we have to add information about the projection in a dedicated variable. This is carried out in the code below"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"ncout = Dataset(test_fname, mode='r+') # r+ for append mode\n",
"\n",
"# Creating coordinate reference variable (attributes depends on projection)\n",
"nc_crs = ncout.createVariable('UTM_projection',np.int32)\n",
"nc_crs.latitude_of_projection_origin = 0.0\n",
"nc_crs.proj4_string = \"+proj=utm +zone=37 +datum=WGS84 +units=m +no_defs \"\n",
"nc_crs.scale_factor_at_central_meridian = 0.9996\n",
"nc_crs.longitude_of_central_meridian = 39.0\n",
"nc_crs.grid_mapping_name = 'transverse_mercator' #Have a look at the grid_mapping_name appendix in CF document\n",
"nc_crs.false_easting = 500000.0\n",
"nc_crs.false_northing = 0.0\n",
"nc_crs.epsg_code = 32637\n",
"\n",
"# Adding coordinate reference to the variables\n",
"b2 = ncout.variables['B2']\n",
"b2.coordinates = \"lat lon\"\n",
"b2.grid_mapping = 'UTM_projection'\n",
"\n",
"# Creating variables deciding the extent of the product in the coordinate reference system\n",
"ncx = ncout.createVariable('x','i4', 'X', zlib=True)\n",
"ncx.units = 'm'\n",
"ncx.standard_name= 'projection_x_coordinate'\n",
"ncx.long_name= 'Easting'\n",
"ncx[:] = x_coord\n",
"\n",
"ncy = ncout.createVariable('y','i4', 'Y', zlib=True)\n",
"ncy.units = 'm'\n",
"ncy.standard_name= 'projection_y_coordinate'\n",
"ncy.long_name= 'Northing'\n",
"ncy[:] = y_coord\n",
"\n",
"#ncout.variables\n",
"ncout.close()"
]
},
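{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional illustration (it assumes the *pyproj* package, which is not included in the environment from section 1.1), the projection information stored above is enough to convert the projected x/y coordinates back to longitude/latitude:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: convert a projected corner coordinate back to lon/lat using the stored EPSG code.\n",
"# Requires pyproj, which is not part of the conda environment created above.\n",
"from pyproj import Transformer\n",
"\n",
"with Dataset(test_fname, 'r') as nc:\n",
"    x = nc.variables['x'][:]\n",
"    y = nc.variables['y'][:]\n",
"    epsg = nc.variables['UTM_projection'].epsg_code\n",
"\n",
"transformer = Transformer.from_crs(f'EPSG:{epsg}', 'EPSG:4326', always_xy=True)\n",
"lon0, lat0 = transformer.transform(float(x[0]), float(y[0]))\n",
"print('Corner point (lon, lat):', lon0, lat0)"
]
},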
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.3 Global attributes\n",
"The CF convention requires some global attributes describing the product. You can read more about this [here](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#_attributes) in the CF convention document. In the following code, we will add attributes in a new way compared with above:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"nowstr = datetime.datetime.utcnow().isoformat()\n",
"\n",
"ncout = Dataset(test_fname, mode='r+') # r+ for append mode\n",
"\n",
"globalAttribs = {}\n",
"globalAttribs['title'] = \"Test product\"\n",
"globalAttribs['Conventions'] = \"CF-1.6\"\n",
"globalAttribs['summary'] = 'Subsetted Sentinel-2 Multi-Spectral Instrument Level-1C product.'\n",
"globalAttribs['institution'] = \"Norwegian Meteorological Institute\"\n",
"globalAttribs['history'] = nowstr + \". Created.\"\n",
"globalAttribs['source'] = \"surface observation\"\n",
"globalAttribs['references'] = \"https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types/level-1c\"\n",
"\n",
"ncout.setncatts(globalAttribs)\n",
"ncout.sync()\n",
"\n",
"ncout.close()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. ACDD\n",
"Now, the file should be CF compliant and hence be both machine readable and self-describing. However, in order to make you data discoverable (i.e. to describe the who, what, where and when for the data), the data should follow the [ACDD](http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-3). ACDD defines a number of global attributes grouped as __highly recommended__, __recommended__ and __suggested__. It also suggests some highly recommended variable attributes. We will encourage you to at least follow the __higly recommended__ global attributes, but also the below listed attributes from the other two categories: \n",
"- id, \n",
"- date_created, \n",
"- geospatial_lat_min, \n",
"- geospatial_lat_max, \n",
"- geospatial_lon_min, \n",
"- geospatial_lon_max, \n",
"- time_coverage_start, \n",
"- time_coverage_end (if applicable),\n",
"- keywords_vocabulary.\n",
"\n",
"Below, we will add all these attributes."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"ncout = Dataset(test_fname, mode='r+') # r+ for append mode\n",
"dt = float(ncout['time'][0].data)\n",
"\n",
"globalAttribs['id'] = 'preferably a UUID'\n",
"globalAttribs['date_created'] = datetime.datetime.utcnow().isoformat()\n",
"globalAttribs['geospatial_lat_min'] = ncout['lat'][:].min()\n",
"globalAttribs['geospatial_lat_max'] = ncout['lat'][:].max()\n",
"globalAttribs['geospatial_lon_min'] = ncout['lon'][:].min()\n",
"globalAttribs['geospatial_lon_max'] = ncout['lon'][:].max()\n",
"globalAttribs['time_coverage_start'] = (datetime.datetime(1981,1,1, 0,0,0) + datetime.timedelta(0, dt)).isoformat()\n",
"globalAttribs['Conventions'] = \"CF-1.6, ACDD-1.3\"\n",
"globalAttribs['keywords'] = ['Earth Science > Atmosphere > Atmospheric radiation > Reflectance']\n",
"globalAttribs['keywords_vocabulary'] = \"GCMD Science Keywords\"\n",
"\n",
"globalAttribs['license'] = \"Freely Distributed\"\n",
"globalAttribs['standard_name_vocabulary'] = 'CF Standard Name Table v69'\n",
"\n",
"ncout.setncatts(globalAttribs)\n",
"ncout.sync()\n",
"\n",
"ncout.close()"
]
},
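{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have the IOOS [compliance-checker](https://github.com/ioos/compliance-checker) installed (an assumption; it is not part of the environment from section 1.1), you can also run a quick local check, for example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional local check with the IOOS compliance-checker (install e.g. from conda-forge).\n",
"# The checker names below follow the compliance-checker documentation.\n",
"!compliance-checker --test=cf:1.6 --test=acdd:1.3 test_netCDF.nc"
]
},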
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5. Closing remarks\n",
"You can now check your dataset in online compliance checkers like [this](https://pumatest.nerc.ac.uk/cgi-bin/cf-checker.pl) or [this](https://applicate.met.no/dataset_validation/form). Then, you can start looking at your own data :) Thank you for your valuable contribution in making your data FAIR!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}