{
"metadata": {
"celltoolbar": "Slideshow",
"name": "",
"signature": "sha256:4ced4abfca9bc693e4f73811b47cc04d94e0f737b84412e31c5bc850513988ea"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Writing netCDF data\n",
"\n",
"**Important Note**: when running this notebook interactively in a browser, you probably will not be able to execute individual cells out of order without getting an error. Instead, choose \"Run All\" from the Cell menu after you modify a cell."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import netCDF4 # Note: python is case-sensitive!\n",
"import numpy as np"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Opening a file, creating a new Dataset\n",
"\n",
"Let's create a new, empty netCDF file named 'data/new.nc', opened for writing.\n",
"\n",
"Be careful, opening a file with 'w' will clobber any existing data (unless `clobber=False` is used, in which case an exception is raised if the file already exists).\n",
"\n",
"- `mode='r'` is the default.\n",
"- `mode='a'` opens an existing file and allows for appending (does not clobber existing data)\n",
"- `format` can be one of `NETCDF3_CLASSIC`, `NETCDF3_64BIT`, `NETCDF4_CLASSIC` or `NETCDF4` (default). `NETCDF4_CLASSIC` uses HDF5 for the underlying storage layer (as does `NETCDF4`) but enforces the classic netCDF 3 data model so data can be read with older clients. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"try: ncfile.close() # just to be safe, make sure dataset is not already open.\n",
"except: pass\n",
"ncfile = netCDF4.Dataset('data/new.nc',mode='w',format='NETCDF4') \n",
"print ncfile"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Dataset'>\n",
"root group (NETCDF4 data model, file format HDF5):\n",
" dimensions(sizes): \n",
" variables(dimensions): \n",
" groups: \n",
"\n"
]
}
],
"prompt_number": 2
},
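{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"As a quick aside, here is a minimal sketch of the `clobber` and `mode='a'` behavior described above. It uses a throwaway file name (`data/scratch.nc`) that is not used anywhere else in this notebook."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Sketch of the clobber/append behavior, using a throwaway file (data/scratch.nc).\n",
"nc_tmp = netCDF4.Dataset('data/scratch.nc', mode='w')   # creates (or clobbers) the file\n",
"nc_tmp.close()\n",
"try:\n",
"    # clobber=False refuses to overwrite an existing file and raises an error instead\n",
"    nc_tmp = netCDF4.Dataset('data/scratch.nc', mode='w', clobber=False)\n",
"except Exception as e:\n",
"    print 'clobber=False prevented overwriting an existing file:', e\n",
"# mode='a' reopens the existing file for appending (no clobbering)\n",
"nc_tmp = netCDF4.Dataset('data/scratch.nc', mode='a')\n",
"nc_tmp.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},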
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Creating dimensions\n",
"\n",
"The **ncfile** object we created is a container for _dimensions_, _variables_, and _attributes_. First, let's create some dimensions using the [`createDimension`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createDimension) method. \n",
"\n",
"- Every dimension has a name and a length. \n",
"- The name is a string that is used to specify the dimension to be used when creating a variable, and as a key to access the dimension object in the `ncfile.dimensions` dictionary.\n",
"\n",
"Setting the dimension length to `0` or `None` makes it unlimited, so it can grow. \n",
"\n",
"- For `NETCDF4` files, any variable's dimension can be unlimited. \n",
"- For `NETCDF4_CLASSIC` and `NETCDF3*` files, only one per variable can be unlimited, and it must be the leftmost (fastest varying) dimension."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"lat_dim = ncfile.createDimension('lat', 73) # latitude axis\n",
"lon_dim = ncfile.createDimension('lon', 144) # longitude axis\n",
"time_dim = ncfile.createDimension('time', None) # unlimited axis (can be appended to).\n",
"print lat_dim\n",
"print lon_dim\n",
"print time_dim\n",
"print ncfile.dimensions"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Dimension'>: name = 'lat', size = 73\n",
"\n",
"<type 'netCDF4.Dimension'>: name = 'lon', size = 144\n",
"\n",
"<type 'netCDF4.Dimension'> (unlimited): name = 'time', size = 0\n",
"\n",
"OrderedDict([('lat', <type 'netCDF4.Dimension'>: name = 'lat', size = 73\n",
"), ('lon', <type 'netCDF4.Dimension'>: name = 'lon', size = 144\n",
"), ('time', <type 'netCDF4.Dimension'> (unlimited): name = 'time', size = 0\n",
")])\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Creating attributes\n",
"\n",
"netCDF attributes can be created just like you would for any python object. \n",
"- Best to adhere to established conventions (like the [CF](http://cfconventions.org/) conventions)\n",
"- We won't try to adhere to any specific convention here though."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ncfile.title='My model data'\n",
"print ncfile.title"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"My model data\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Try adding some more attributes..."
]
},
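{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For example, here is a minimal sketch adding two more global attributes (the names and values are arbitrary examples, not from any convention):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# The attribute names and values below are arbitrary examples.\n",
"ncfile.subtitle = 'My model data subtitle'\n",
"ncfile.anything = 'write anything you want here'\n",
"print ncfile.subtitle\n",
"print ncfile.anything"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},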
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Creating variables\n",
"\n",
"Now let's add some variables and store some data in them. \n",
"\n",
"- A variable has a name, a type, a shape, and some data values. \n",
"- The shape of a variable is specified by a tuple of dimension names. \n",
"- A variable should also have some named attributes, such as 'units', that describe the data.\n",
"\n",
"The [`createVariable`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createVariable) method takes 3 mandatory args.\n",
"\n",
"- the 1st argument is the variable name (a string). This is used as the key to access the variable object from the `variables` dictionary.\n",
"- the 2nd argument is the datatype (most numpy datatypes supported). \n",
"- the third argument is a tuple containing the dimension names (the dimensions must be created first). Unless this is a `NETCDF4` file, any unlimited dimension must be the leftmost one.\n",
"- there are lots of optional arguments (many of which are only relevant when `format='NETCDF'`) to control compression, chunking, fill_value, etc.\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Define two variables with the same names as dimensions,\n",
"# a conventional way to define \"coordinate variables\".\n",
"lat = ncfile.createVariable('lat', np.float32, ('lat',))\n",
"lat.units = 'degrees_north'\n",
"lat.long_name = 'latitude'\n",
"lon = ncfile.createVariable('lon', np.float32, ('lon',))\n",
"lon.units = 'degrees_east'\n",
"lon.long_name = 'longitude'\n",
"time = ncfile.createVariable('time', np.float64, ('time',))\n",
"time.units = 'hours since 1800-01-01'\n",
"time.long_name = 'time'\n",
"# Define a 3D variable to hold the data\n",
"temp = ncfile.createVariable('temp',np.float64,('time','lat','lon')) # note: unlimited dimension is leftmost\n",
"temp.units = 'K' # degrees Kelvin\n",
"temp.standard_name = 'air_temperature' # this is a CF standard name\n",
"print temp"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Variable'>\n",
"float64 temp(time, lat, lon)\n",
" units: K\n",
" standard_name: air_temperature\n",
"unlimited dimensions: time\n",
"current shape = (0, 73, 144)\n",
"filling on, default _FillValue of 9.96920996839e+36 used\n",
"\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Pre-defined variable attributes (read only)\n",
"\n",
"The netCDF4 module provides some useful pre-defined Python attributes for netCDF variables, such as dimensions, shape, dtype, ndim. \n",
"\n",
"Note: since no data has been written yet, the length of the 'time' dimension is 0."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print \"-- Some pre-defined attributes for variable temp:\"\n",
"print \"temp.dimensions:\", temp.dimensions\n",
"print \"temp.shape:\", temp.shape\n",
"print \"temp.dtype:\", temp.dtype\n",
"print \"temp.ndim:\", temp.ndim"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"-- Some pre-defined attributes for variable temp:\n",
"temp.dimensions: (u'time', u'lat', u'lon')\n",
"temp.shape: (0, 73, 144)\n",
"temp.dtype: float64\n",
"temp.ndim: 3\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Writing data\n",
"\n",
"To write data a netCDF variable object, just treat it like a numpy array and assign values to a slice."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nlats = len(lat_dim); nlons = len(lon_dim); ntimes = 3\n",
"# Write latitudes, longitudes.\n",
"# Note: the \":\" is necessary in these \"write\" statements\n",
"lat[:] = -90. + (180./nlats)*np.arange(nlats) # south pole to north pole\n",
"lon[:] = (180./nlats)*np.arange(nlons) # Greenwich meridian eastward\n",
"# create a 3D array of random numbers\n",
"data_arr = np.random.uniform(low=280,high=330,size=(ntimes,nlats,nlons))\n",
"# Write the data. This writes the whole 3D netCDF variable all at once.\n",
"temp[:,:,:] = data_arr # Appends data along unlimited dimension\n",
"print \"-- Wrote data, temp.shape is now \", temp.shape\n",
"# read data back from variable (by slicing it), print min and max\n",
"print \"-- Min/Max values:\", temp[:,:,:].min(), temp[:,:,:].max()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"-- Wrote data, temp.shape is now (3, 73, 144)\n",
"-- Min/Max values: 280.001651325 329.999261968\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- You can just treat a netCDF Variable object like a numpy array and assign values to it.\n",
"- Variables automatically grow along unlimited dimensions (unlike numpy arrays)\n",
"- The above writes the whole 3D variable all at once, but you can write it a slice at a time instead.\n",
"\n",
"Let's add another time slice....\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# create a 2D array of random numbers\n",
"data_slice = np.random.uniform(low=280,high=330,size=(nlats,nlons))\n",
"temp[3,:,:] = data_slice # Appends the 4th time slice\n",
"print \"-- Wrote more data, temp.shape is now \", temp.shape"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"-- Wrote more data, temp.shape is now (4, 73, 144)\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Note that we have not yet written any data to the time variable. It automatically grew as we appended data along the time dimension to the variable `temp`, but the data is missing."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print time\n",
"print time[:] # dashes indicate masked values (where data has not yet been written)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Variable'>\n",
"float64 time(time)\n",
" units: hours since 1800-01-01\n",
" long_name: time\n",
"unlimited dimensions: time\n",
"current shape = (4,)\n",
"filling on, default _FillValue of 9.96920996839e+36 used\n",
"\n",
"[-- -- -- --]\n"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Let's add write some data into the time variable. \n",
"\n",
"- Given a set of datetime instances, use date2num to convert to numeric time values and then write that data to the variable."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from datetime import datetime\n",
"from netCDF4 import date2num,num2date\n",
"# 1st 4 days of October.\n",
"dates = [datetime(2014,10,1,0),datetime(2014,10,2,0),datetime(2014,10,3,0),datetime(2014,10,4,0)]\n",
"print dates\n",
"times = date2num(dates, time.units)\n",
"print times, time.units # numeric values\n",
"time[:] = times\n",
"# read time data back, convert to datetime instances, check values.\n",
"print num2date(time[:],time.units)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[datetime.datetime(2014, 10, 1, 0, 0), datetime.datetime(2014, 10, 2, 0, 0), datetime.datetime(2014, 10, 3, 0, 0), datetime.datetime(2014, 10, 4, 0, 0)]\n",
"[ 1882440. 1882464. 1882488. 1882512.] hours since 1800-01-01\n",
"[datetime.datetime(2014, 10, 1, 0, 0) datetime.datetime(2014, 10, 2, 0, 0)\n",
" datetime.datetime(2014, 10, 3, 0, 0) datetime.datetime(2014, 10, 4, 0, 0)]\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Closing a netCDF file\n",
"\n",
"It's **important** to close a netCDF file you opened for writing:\n",
"\n",
"- flushes buffers to make sure all data gets written\n",
"- releases memory resources used by open netCDF files\n",
"- lets you start over if you get an error, by recreating file from scratch"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"# first print the Dataset object to see what we've got\n",
"print ncfile\n",
"# close the Dataset.\n",
"ncfile.close(); print 'Dataset is closed!'"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Dataset'>\n",
"root group (NETCDF4 data model, file format HDF5):\n",
" title: My model data\n",
" dimensions(sizes): lat(73), lon(144), time(4)\n",
" variables(dimensions): float32 \u001b[4mlat\u001b[0m(lat), float32 \u001b[4mlon\u001b[0m(lon), float64 \u001b[4mtime\u001b[0m(time), float64 \u001b[4mtemp\u001b[0m(time,lat,lon)\n",
" groups: \n",
"\n",
"Dataset is closed!\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"##Exercise\n",
"\n",
"Read SREF 24-h forecast precip probability (exercise from **reading_netCDF** notebook) write to a file (with compression).\n",
"\n",
"- create a new Dataset.\n",
"- first create dimensions.\n",
"- create and fill coordinate variables to go with dimensions.\n",
"- create precip probability variable, write data to it.\n",
"- add attributes\n",
"- close the Dataset."
]
},
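{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Here is a rough sketch of one possible solution, with random numbers standing in for the SREF precipitation probabilities. The file name, grid size, coordinate values and attribute values below are placeholders, not the real SREF grid."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Placeholder sketch of the exercise: the grid and data below are made up, not the real SREF fields.\n",
"nc_ex = netCDF4.Dataset('data/sref_prcp_prob.nc', 'w', format='NETCDF4')\n",
"nc_ex.createDimension('y', 100)    # placeholder grid dimensions\n",
"nc_ex.createDimension('x', 150)\n",
"yvar = nc_ex.createVariable('y', np.float32, ('y',))\n",
"xvar = nc_ex.createVariable('x', np.float32, ('x',))\n",
"yvar[:] = np.arange(100, dtype=np.float32)   # placeholder coordinate values\n",
"xvar[:] = np.arange(150, dtype=np.float32)\n",
"# compressed variable to hold the probabilities\n",
"prob = nc_ex.createVariable('prcp_prob', np.float32, ('y','x'), zlib=True)\n",
"prob.units = 'percent'\n",
"prob.long_name = '24-h forecast precipitation probability (placeholder data)'\n",
"prob[:] = np.random.uniform(0., 100., size=(100,150))   # stand-in for the real probabilities\n",
"nc_ex.title = 'SREF exercise sketch'\n",
"nc_ex.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},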
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Advanced features\n",
"\n",
"So far we've only exercised features associated with the old netCDF version 3 data model. netCDF version 4 adds a lot of new functionality that comes with the more flexible HDF5 storage layer. \n",
"\n",
"Let's create a new file with `format='NETCDF4'` so we can try out some of these features."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ncfile = netCDF4.Dataset('data/new2.nc','w',format='NETCDF4')\n",
"print ncfile"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Dataset'>\n",
"root group (NETCDF4 data model, file format HDF5):\n",
" dimensions(sizes): \n",
" variables(dimensions): \n",
" groups: \n",
"\n"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Creating Groups\n",
"\n",
"netCDF version 4 added support for organizing data in hierarchical groups.\n",
"- analagous to directories in a filesystem. \n",
"- Groups serve as containers for variables, dimensions and attributes, as well as other groups. \n",
"- A `netCDF4.Dataset` defines creates a special group, called the 'root group', which is similar to the root directory in a unix filesystem. \n",
"\n",
"- groups are created using the [`createGroup`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createGroup) method.\n",
"- takes a single argument (a string, which is the name of the Group instance). This string is used as a key to access the group instances in the `groups` dictionary.\n",
"\n",
"Here we create two groups to hold data for two different model runs."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"grp1 = ncfile.createGroup('model_run1')\n",
"grp2 = ncfile.createGroup('model_run2')\n",
"print ncfile\n",
"print grp1\n",
"print grp2"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Dataset'>\n",
"root group (NETCDF4 data model, file format HDF5):\n",
" dimensions(sizes): \n",
" variables(dimensions): \n",
" groups: model_run1, model_run2\n",
"\n",
"<type 'netCDF4.Group'>\n",
"group /model_run1:\n",
" dimensions(sizes): \n",
" variables(dimensions): \n",
" groups: \n",
"\n",
"<type 'netCDF4.Group'>\n",
"group /model_run2:\n",
" dimensions(sizes): \n",
" variables(dimensions): \n",
" groups: \n",
"\n"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Create some dimensions in the root group."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"lat_dim = ncfile.createDimension('lat', 73) # latitude axis\n",
"lon_dim = ncfile.createDimension('lon', 144) # longitude axis\n",
"time_dim = ncfile.createDimension('time', None) # unlimited axis (can be appended to)."
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Now create a variable in grp1 and grp2. The library will search recursively upwards in the group tree to find the dimensions (which in this case are defined one level up).\n",
"\n",
"- These variables are create with **zlib compression**, another nifty feature of netCDF 4. \n",
"- The data are automatically compressed when data is written to the file, and uncompressed when the data is read. \n",
"- This can really save disk space, especially when used in conjunction with the [**least_significant_digit**](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createVariable) keyword argument, which causes the data to be quantized (truncated) before compression. This makes the compression lossy, but more efficient."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"temp1 = grp1.createVariable('temp',np.float64,('time','lat','lon'),zlib=True)\n",
"temp2 = grp2.createVariable('temp',np.float64,('time','lat','lon'),zlib=True)\n",
"print ncfile\n",
"print grp1\n",
"print grp2"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Dataset'>\n",
"root group (NETCDF4 data model, file format HDF5):\n",
" dimensions(sizes): lat(73), lon(144), time(0)\n",
" variables(dimensions): \n",
" groups: model_run1, model_run2\n",
"\n",
"<type 'netCDF4.Group'>\n",
"group /model_run1:\n",
" dimensions(sizes): \n",
" variables(dimensions): float64 \u001b[4mtemp\u001b[0m(time,lat,lon)\n",
" groups: \n",
"\n",
"<type 'netCDF4.Group'>\n",
"group /model_run2:\n",
" dimensions(sizes): \n",
" variables(dimensions): float64 \u001b[4mtemp\u001b[0m(time,lat,lon)\n",
" groups: \n",
"\n"
]
}
],
"prompt_number": 15
},
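{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Here is a minimal sketch of the `least_significant_digit` option mentioned above. It is done in a throwaway file (`data/scratch_lsd.nc`) so the contents of `data/new2.nc` shown later stay as described."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Sketch of lossy compression via least_significant_digit, in a throwaway file.\n",
"# Values are quantized to about 3 decimal digits on write, which usually compresses better.\n",
"nc_tmp = netCDF4.Dataset('data/scratch_lsd.nc', 'w', format='NETCDF4')\n",
"nc_tmp.createDimension('x', 1000)\n",
"var_q = nc_tmp.createVariable('var_q', np.float64, ('x',), zlib=True, least_significant_digit=3)\n",
"var_q[:] = np.random.uniform(size=1000)\n",
"print var_q\n",
"nc_tmp.close()"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": []
},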
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"##Creating a variable with a compound data type\n",
"\n",
"- Compound data types map directly to numpy structured (a.k.a 'record' arrays). \n",
"- Structured arrays are akin to C structs, or derived types in Fortran. \n",
"- They allow for the construction of table-like structures composed of combinations of other data types, including other compound types. \n",
"- Might be useful for representing multiple parameter values at each point on a grid, or at each time and space location for scattered (point) data. \n",
"\n",
"Here we create a variable with a compound data type to represent complex data (there is no native complex data type in netCDF). \n",
"\n",
"- The compound data type is created with the [`createCompoundType`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createCompoundType) method."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# create complex128 numpy structured data type\n",
"complex128 = np.dtype([('real',np.float64),('imag',np.float64)])\n",
"# using this numpy dtype, create a netCDF compound data type object\n",
"# the string name can be used as a key to access the datatype from the cmptypes dictionary.\n",
"complex128_t = ncfile.createCompoundType(complex128,'complex128')\n",
"# create a variable with this data type, write some data to it.\n",
"cmplxvar = grp1.createVariable('cmplx_var',complex128_t,('time','lat','lon'))\n",
"# write some data to this variable\n",
"# first create some complex random data\n",
"nlats = len(lat_dim); nlons = len(lon_dim)\n",
"data_arr_cmplx = np.random.uniform(size=(nlats,nlons))+1.j*np.random.uniform(size=(nlats,nlons))\n",
"# write this complex data to a numpy complex128 structured array\n",
"data_arr = np.empty((nlats,nlons),complex128)\n",
"data_arr['real'] = data_arr_cmplx.real; data_arr['imag'] = data_arr_cmplx.imag\n",
"cmplxvar[0] = data_arr # write the data to the variable (appending to time dimension)\n",
"print cmplxvar\n",
"data_out = cmplxvar[0] # read data back from variable\n",
"print data_out.dtype, data_out.shape, data_out[0,0]"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Variable'>\n",
"compound cmplx_var(time, lat, lon)\n",
"compound data type: [('real', '<f8'), ('imag', '<f8')]\n",
"path = /model_run1\n",
"unlimited dimensions: time\n",
"current shape = (1, 73, 144)\n",
"\n",
"[('real', '<f8'), ('imag', '<f8')] (73, 144) (0.11203931478357165, 0.8210570135673454)\n"
]
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"##Creating a variable with a variable-length (vlen) data type\n",
"\n",
"netCDF 4 has support for variable-length or \"ragged\" arrays. These are arrays of variable length sequences having the same type. \n",
"\n",
"- To create a variable-length data type, use the [`createVLType`](http://unidata.github.io/netcdf4-python/netCDF4.Dataset-class.html#createVLType) method.\n",
"- The numpy datatype of the variable-length sequences and the name of the new datatype must be specified. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vlen_t = ncfile.createVLType(np.int64, 'phony_vlen')"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"A new variable can then be created using this datatype."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vlvar = grp2.createVariable('phony_vlen_var', vlen_t, ('time','lat','lon'))"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Since there is no native vlen datatype in numpy, vlen arrays are represented in python as object arrays (arrays of dtype `object`). \n",
"\n",
"- These are arrays whose elements are Python object pointers, and can contain any type of python object. \n",
"- For this application, they must contain 1-D numpy arrays all of the same type but of varying length. \n",
"- Fill with 1-D random numpy int64 arrays of random length between 1 and 10."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"vlen_data = np.empty((nlats,nlons),object)\n",
"for i in range(nlons):\n",
" for j in range(nlats):\n",
" size = np.random.randint(1,10,size=1)\n",
" vlen_data[j,i] = np.random.randint(0,10,size=size)\n",
"vlvar[0] = vlen_data # append along unlimited dimension (time)\n",
"print vlvar\n",
"print 'data =\\n',vlvar[:]"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'netCDF4.Variable'>\n",
"vlen phony_vlen_var(time, lat, lon)\n",
"vlen data type: int64\n",
"path = /model_run2\n",
"unlimited dimensions: time\n",
"current shape = (1, 73, 144)\n",
"\n",
"data =\n",
"[[[array([0, 6, 6, 0, 6, 5, 5, 8]) array([2, 0, 2]) array([5, 1, 4, 1])\n",
" ..., array([6, 9, 8, 7, 4, 0, 8, 9]) array([3, 7, 5, 9, 3, 4, 2, 2, 7])\n",
" array([4, 1, 1, 0, 4, 3, 5, 5, 8])]\n",
" [array([7, 4, 8, 0, 6, 2, 3]) array([9, 4, 0, 3, 5, 2, 0, 0, 2])\n",
" array([6, 0]) ..., array([9, 8, 5, 8])\n",
" array([0, 5, 6, 1, 1, 5, 1, 8, 3]) array([2, 6, 1, 6, 1, 5, 5, 1])]\n",
" [array([4, 6]) array([5, 7, 5, 4, 4]) array([5, 4, 2, 6, 6, 1, 6, 7, 3])\n",
" ..., array([1]) array([4, 0, 3, 9, 3, 6])\n",
" array([4, 3, 9, 0, 6, 3, 4, 2])]\n",
" ..., \n",
" [array([1, 4, 4, 0, 2, 0, 5, 9, 1]) array([2, 9, 7, 2, 2])\n",
" array([0, 9, 3, 6, 0, 2, 1, 6]) ..., array([0, 9, 4, 5, 8, 0, 1, 7, 7])\n",
" array([8, 0, 1]) array([6, 5])]\n",
" [array([0, 8, 0, 0, 0, 0]) array([9, 8, 7, 0]) array([8, 9, 7, 4, 7])\n",
" ..., array([8, 5, 8, 0, 4]) array([7, 1, 3, 5, 7, 8, 4, 4])\n",
" array([4, 7, 4, 2])]\n",
" [array([1, 5, 8, 1, 5, 6, 9]) array([5, 7])\n",
" array([7, 7, 1, 6, 1, 1, 8, 2]) ..., array([6, 4, 5, 0, 0, 9, 9, 5])\n",
" array([8, 5]) array([6, 6, 5])]]]\n"
]
}
],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Close the Dataset and examine the contents with ncdump."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ncfile.close()\n",
"!ncdump -h data/new2.nc"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"netcdf new2 {\r\n",
"types:\r\n",
" compound complex128 {\r\n",
" double real ;\r\n",
" double imag ;\r\n",
" }; // complex128\r\n",
" int64(*) phony_vlen ;\r\n",
"dimensions:\r\n",
"\tlat = 73 ;\r\n",
"\tlon = 144 ;\r\n",
"\ttime = UNLIMITED ; // (1 currently)\r\n",
"\r\n",
"group: model_run1 {\r\n",
" variables:\r\n",
" \tdouble temp(time, lat, lon) ;\r\n",
" \tcomplex128 cmplx_var(time, lat, lon) ;\r\n",
" } // group model_run1\r\n",
"\r\n",
"group: model_run2 {\r\n",
" variables:\r\n",
" \tdouble temp(time, lat, lon) ;\r\n",
" \tphony_vlen phony_vlen_var(time, lat, lon) ;\r\n",
" } // group model_run2\r\n",
"}\r\n"
]
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"##Other interesting and useful projects using netcdf4-python\n",
"\n",
"- [Xray](http://xray.readthedocs.org/en/stable/): N-dimensional variant of the core [pandas](http://pandas.pydata.org) data structure that can operate on netcdf variables.\n",
"- [Iris](http://scitools.org.uk/iris/): a data model to create a data abstraction layer which isolates analysis and visualisation code from data format specifics. Uses netcdf4-python to access netcdf data (can also handle GRIB).\n",
"- [Biggus](https://github.com/SciTools/biggus): Virtual large arrays (from netcdf variables) with lazy evaluation.\n",
"- [cf-python](http://cfpython.bitbucket.org/): Implements the [CF](http://cfconventions.org) data model for the reading, writing and processing of data and metadata. "
]
}
],
"metadata": {}
}
]
}