Python notebook example: how to convert a NetCDF file to a CSV file
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Data conversion (from NetCDF to CSV)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this chapter, we'll see **how to convert** several years of Ocean Sea Surface Temperature data points from **netCDF format (.nc)** to **tabular format (.csv)**.\n",
"\n",
"Some users are also interested in converting temperatures from Kelvin to degrees Celsius, so an example is provided at the end.\n",
"\n",
"#### Tip\n",
"<hr>\n",
"- To get the **netCDF files** on your local hard drive, follow the [previous chapter](http://localhost:8888/notebooks/sf_Temp/copernicus-marine-motu-fetcher.ipynb#Install-Copernicus-Marine-Environment).\n",
"\n",
"\n",
"- If you aren’t familiar with **this data format**, the [netCDF FAQ](http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF) is a good place to start. \n",
"\n",
"\n",
"- A HowTo on **manipulating** netCDF file content with **Python** is available [here]()."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Summary\n",
"\n",
"- #### working directories setup\n",
"- #### xarray install (via pip)\n",
"- #### xarray usage to convert netcdf to csv\n",
" - #### One file conversion (specifying a filename)\n",
" - #### Several files conversion (specifying a directory)\n",
"- #### pandas usage to clean/minimize csv (without continent values)\n",
"- #### manipulate [pandas](https://github.com/pydata/pandas) DataFrames within notebook - [Qgrid](http://qgrid.readthedocs.io/en/widget-guidelines/#)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## working directories setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"csv netcdf raw-data\r\n"
]
}
],
"source": [
"# List the content of the directory containing Copernicus Marine data (.nc, .png, .geotiff, .xml, .pdf, etc.)\n",
"!ls /home/mercator/copernicus-marine-data/"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"# Replace 'local_storage_directory', 'netcdf_dir' and 'csv_dir' with, respectively,\n",
"# the directory path to Copernicus Marine data, the directory path to netcdf files\n",
"# and the directory path to csv files\n",
"local_storage_directory = '/home/mercator/copernicus-marine-data/'\n",
"netcdf_dir = local_storage_directory + 'netcdf/'\n",
"csv_dir = local_storage_directory + 'csv/'"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## xarray install (via pip)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"\n",
"#### __Important__\n",
"<hr>\n",
"\n",
"The **bash command (pip)** should be executed **only once** (to download the Python module onto the system), **not every time you open this notebook**.\n",
"\n",
"<br>\n",
"If you **already have** this module, you can **go to** the **next slide** (and simply import it)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"%%bash\n",
"pip install xarray"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## xarray usage to convert netcdf to csv"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Importing modules needs to be done only once per kernel session."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import xarray as xr\n",
"import os"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"- ### One file conversion (specifying a filename)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We set some variables to hold the **input netcdf_file_name** and its **path**, and the **output csv_file_name** and **path**."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# Replace the quoted file name with the name of the netcdf file (.nc) you want to convert to csv\n",
"netcdf_file_name = 'CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc'"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"# Set variable names for the input file.nc (`netcdf_file_in`) and the output file.csv (`csv_file_out`)\n",
"netcdf_file_in = netcdf_dir + netcdf_file_name\n",
"csv_file_out = csv_dir + netcdf_file_name[:-3] + '.csv'"
]
},
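{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Note: stripping the extension with `[:-3]` works because every input name ends in `.nc`. A more robust sketch (our suggestion, using only the standard library's `os.path.splitext()`, not part of the original workflow) would be:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Alternative to the [:-3] slice: os.path.splitext() removes the\n",
"# extension whatever its length, so this also works for e.g. '.nc4'\n",
"import os\n",
"base_name = os.path.splitext(netcdf_file_name)[0]\n",
"csv_file_out = csv_dir + base_name + '.csv'"
]
},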
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We **open** the netcdf file (with the `open_dataset()` method), **convert** it to a dataframe (`to_dataframe()`) and **write** that object to a **csv file** (`to_csv()`)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"ds = xr.open_dataset(netcdf_file_in)\n",
"df = ds.to_dataframe()\n",
"\n",
"df.to_csv(csv_file_out)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"lat,lon,time,analysed_sst\r\n",
"39.6749992371,15.6750001907,2000-01-01 12:00:00,288.789993547\r\n",
"39.6749992371,15.6750001907,2000-01-02 12:00:00,288.759993548\r\n",
"39.6749992371,15.6750001907,2000-01-03 12:00:00,288.569993552\r\n",
"39.6749992371,15.6750001907,2000-01-04 12:00:00,288.589993551\r\n",
"39.6749992371,15.6750001907,2000-01-05 12:00:00,288.529993553\r\n",
"39.6749992371,15.6750001907,2000-01-06 12:00:00,288.549993552\r\n",
"39.6749992371,15.6750001907,2000-01-07 12:00:00,288.329993557\r\n",
"39.6749992371,15.6750001907,2000-01-08 12:00:00,288.299993558\r\n",
"39.6749992371,15.6750001907,2000-01-09 12:00:00,288.229993559\r\n"
]
}
],
"source": [
"!head $csv_file_out"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"\n",
"\n",
"### One File vs Directory conversion\n",
"<hr>\n",
"\n",
"It's great for one netcdf file, but what if we want to convert a **LIST OF FILES** located in a specific directory (e.g. raw-data)?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"- ### Several files conversion (specifying a directory)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc\r\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2001.nc\r\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2002.nc\r\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2003.nc\r\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2004.nc\r\n"
]
}
],
"source": [
"ls $local_storage_directory/raw-data/"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 12M\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:36 CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2001.nc\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2002.nc\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2003.nc\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2004.nc\r\n"
]
}
],
"source": [
"# Set a new directory path where the list of netcdf files (.nc) is stored.\n",
"# The path should end with a slash (/)\n",
"files_to_convert = local_storage_directory + 'raw-data/'\n",
"!ls -lh $files_to_convert"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc has been processed to .csv\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2001.nc has been processed to .csv\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2002.nc has been processed to .csv\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2003.nc has been processed to .csv\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2004.nc has been processed to .csv\n"
]
}
],
"source": [
"# Set a loop to create a .csv file for each .nc file listed in `files_to_convert`\n",
"for filename in os.listdir(files_to_convert):\n",
" ds = xr.open_dataset(files_to_convert + filename)\n",
" df = ds.to_dataframe()\n",
" df.to_csv(csv_dir + filename[:-3] + '.csv')\n",
" print filename + ' has been processed to .csv'"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 317M\r\n",
"drwxrwxr-x 2 mercator mercator 6 Dec 2 23:03 monthly\r\n",
"drwxrwxr-x 2 mercator mercator 4.0K Dec 3 02:46 yearly\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2000.csv\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2001.csv\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2002.csv\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2003.csv\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2004.csv\r\n"
]
}
],
"source": [
"!ls -lrth $csv_dir"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" analysed_sst\n",
"lat lon time \n",
"15.025 38.974998 2004-01-01 12:00:00 NaN\n",
" 2004-01-02 12:00:00 NaN\n",
" 2004-01-03 12:00:00 NaN\n",
" 2004-01-04 12:00:00 NaN\n",
" 2004-01-05 12:00:00 NaN\n",
" 2004-01-06 12:00:00 NaN\n",
" 2004-01-07 12:00:00 NaN\n",
" 2004-01-08 12:00:00 NaN\n",
" 2004-01-09 12:00:00 NaN\n",
" 2004-01-10 12:00:00 NaN\n",
" 2004-01-11 12:00:00 NaN\n",
" 2004-01-12 12:00:00 NaN\n",
" 2004-01-13 12:00:00 NaN\n",
" 2004-01-14 12:00:00 NaN\n",
" 2004-01-15 12:00:00 NaN\n",
" 2004-01-16 12:00:00 NaN\n",
" 2004-01-17 12:00:00 NaN\n",
" 2004-01-18 12:00:00 NaN\n",
" 2004-01-19 12:00:00 NaN\n",
" 2004-01-20 12:00:00 NaN\n",
" 2004-01-21 12:00:00 NaN\n",
" 2004-01-22 12:00:00 NaN\n",
" 2004-01-23 12:00:00 NaN\n",
" 2004-01-24 12:00:00 NaN\n",
" 2004-01-25 12:00:00 NaN\n",
" 2004-01-26 12:00:00 NaN\n",
" 2004-01-27 12:00:00 NaN\n",
" 2004-01-28 12:00:00 NaN\n",
" 2004-01-29 12:00:00 NaN\n",
" 2004-01-30 12:00:00 NaN\n",
"... ...\n",
"22.875 39.974998 2004-12-02 12:00:00 NaN\n",
" 2004-12-03 12:00:00 NaN\n",
" 2004-12-04 12:00:00 NaN\n",
" 2004-12-05 12:00:00 NaN\n",
" 2004-12-06 12:00:00 NaN\n",
" 2004-12-07 12:00:00 NaN\n",
" 2004-12-08 12:00:00 NaN\n",
" 2004-12-09 12:00:00 NaN\n",
" 2004-12-10 12:00:00 NaN\n",
" 2004-12-11 12:00:00 NaN\n",
" 2004-12-12 12:00:00 NaN\n",
" 2004-12-13 12:00:00 NaN\n",
" 2004-12-14 12:00:00 NaN\n",
" 2004-12-15 12:00:00 NaN\n",
" 2004-12-16 12:00:00 NaN\n",
" 2004-12-17 12:00:00 NaN\n",
" 2004-12-18 12:00:00 NaN\n",
" 2004-12-19 12:00:00 NaN\n",
" 2004-12-20 12:00:00 NaN\n",
" 2004-12-21 12:00:00 NaN\n",
" 2004-12-22 12:00:00 NaN\n",
" 2004-12-23 12:00:00 NaN\n",
" 2004-12-24 12:00:00 NaN\n",
" 2004-12-25 12:00:00 NaN\n",
" 2004-12-26 12:00:00 NaN\n",
" 2004-12-27 12:00:00 NaN\n",
" 2004-12-28 12:00:00 NaN\n",
" 2004-12-29 12:00:00 NaN\n",
" 2004-12-30 12:00:00 NaN\n",
" 2004-12-31 12:00:00 NaN\n",
"\n",
"[1214388 rows x 1 columns]\n"
]
}
],
"source": [
"print df"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### What do we see?\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The **.csv files** we created contain:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- __4 columns__ for **Latitude (Lat)**, **Longitude (Lon)**, **Time (Date)** and **Sea Surface Temperature (SST)**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- __about 1.2 million rows__ per yearly file (1 214 388 for 2004, shown above), each giving the **Ocean Sea Surface Temperature (SST)** value at one **grid point (Lat, Lon)** and one **date (Time)**, for the five files covering **January 2000** to **December 2004**."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- A large number of __empty cells__ (grid points without SST values) are displayed as **'NaN'**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Polygon: Valid points located in the dataset grid\n",
"<hr>\n",
"\n",
"\n",
"- What if we want a **.csv file** containing SST values located on the **Ocean (exclusively)**, not on land **(without continents)**?\n",
"\n",
"\n",
"- **Important**: CSV reader software **may be limited** in the **maximum number of rows** it can display (e.g. **Microsoft Excel** natively displays csv files of **up to 1 048 576 rows**)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Pandas usage to clean/minimize .csv files (e.g. without continent values)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import os"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pandas module will be used on the previously created .csv files (e.g. '{{netcdf_file_name[:-3] + '.csv'}}'), which means the cells below can be executed as is.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"csv_uncleaned_in = netcdf_file_name[:-3] + '.csv'\n",
"csv_cleaned_out = csv_uncleaned_in[:-4] + '_cleaned.csv'"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The .csv file is loaded in a pandas dataframe with the `read_csv()` method."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true,
"scrolled": true,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"data = pd.read_csv(csv_dir + csv_uncleaned_in)"
]
},
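{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Optionally, the `time` column can be parsed into real datetimes while loading (a sketch using the standard `parse_dates` argument of `read_csv()`), which makes later date filtering or resampling easier:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Parse the 'time' column as datetime64 at load time instead of\n",
"# keeping it as plain strings\n",
"data = pd.read_csv(csv_dir + csv_uncleaned_in, parse_dates=['time'])"
]
},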
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Use the **`dropna()`** method to remove rows whose cells do not contain SST values."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" lat lon time analysed_sst\n",
"12810 15.075 39.674999 2000-01-01 12:00:00 300.209993\n",
"12811 15.075 39.674999 2000-01-02 12:00:00 300.199993\n",
"12812 15.075 39.674999 2000-01-03 12:00:00 300.209993\n",
"12813 15.075 39.674999 2000-01-04 12:00:00 300.309993\n",
"12814 15.075 39.674999 2000-01-05 12:00:00 300.369993\n",
"12815 15.075 39.674999 2000-01-06 12:00:00 300.399993\n",
"12816 15.075 39.674999 2000-01-07 12:00:00 300.389993\n",
"12817 15.075 39.674999 2000-01-08 12:00:00 300.389993\n",
"12818 15.075 39.674999 2000-01-09 12:00:00 300.249993\n",
"12819 15.075 39.674999 2000-01-10 12:00:00 300.209993\n",
"12820 15.075 39.674999 2000-01-11 12:00:00 300.289993\n",
"12821 15.075 39.674999 2000-01-12 12:00:00 300.229993\n",
"12822 15.075 39.674999 2000-01-13 12:00:00 300.329993\n",
"12823 15.075 39.674999 2000-01-14 12:00:00 300.379993\n",
"12824 15.075 39.674999 2000-01-15 12:00:00 300.309993\n",
"12825 15.075 39.674999 2000-01-16 12:00:00 300.199993\n",
"12826 15.075 39.674999 2000-01-17 12:00:00 300.039993\n",
"12827 15.075 39.674999 2000-01-18 12:00:00 300.029993\n",
"12828 15.075 39.674999 2000-01-19 12:00:00 299.919993\n",
"12829 15.075 39.674999 2000-01-20 12:00:00 299.909993\n",
"12830 15.075 39.674999 2000-01-21 12:00:00 299.989993\n",
"12831 15.075 39.674999 2000-01-22 12:00:00 300.049993\n",
"12832 15.075 39.674999 2000-01-23 12:00:00 300.059993\n",
"12833 15.075 39.674999 2000-01-24 12:00:00 300.069993\n",
"12834 15.075 39.674999 2000-01-25 12:00:00 299.979993\n",
"12835 15.075 39.674999 2000-01-26 12:00:00 300.049993\n",
"12836 15.075 39.674999 2000-01-27 12:00:00 299.699993\n",
"12837 15.075 39.674999 2000-01-28 12:00:00 299.589993\n",
"12838 15.075 39.674999 2000-01-29 12:00:00 299.559993\n",
"12839 15.075 39.674999 2000-01-30 12:00:00 299.629993\n",
"... ... ... ... ...\n",
"1207038 22.875 38.974998 2000-12-02 12:00:00 300.369993\n",
"1207039 22.875 38.974998 2000-12-03 12:00:00 300.339993\n",
"1207040 22.875 38.974998 2000-12-04 12:00:00 300.389993\n",
"1207041 22.875 38.974998 2000-12-05 12:00:00 300.489993\n",
"1207042 22.875 38.974998 2000-12-06 12:00:00 300.519993\n",
"1207043 22.875 38.974998 2000-12-07 12:00:00 300.379993\n",
"1207044 22.875 38.974998 2000-12-08 12:00:00 300.259993\n",
"1207045 22.875 38.974998 2000-12-09 12:00:00 300.219993\n",
"1207046 22.875 38.974998 2000-12-10 12:00:00 300.039993\n",
"1207047 22.875 38.974998 2000-12-11 12:00:00 299.879993\n",
"1207048 22.875 38.974998 2000-12-12 12:00:00 299.559993\n",
"1207049 22.875 38.974998 2000-12-13 12:00:00 299.499993\n",
"1207050 22.875 38.974998 2000-12-14 12:00:00 299.409993\n",
"1207051 22.875 38.974998 2000-12-15 12:00:00 299.159993\n",
"1207052 22.875 38.974998 2000-12-16 12:00:00 299.159993\n",
"1207053 22.875 38.974998 2000-12-17 12:00:00 299.149993\n",
"1207054 22.875 38.974998 2000-12-18 12:00:00 299.159993\n",
"1207055 22.875 38.974998 2000-12-19 12:00:00 299.129993\n",
"1207056 22.875 38.974998 2000-12-20 12:00:00 299.089993\n",
"1207057 22.875 38.974998 2000-12-21 12:00:00 298.869993\n",
"1207058 22.875 38.974998 2000-12-22 12:00:00 298.859993\n",
"1207059 22.875 38.974998 2000-12-23 12:00:00 298.789993\n",
"1207060 22.875 38.974998 2000-12-24 12:00:00 298.699993\n",
"1207061 22.875 38.974998 2000-12-25 12:00:00 298.679993\n",
"1207062 22.875 38.974998 2000-12-26 12:00:00 298.699993\n",
"1207063 22.875 38.974998 2000-12-27 12:00:00 298.729993\n",
"1207064 22.875 38.974998 2000-12-28 12:00:00 298.689993\n",
"1207065 22.875 38.974998 2000-12-29 12:00:00 298.649993\n",
"1207066 22.875 38.974998 2000-12-30 12:00:00 298.659993\n",
"1207067 22.875 38.974998 2000-12-31 12:00:00 298.719993\n",
"\n",
"[798978 rows x 4 columns]\n"
]
}
],
"source": [
"print data.dropna()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Write** the cleaned dataframe (returned by `data.dropna()`) to an output **.csv** file with the `to_csv()` method."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"data.dropna().to_csv(csv_dir + csv_cleaned_out, index = False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## TO SUM UP\n",
"---\n",
"\n",
"## Convert NetCDF file(s) to CSV file(s) - with optional conversion from K to °C"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- ### containing only valid Ocean Sea Surface Temperature values (in K, converted to °C)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- ### for the bounding box 15, 22, 39, 40\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- ### from January 2000 to December 2004"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Variable name of the temperature you want to convert from Kelvin to Celsius\n",
"# If no conversion is required, then set the value of 'var_name' to ''\n",
"# e.g. var_name = ''\n",
"var_name = 'analysed_sst'"
]
},
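{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A quick sanity check of the Kelvin-to-Celsius formula (degC = K - 273.15) before applying it to a whole column; the helper name `kelvin_to_celsius` is ours, not part of any library:*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Kelvin to Celsius: degC = K - 273.15\n",
"def kelvin_to_celsius(kelvin):\n",
"    return kelvin - 273.15\n",
"\n",
"# 300.15 K corresponds to 27 degC\n",
"kelvin_to_celsius(300.15)"
]
},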
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CMEMS_Yearly_Temperature_Dailymean_010_011_2000_cleaned_Cdegree.csv has been successfully processed - OK\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2001_cleaned_Cdegree.csv has been successfully processed - OK\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2002_cleaned_Cdegree.csv has been successfully processed - OK\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2003_cleaned_Cdegree.csv has been successfully processed - OK\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2004_cleaned_Cdegree.csv has been successfully processed - OK\n"
]
}
],
"source": [
"for filename in os.listdir(files_to_convert):\n",
" csv_cleaned_out = filename[:-3] + '_cleaned_Cdegree.csv'\n",
" ds = xr.open_dataset(files_to_convert+filename)\n",
" df = (ds.to_dataframe()).dropna()\n",
" if var_name:\n",
" df[var_name] = df[var_name].apply(lambda x: x - 273.15)\n",
" df.to_csv(csv_dir + csv_cleaned_out)\n",
" print csv_cleaned_out + ' has been successfully processed - OK'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check the content of '{{csv_cleaned_out}}'"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"lat,lon,time,analysed_sst\r\n",
"15.0749998093,39.6749992371,2004-01-01 12:00:00,27.4099932872\r\n",
"15.0749998093,39.6749992371,2004-01-02 12:00:00,27.149993293\r\n",
"15.0749998093,39.6749992371,2004-01-03 12:00:00,27.0899932943\r\n",
"15.0749998093,39.6749992371,2004-01-04 12:00:00,27.0699932948\r\n",
"15.0749998093,39.6749992371,2004-01-05 12:00:00,27.1599932928\r\n",
"15.0749998093,39.6749992371,2004-01-06 12:00:00,27.329993289\r\n",
"15.0749998093,39.6749992371,2004-01-07 12:00:00,27.3899932876\r\n",
"15.0749998093,39.6749992371,2004-01-08 12:00:00,27.4899932854\r\n",
"15.0749998093,39.6749992371,2004-01-09 12:00:00,27.5399932843\r\n"
]
}
],
"source": [
"!head $csv_dir/$csv_cleaned_out"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2000_cleaned.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-01 12:00:00</td>\n",
" <td>300.209993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-02 12:00:00</td>\n",
" <td>300.199993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-03 12:00:00</td>\n",
" <td>300.209993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-04 12:00:00</td>\n",
" <td>300.309993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-05 12:00:00</td>\n",
" <td>300.369993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2000_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-01 12:00:00</td>\n",
" <td>27.209993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-02 12:00:00</td>\n",
" <td>27.199993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-03 12:00:00</td>\n",
" <td>27.209993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-04 12:00:00</td>\n",
" <td>27.309993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-05 12:00:00</td>\n",
" <td>27.369993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2001_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-01 12:00:00</td>\n",
" <td>27.899993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-02 12:00:00</td>\n",
" <td>27.869993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-03 12:00:00</td>\n",
" <td>27.809993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-04 12:00:00</td>\n",
" <td>27.769993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-05 12:00:00</td>\n",
" <td>27.649993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2002_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-01 12:00:00</td>\n",
" <td>27.749993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-02 12:00:00</td>\n",
" <td>27.299993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-03 12:00:00</td>\n",
" <td>27.199993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-04 12:00:00</td>\n",
" <td>27.239993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-05 12:00:00</td>\n",
" <td>27.359993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2003_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-01 12:00:00</td>\n",
" <td>27.539993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-02 12:00:00</td>\n",
" <td>27.629993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-03 12:00:00</td>\n",
" <td>27.569993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-04 12:00:00</td>\n",
" <td>27.499993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-05 12:00:00</td>\n",
" <td>27.509993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2004_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-01 12:00:00</td>\n",
" <td>27.409993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-02 12:00:00</td>\n",
" <td>27.149993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-03 12:00:00</td>\n",
" <td>27.089993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-04 12:00:00</td>\n",
" <td>27.069993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-05 12:00:00</td>\n",
" <td>27.159993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n"
]
}
],
"source": [
"from IPython.display import display, HTML\n",
"\n",
"for filename in os.listdir(csv_dir):\n",
" if 'cleaned' in filename:\n",
" print '\\nHeader (5 first lines) of ' + filename + '\\n'\n",
" df = pd.read_csv(csv_dir + filename)\n",
" display(HTML((df.head(5)).to_html()))\n",
" print \"\\n\\n\""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Manipulate [pandas](https://github.com/pydata/pandas) DataFrames within notebook - Qgrid"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"\n",
"### __Important__\n",
"<hr>\n",
"\n",
"The **bash command (pip)** should be executed **only once** (to download the Python module onto the system), **not every time you open this notebook**.\n",
"\n",
"<br>\n",
"If you **already have** this module, you can **go to** the **next slide**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"%%bash\n",
"pip install qgrid"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"import qgrid"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e7b3cf4ab202464f8a143796763a7495",
"version_major": 2,
"version_minor": 0
},
"text/html": [
"<p>Failed to display Jupyter Widget of type <code>QgridWidget</code>.</p>\n",
"<p>\n",
" If you're reading this message in the Jupyter Notebook or JupyterLab Notebook, it may mean\n",
" that the widgets JavaScript is still loading. If this message persists, it\n",
" likely means that the widgets JavaScript library is either not installed or\n",
" not enabled. See the <a href=\"https://ipywidgets.readthedocs.io/en/stable/user_install.html\">Jupyter\n",
" Widgets Documentation</a> for setup instructions.\n",
"</p>\n",
"<p>\n",
" If you're reading this message in another frontend (for example, a static\n",
" rendering on GitHub or <a href=\"https://nbviewer.jupyter.org/\">NBViewer</a>),\n",
" it may mean that your frontend doesn't currently support widgets.\n",
"</p>\n"
],
"text/plain": [
"QgridWidget(grid_options={'enableTextSelectionOnCells': True, 'editable': True, 'explicitInitialization': True, 'minVisibleRows': 8, 'enableColumnReorder': False, 'defaultColumnWidth': 150, 'maxVisibleRows': 15, 'fullWidthRows': True, 'rowHeight': 28, 'autoEdit': False, 'syncColumnCellResize': True, 'forceFitColumns': True}, precision=5)"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"qgrid.show_grid(df)"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}