Python Notebook example on how to convert a netCDF file to a CSV file
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Data conversion (from NetCDF to CSV)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this chapter, we'll see **how to convert** datapoints of several years of Ocean Sea Surface Temperature from **netCDF format (.nc)** to **tabular format (.csv)**.\n",
"\n",
"Some users are also interested in converting Kelvin Temperature to Celcius Degree, that is why there will be an example at the end.\n",
"\n",
"#### Tip\n",
"<hr>\n",
"- To get the **netCDF files** on your local hard drive, follow the [previous chapter](http://localhost:8888/notebooks/sf_Temp/copernicus-marine-motu-fetcher.ipynb#Install-Copernicus-Marine-Environment).\n",
"\n",
"\n",
"- If you aren’t familiar with **this data format**, the [netCDF FAQ](http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF) is a good place to start. \n",
"\n",
"\n",
"- A HowTo on how to **manipulate** netCDF files content with **Python** is available [here]()."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Summary\n",
"\n",
"- #### working directories setup\n",
"- #### xarray install (via pip)\n",
"- #### xarray usage to convert netcdf to csv\n",
" - #### One-file-conversion-(specifying-a-filename)\n",
" - #### Several-files-conversion-(specifying-a-directory)\n",
"- #### pandas usage to clean/minize csv (without continent values)\n",
"- #### manipulate [pandas](https://github.com/pydata/pandas) DataFrames within notebook - [Qgrid](http://qgrid.readthedocs.io/en/widget-guidelines/#)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## working directories setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"csv netcdf raw-data\r\n"
]
}
],
"source": [
"# List the content of the directory including Copernicus Marine Data (.nc, .png, .geotiff, .xml, pdf etc...)\n",
"!ls /home/mercator/copernicus-marine-data/"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"# Replace 'local_storage_directory', 'netcdf_dir' and 'csv_dir' by respectively\n",
"# the directory path to Copernicus Marine data, the directory path to netcdf files\n",
"# and the directory path to csv files\n",
"local_storage_directory = '/home/mercator/copernicus-marine-data/'\n",
"netcdf_dir = local_storage_directory + 'netcdf/'\n",
"csv_dir = local_storage_directory + 'csv/'"
]
},
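{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally (a small sketch, useful on a fresh machine), make sure the output directory exists before writing any .csv file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Create the csv output directory if it does not exist yet\n",
"if not os.path.isdir(csv_dir):\n",
"    os.makedirs(csv_dir)"
]
},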
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## xarray install (via pip)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"\n",
"#### __Important__\n",
"<hr>\n",
"\n",
"The **bash command (pip)** should be executed **only once** (to get/download the python module in the system), **not everytime you open this notebook**.\n",
"\n",
"<br>\n",
"If you **already have** this module, you can **go to** the **next slide** (simply import it)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"%%bash\n",
"pip install xarray"
]
},
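{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative sketch (relying on IPython's `!` shell syntax): run the install only when the module cannot be imported, so the cell stays safe to re-execute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install xarray only if it is not importable yet\n",
"try:\n",
"    import xarray\n",
"except ImportError:\n",
"    !pip install xarray"
]
},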
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## xarray usage to convert netcdf to csv"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Import module(s) should be executed only once per kernel execution."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import xarray as xr\n",
"import os"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"- ### One file conversion (specifying a filename)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We set some variables to get the **input netcdf_file_name**, its **path** and the **output csv_file_name ** and **path**."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# Replace the file name in quote by the file name of the netcdf file (.nc) you want to convert to csv\n",
"netcdf_file_name = 'CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc'"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"# Set variables names for the input file.nc (netcdf_file_in) and the output file.csv (`csv_file_out`)\n",
"netcdf_file_in = netcdf_dir + netcdf_file_name\n",
"csv_file_out = csv_dir + netcdf_file_name[:-3] + '.csv'"
]
},
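{
"cell_type": "markdown",
"metadata": {},
"source": [
"Equivalent path handling with `os.path` (a sketch): `splitext()` avoids hard-coding the 3-character '.nc' suffix length used by the slicing above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Strip the extension whatever its length, then build the output path\n",
"base_name = os.path.splitext(netcdf_file_name)[0]\n",
"csv_file_out = os.path.join(csv_dir, base_name + '.csv')"
]
},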
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We **open** the netcdf file (using `open_dataset()` method), **convert** it to a dataframe (`to_dataframe()` method) and **write** this object to a **csv file** (`to_csv()` method)."
]
},
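{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before converting, it can help to inspect the dataset's dimensions, coordinates and variables (an optional check, as a sketch):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the netCDF structure (dimensions, coordinates, data variables)\n",
"ds_preview = xr.open_dataset(netcdf_file_in)\n",
"print(ds_preview)\n",
"ds_preview.close()"
]
},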
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"ds = xr.open_dataset(netcdf_file_in)\n",
"df = ds.to_dataframe()\n",
"\n",
"df.to_csv(csv_file_out)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"lat,lon,time,analysed_sst\r\n",
"39.6749992371,15.6750001907,2000-01-01 12:00:00,288.789993547\r\n",
"39.6749992371,15.6750001907,2000-01-02 12:00:00,288.759993548\r\n",
"39.6749992371,15.6750001907,2000-01-03 12:00:00,288.569993552\r\n",
"39.6749992371,15.6750001907,2000-01-04 12:00:00,288.589993551\r\n",
"39.6749992371,15.6750001907,2000-01-05 12:00:00,288.529993553\r\n",
"39.6749992371,15.6750001907,2000-01-06 12:00:00,288.549993552\r\n",
"39.6749992371,15.6750001907,2000-01-07 12:00:00,288.329993557\r\n",
"39.6749992371,15.6750001907,2000-01-08 12:00:00,288.299993558\r\n",
"39.6749992371,15.6750001907,2000-01-09 12:00:00,288.229993559\r\n"
]
}
],
"source": [
"!head $csv_file_out"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"\n",
"\n",
"### One File vs Directory conversion\n",
"<hr>\n",
"\n",
"It's great for one netcdf file, but what if we want to convert a **LIST OF FILES** located in a specific directory (e.g. raw-data) "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"- ### Several files conversion (specifying a directory)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc\r\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2001.nc\r\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2002.nc\r\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2003.nc\r\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2004.nc\r\n"
]
}
],
"source": [
"ls $local_storage_directory/raw-data/"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 12M\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:36 CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2001.nc\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2002.nc\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2003.nc\r\n",
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2004.nc\r\n"
]
}
],
"source": [
"# Set a new directory path where the list of netcdf files (.nc) is stored.\n",
"# Path should end with a 'slash (/)'\n",
"files_to_convert = local_storage_directory + 'raw-data/'\n",
"!ls -lh $files_to_convert"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc has been processed to .csv\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2001.nc has been processed to .csv\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2002.nc has been processed to .csv\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2003.nc has been processed to .csv\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2004.nc has been processed to .csv\n"
]
}
],
"source": [
"# Set a loop to create a .csv file for each .nc file listed in `files_to_convert`\n",
"for filename in os.listdir(files_to_convert):\n",
" ds = xr.open_dataset(files_to_convert + filename)\n",
" df = ds.to_dataframe()\n",
" df.to_csv(csv_dir + filename[:-3] + '.csv')\n",
" print filename + ' has been processed to .csv'"
]
},
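{
"cell_type": "markdown",
"metadata": {},
"source": [
"A defensive variant of the loop above (a sketch): process files in a stable order and skip anything that is not a netCDF file, in case the directory also holds other formats."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Variant of the conversion loop: sorted order, .nc files only\n",
"for filename in sorted(os.listdir(files_to_convert)):\n",
"    if not filename.endswith('.nc'):\n",
"        continue  # skip non-netCDF files\n",
"    ds = xr.open_dataset(files_to_convert + filename)\n",
"    df = ds.to_dataframe()\n",
"    df.to_csv(csv_dir + filename[:-3] + '.csv')\n",
"    print(filename + ' has been processed to .csv')"
]
},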
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 317M\r\n",
"drwxrwxr-x 2 mercator mercator 6 Dec 2 23:03 monthly\r\n",
"drwxrwxr-x 2 mercator mercator 4.0K Dec 3 02:46 yearly\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2000.csv\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2001.csv\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2002.csv\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2003.csv\r\n",
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2004.csv\r\n"
]
}
],
"source": [
"!ls -lrth $csv_dir"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" analysed_sst\n",
"lat lon time \n",
"15.025 38.974998 2004-01-01 12:00:00 NaN\n",
" 2004-01-02 12:00:00 NaN\n",
" 2004-01-03 12:00:00 NaN\n",
" 2004-01-04 12:00:00 NaN\n",
" 2004-01-05 12:00:00 NaN\n",
" 2004-01-06 12:00:00 NaN\n",
" 2004-01-07 12:00:00 NaN\n",
" 2004-01-08 12:00:00 NaN\n",
" 2004-01-09 12:00:00 NaN\n",
" 2004-01-10 12:00:00 NaN\n",
" 2004-01-11 12:00:00 NaN\n",
" 2004-01-12 12:00:00 NaN\n",
" 2004-01-13 12:00:00 NaN\n",
" 2004-01-14 12:00:00 NaN\n",
" 2004-01-15 12:00:00 NaN\n",
" 2004-01-16 12:00:00 NaN\n",
" 2004-01-17 12:00:00 NaN\n",
" 2004-01-18 12:00:00 NaN\n",
" 2004-01-19 12:00:00 NaN\n",
" 2004-01-20 12:00:00 NaN\n",
" 2004-01-21 12:00:00 NaN\n",
" 2004-01-22 12:00:00 NaN\n",
" 2004-01-23 12:00:00 NaN\n",
" 2004-01-24 12:00:00 NaN\n",
" 2004-01-25 12:00:00 NaN\n",
" 2004-01-26 12:00:00 NaN\n",
" 2004-01-27 12:00:00 NaN\n",
" 2004-01-28 12:00:00 NaN\n",
" 2004-01-29 12:00:00 NaN\n",
" 2004-01-30 12:00:00 NaN\n",
"... ...\n",
"22.875 39.974998 2004-12-02 12:00:00 NaN\n",
" 2004-12-03 12:00:00 NaN\n",
" 2004-12-04 12:00:00 NaN\n",
" 2004-12-05 12:00:00 NaN\n",
" 2004-12-06 12:00:00 NaN\n",
" 2004-12-07 12:00:00 NaN\n",
" 2004-12-08 12:00:00 NaN\n",
" 2004-12-09 12:00:00 NaN\n",
" 2004-12-10 12:00:00 NaN\n",
" 2004-12-11 12:00:00 NaN\n",
" 2004-12-12 12:00:00 NaN\n",
" 2004-12-13 12:00:00 NaN\n",
" 2004-12-14 12:00:00 NaN\n",
" 2004-12-15 12:00:00 NaN\n",
" 2004-12-16 12:00:00 NaN\n",
" 2004-12-17 12:00:00 NaN\n",
" 2004-12-18 12:00:00 NaN\n",
" 2004-12-19 12:00:00 NaN\n",
" 2004-12-20 12:00:00 NaN\n",
" 2004-12-21 12:00:00 NaN\n",
" 2004-12-22 12:00:00 NaN\n",
" 2004-12-23 12:00:00 NaN\n",
" 2004-12-24 12:00:00 NaN\n",
" 2004-12-25 12:00:00 NaN\n",
" 2004-12-26 12:00:00 NaN\n",
" 2004-12-27 12:00:00 NaN\n",
" 2004-12-28 12:00:00 NaN\n",
" 2004-12-29 12:00:00 NaN\n",
" 2004-12-30 12:00:00 NaN\n",
" 2004-12-31 12:00:00 NaN\n",
"\n",
"[1214388 rows x 1 columns]\n"
]
}
],
"source": [
"print df"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### What do we see?\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"**.csv files** we created contains:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- __4 columns__ for **Latitude (Lat)**, **Longitude (Lon)**, **Time (Date)** and **Sea Surface Temperature (SST)**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- __1 211 010 rows__ corresponding to the **value of the Ocean Sea Surface Temperature (SST)** for **each point (Lat, Lon)** of the bounding box from **January 2000** to **December 2004 (Time)**."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- A large numbers of __Empty cells__ (grip point without SST values) are displayed as **'NaN'**"
]
},
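{
"cell_type": "markdown",
"metadata": {},
"source": [
"To quantify the share of NaN rows (an optional sketch, using the `analysed_sst` variable name from the files above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Count rows whose SST value is missing\n",
"n_missing = df['analysed_sst'].isnull().sum()\n",
"print(str(n_missing) + ' NaN rows out of ' + str(len(df)))"
]
},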
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Polygon: Valid points located in the dataset grid\n",
"<hr>\n",
"\n",
"\n",
"- What if we want to get a **.csv file** containing SST values located on the **Ocean (exclusively)**, not Land **(without continents)**?\n",
"\n",
"\n",
"- **Important**: CSV reader softwares **may be limited** in the **maximum total number of rows** displayable (e.g. **microsoft excel** can natively display csv files containing **up to 1 048 576 rows**)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Pandas usage to clean/minimize .csv files (e.g. without continent values)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import os"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"pandas module will be used on the previous created .csv files (e.g. '{{netcdf_file_name[:-3] + '.csv'}}'), which means that cells can be executed as is.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"csv_uncleaned_in = netcdf_file_name[:-3] + '.csv'\n",
"csv_cleaned_out = csv_uncleaned_in[:-4] + '_cleaned.csv'"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The .csv file is loaded in a pandas dataframe with the `read_csv()` method."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true,
"scrolled": true,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"data = pd.read_csv(csv_dir + csv_uncleaned_in)"
]
},
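{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you plan to filter or group by date later, pandas can parse the `time` column as datetimes at load time (a sketch using the `parse_dates` keyword):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Same load, but with the 'time' column parsed as datetime objects\n",
"data = pd.read_csv(csv_dir + csv_uncleaned_in, parse_dates=['time'])"
]
},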
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Use of the **'dropna()'** function to remove rows including cells which do not contain SST values."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" lat lon time analysed_sst\n",
"12810 15.075 39.674999 2000-01-01 12:00:00 300.209993\n",
"12811 15.075 39.674999 2000-01-02 12:00:00 300.199993\n",
"12812 15.075 39.674999 2000-01-03 12:00:00 300.209993\n",
"12813 15.075 39.674999 2000-01-04 12:00:00 300.309993\n",
"12814 15.075 39.674999 2000-01-05 12:00:00 300.369993\n",
"12815 15.075 39.674999 2000-01-06 12:00:00 300.399993\n",
"12816 15.075 39.674999 2000-01-07 12:00:00 300.389993\n",
"12817 15.075 39.674999 2000-01-08 12:00:00 300.389993\n",
"12818 15.075 39.674999 2000-01-09 12:00:00 300.249993\n",
"12819 15.075 39.674999 2000-01-10 12:00:00 300.209993\n",
"12820 15.075 39.674999 2000-01-11 12:00:00 300.289993\n",
"12821 15.075 39.674999 2000-01-12 12:00:00 300.229993\n",
"12822 15.075 39.674999 2000-01-13 12:00:00 300.329993\n",
"12823 15.075 39.674999 2000-01-14 12:00:00 300.379993\n",
"12824 15.075 39.674999 2000-01-15 12:00:00 300.309993\n",
"12825 15.075 39.674999 2000-01-16 12:00:00 300.199993\n",
"12826 15.075 39.674999 2000-01-17 12:00:00 300.039993\n",
"12827 15.075 39.674999 2000-01-18 12:00:00 300.029993\n",
"12828 15.075 39.674999 2000-01-19 12:00:00 299.919993\n",
"12829 15.075 39.674999 2000-01-20 12:00:00 299.909993\n",
"12830 15.075 39.674999 2000-01-21 12:00:00 299.989993\n",
"12831 15.075 39.674999 2000-01-22 12:00:00 300.049993\n",
"12832 15.075 39.674999 2000-01-23 12:00:00 300.059993\n",
"12833 15.075 39.674999 2000-01-24 12:00:00 300.069993\n",
"12834 15.075 39.674999 2000-01-25 12:00:00 299.979993\n",
"12835 15.075 39.674999 2000-01-26 12:00:00 300.049993\n",
"12836 15.075 39.674999 2000-01-27 12:00:00 299.699993\n",
"12837 15.075 39.674999 2000-01-28 12:00:00 299.589993\n",
"12838 15.075 39.674999 2000-01-29 12:00:00 299.559993\n",
"12839 15.075 39.674999 2000-01-30 12:00:00 299.629993\n",
"... ... ... ... ...\n",
"1207038 22.875 38.974998 2000-12-02 12:00:00 300.369993\n",
"1207039 22.875 38.974998 2000-12-03 12:00:00 300.339993\n",
"1207040 22.875 38.974998 2000-12-04 12:00:00 300.389993\n",
"1207041 22.875 38.974998 2000-12-05 12:00:00 300.489993\n",
"1207042 22.875 38.974998 2000-12-06 12:00:00 300.519993\n",
"1207043 22.875 38.974998 2000-12-07 12:00:00 300.379993\n",
"1207044 22.875 38.974998 2000-12-08 12:00:00 300.259993\n",
"1207045 22.875 38.974998 2000-12-09 12:00:00 300.219993\n",
"1207046 22.875 38.974998 2000-12-10 12:00:00 300.039993\n",
"1207047 22.875 38.974998 2000-12-11 12:00:00 299.879993\n",
"1207048 22.875 38.974998 2000-12-12 12:00:00 299.559993\n",
"1207049 22.875 38.974998 2000-12-13 12:00:00 299.499993\n",
"1207050 22.875 38.974998 2000-12-14 12:00:00 299.409993\n",
"1207051 22.875 38.974998 2000-12-15 12:00:00 299.159993\n",
"1207052 22.875 38.974998 2000-12-16 12:00:00 299.159993\n",
"1207053 22.875 38.974998 2000-12-17 12:00:00 299.149993\n",
"1207054 22.875 38.974998 2000-12-18 12:00:00 299.159993\n",
"1207055 22.875 38.974998 2000-12-19 12:00:00 299.129993\n",
"1207056 22.875 38.974998 2000-12-20 12:00:00 299.089993\n",
"1207057 22.875 38.974998 2000-12-21 12:00:00 298.869993\n",
"1207058 22.875 38.974998 2000-12-22 12:00:00 298.859993\n",
"1207059 22.875 38.974998 2000-12-23 12:00:00 298.789993\n",
"1207060 22.875 38.974998 2000-12-24 12:00:00 298.699993\n",
"1207061 22.875 38.974998 2000-12-25 12:00:00 298.679993\n",
"1207062 22.875 38.974998 2000-12-26 12:00:00 298.699993\n",
"1207063 22.875 38.974998 2000-12-27 12:00:00 298.729993\n",
"1207064 22.875 38.974998 2000-12-28 12:00:00 298.689993\n",
"1207065 22.875 38.974998 2000-12-29 12:00:00 298.649993\n",
"1207066 22.875 38.974998 2000-12-30 12:00:00 298.659993\n",
"1207067 22.875 38.974998 2000-12-31 12:00:00 298.719993\n",
"\n",
"[798978 rows x 4 columns]\n"
]
}
],
"source": [
"print data.dropna()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Write** (with `data.dropna()` method) the new clean dataframe to a output **.csv** file."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"data.dropna().to_csv(csv_dir + csv_cleaned_out, index = False)"
]
},
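{
"cell_type": "markdown",
"metadata": {},
"source": [
"An optional sanity check (a sketch), following the **Important** note above: warn when the cleaned file exceeds the 1 048 576 rows Microsoft Excel can natively display."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Excel natively displays at most 1,048,576 rows (header included)\n",
"EXCEL_MAX_ROWS = 1048576\n",
"n_rows = len(data.dropna())\n",
"if n_rows + 1 > EXCEL_MAX_ROWS:\n",
"    print('Warning: ' + csv_cleaned_out + ' will not fully display in Excel')\n",
"else:\n",
"    print(csv_cleaned_out + ' fits within the Excel row limit')"
]
},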
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## TO SUM UP\n",
"---\n",
"\n",
"## Convert NetCDF file(s) to CSV file(s) - with option convert K to C°"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- ### containing only valid Ocean Sea Surface Temperature (in K but converted to C°)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- ### for the bouding box 15, 22, 39, 40\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- ### from January 2000 to December 2004"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Variable name of the temperature you want to convert from Kelvin to Celcius°\n",
"# If no conversion is required, then set the value of 'var_name' to ''\n",
"# e.g. var_name = ''\n",
"var_name = 'analysed_sst'"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CMEMS_Yearly_Temperature_Dailymean_010_011_2000_cleaned_Cdegree.csv has been successfully processed - OK\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2001_cleaned_Cdegree.csv has been successfully processed - OK\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2002_cleaned_Cdegree.csv has been successfully processed - OK\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2003_cleaned_Cdegree.csv has been successfully processed - OK\n",
"CMEMS_Yearly_Temperature_Dailymean_010_011_2004_cleaned_Cdegree.csv has been successfully processed - OK\n"
]
}
],
"source": [
"for filename in os.listdir(files_to_convert):\n",
" csv_cleaned_out = filename[:-3] + '_cleaned_Cdegree.csv'\n",
" ds = xr.open_dataset(files_to_convert+filename)\n",
" df = (ds.to_dataframe()).dropna()\n",
" if var_name:\n",
" df[var_name] = df[var_name].apply(lambda x: x - 273,15)\n",
" df.to_csv(csv_dir + csv_cleaned_out)\n",
" print csv_cleaned_out + ' has been successfully processed - OK'"
]
},
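{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: `apply()` calls the lambda once per value; since pandas columns support vectorized arithmetic, the Kelvin-to-Celsius step can also be written as a one-liner. A sketch on a freshly loaded dataframe (to avoid subtracting 273.15 twice):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Vectorized Kelvin-to-Celsius conversion on a fresh dataframe\n",
"sample_file = sorted(os.listdir(files_to_convert))[0]\n",
"ds_k = xr.open_dataset(files_to_convert + sample_file)\n",
"df_k = ds_k.to_dataframe().dropna()\n",
"df_k[var_name] = df_k[var_name] - 273.15"
]
},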
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check the content of '{{csv_cleaned_out}}'"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"lat,lon,time,analysed_sst\r\n",
"15.0749998093,39.6749992371,2004-01-01 12:00:00,27.4099932872\r\n",
"15.0749998093,39.6749992371,2004-01-02 12:00:00,27.149993293\r\n",
"15.0749998093,39.6749992371,2004-01-03 12:00:00,27.0899932943\r\n",
"15.0749998093,39.6749992371,2004-01-04 12:00:00,27.0699932948\r\n",
"15.0749998093,39.6749992371,2004-01-05 12:00:00,27.1599932928\r\n",
"15.0749998093,39.6749992371,2004-01-06 12:00:00,27.329993289\r\n",
"15.0749998093,39.6749992371,2004-01-07 12:00:00,27.3899932876\r\n",
"15.0749998093,39.6749992371,2004-01-08 12:00:00,27.4899932854\r\n",
"15.0749998093,39.6749992371,2004-01-09 12:00:00,27.5399932843\r\n"
]
}
],
"source": [
"!head $csv_dir/$csv_cleaned_out"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2000_cleaned.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-01 12:00:00</td>\n",
" <td>300.209993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-02 12:00:00</td>\n",
" <td>300.199993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-03 12:00:00</td>\n",
" <td>300.209993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-04 12:00:00</td>\n",
" <td>300.309993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-05 12:00:00</td>\n",
" <td>300.369993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2000_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-01 12:00:00</td>\n",
" <td>27.209993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-02 12:00:00</td>\n",
" <td>27.199993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-03 12:00:00</td>\n",
" <td>27.209993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-04 12:00:00</td>\n",
" <td>27.309993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2000-01-05 12:00:00</td>\n",
" <td>27.369993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2001_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-01 12:00:00</td>\n",
" <td>27.899993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-02 12:00:00</td>\n",
" <td>27.869993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-03 12:00:00</td>\n",
" <td>27.809993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-04 12:00:00</td>\n",
" <td>27.769993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2001-01-05 12:00:00</td>\n",
" <td>27.649993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2002_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-01 12:00:00</td>\n",
" <td>27.749993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-02 12:00:00</td>\n",
" <td>27.299993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-03 12:00:00</td>\n",
" <td>27.199993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-04 12:00:00</td>\n",
" <td>27.239993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2002-01-05 12:00:00</td>\n",
" <td>27.359993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2003_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-01 12:00:00</td>\n",
" <td>27.539993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-02 12:00:00</td>\n",
" <td>27.629993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-03 12:00:00</td>\n",
" <td>27.569993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-04 12:00:00</td>\n",
" <td>27.499993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2003-01-05 12:00:00</td>\n",
" <td>27.509993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n",
"\n",
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2004_cleaned_Cdegree.csv\n",
"\n"
]
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>time</th>\n",
" <th>analysed_sst</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-01 12:00:00</td>\n",
" <td>27.409993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-02 12:00:00</td>\n",
" <td>27.149993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-03 12:00:00</td>\n",
" <td>27.089993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-04 12:00:00</td>\n",
" <td>27.069993</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>15.075</td>\n",
" <td>39.674999</td>\n",
" <td>2004-01-05 12:00:00</td>\n",
" <td>27.159993</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\n"
]
}
],
"source": [
"from IPython.display import display, HTML\n",
"\n",
"for filename in os.listdir(csv_dir):\n",
" if 'cleaned' in filename:\n",
" print '\\nHeader (5 first lines) of ' + filename + '\\n'\n",
" df = pd.read_csv(csv_dir + filename)\n",
" display(HTML((df.head(5)).to_html()))\n",
" print \"\\n\\n\""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Manipulate [pandas](https://github.com/pydata/pandas) DataFrames within notebook - Qgrid"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"\n",
"### __Important__\n",
"<hr>\n",
"\n",
"The **bash command (pip)** should be executed **only once** (to get the python module in the system), **not everytime you open this notebook**.\n",
"\n",
"<br>\n",
"If you **already have** this module, you can **go to** the **next slide**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"%%bash\n",
"pip install qgrid"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"import qgrid"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e7b3cf4ab202464f8a143796763a7495",
"version_major": 2,
"version_minor": 0
},
"text/html": [
"<p>Failed to display Jupyter Widget of type <code>QgridWidget</code>.</p>\n",
"<p>\n",
" If you're reading this message in the Jupyter Notebook or JupyterLab Notebook, it may mean\n",
" that the widgets JavaScript is still loading. If this message persists, it\n",
" likely means that the widgets JavaScript library is either not installed or\n",
" not enabled. See the <a href=\"https://ipywidgets.readthedocs.io/en/stable/user_install.html\">Jupyter\n",
" Widgets Documentation</a> for setup instructions.\n",
"</p>\n",
"<p>\n",
" If you're reading this message in another frontend (for example, a static\n",
" rendering on GitHub or <a href=\"https://nbviewer.jupyter.org/\">NBViewer</a>),\n",
" it may mean that your frontend doesn't currently support widgets.\n",
"</p>\n"
],
"text/plain": [
"QgridWidget(grid_options={'enableTextSelectionOnCells': True, 'editable': True, 'explicitInitialization': True, 'minVisibleRows': 8, 'enableColumnReorder': False, 'defaultColumnWidth': 150, 'maxVisibleRows': 15, 'fullWidthRows': True, 'rowHeight': 28, 'autoEdit': False, 'syncColumnCellResize': True, 'forceFitColumns': True}, precision=5)"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"qgrid.show_grid(df)"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@saratiedt

Hi, I'm looking for a dataset just like yours to use in my project. Can you share the link with me? I would appreciate it.

@copernicusmarine

Hi @saratiedt! The data access for the dataset used in this (very old) tutorial is available from the official dataset landing page: https://resources.marine.copernicus.eu/?option=com_csw&view=details&product_id=SST_GLO_SST_L4_NRT_OBSERVATIONS_010_001

Feel free to check out our HelpCenter for self-service support!

Copernicus Pro Tips: Should you need any assistance, please contact our human user support team using the chat widget at the bottom right of the page!

Thanks,
D.

@mara24na

mara24na commented Dec 6, 2021

Hello, thanks for the great explanation!

My data would appear in a 3x3 grid for lon and lat.
Which data is included in the csv? Only the value from the "top left corner"?
Or is it interpolated?

@dbazin-eo

dbazin-eo commented Dec 6, 2021

Hi @mara24na, thanks for your comment!

Please note that this code snippet is quite old, and I would recommend using the Copernicus Marine User Support Toolbox, which converts from NetCDF to CSV "out of the box" via .to_csv() and .to_mfcsv().

You can install it via PyPI:
pip install cmemsapi

Or (the most recommended route), if you prefer conda:
conda create --name cmems --channel conda-forge --channel cmust python=3.8 cmemsapi --yes

You can read more and get examples in How to download a large volume (e.g. full time-series, several GigaBytes) of data in netcdf4 or csv file format?.

With regard to your question, I strongly suggest getting in touch with User Support via the chat widget at the bottom right of every Copernicus Marine webpage:

[screenshot: chat widget on the Copernicus Marine website]

Wishing you good luck!

@GIS243

GIS243 commented Nov 28, 2022

Hi @copernicusmarine

Thanks for the code. Is there a way to convert only a specific region of the netcdf to csv, rather than the whole netcdf file, please?

Thanks

@copernicusmarine

copernicusmarine commented Nov 28, 2022

Hi @GIS243,
There are many ways to achieve what you expect!
Please contact our User Support (or use the chat) with an example of the marine product you are interested in. They will guide you to the best scenario matching your needs (e.g. preprocessing the netcdf file before downloading it).
Thanks

PS: the code presented on this page is very old. We cannot delete it because the previous gist account has been lost.
