@monkeybutter
Created November 3, 2015 05:03
Playing with tables in hdf5
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## This notebook is hosted on raijin at: /g/data2/uc0/prl900_dev/table_test.ipynb\n",
"## In your VDI, go to this path and run `ipython notebook` to run the code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.- This example reads a table stored in an HDF5 file and loads it into a pandas DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X 12835196\n",
"Y 12835196\n",
"Z 12835196\n",
"Intensity 12835196\n",
"Distance 12835196\n",
"Angle 12835196\n",
"dtype: int64\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>X</th>\n",
" <th>Y</th>\n",
" <th>Z</th>\n",
" <th>Intensity</th>\n",
" <th>Distance</th>\n",
" <th>Angle</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.585365</td>\n",
" <td>1.941513e-16</td>\n",
" <td>-2.264136</td>\n",
" <td>2</td>\n",
" <td>2764</td>\n",
" <td>-35.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.582556</td>\n",
" <td>1.938071e-16</td>\n",
" <td>-2.268540</td>\n",
" <td>3</td>\n",
" <td>2766</td>\n",
" <td>-34.900002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.572316</td>\n",
" <td>1.925532e-16</td>\n",
" <td>-2.262266</td>\n",
" <td>3</td>\n",
" <td>2755</td>\n",
" <td>-34.799999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.547302</td>\n",
" <td>1.894898e-16</td>\n",
" <td>-2.234587</td>\n",
" <td>3</td>\n",
" <td>2718</td>\n",
" <td>-34.700001</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.553053</td>\n",
" <td>1.901941e-16</td>\n",
" <td>-2.251278</td>\n",
" <td>2</td>\n",
" <td>2735</td>\n",
" <td>-34.599998</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" X Y Z Intensity Distance Angle\n",
"0 1.585365 1.941513e-16 -2.264136 2 2764 -35.000000\n",
"1 1.582556 1.938071e-16 -2.268540 3 2766 -34.900002\n",
"2 1.572316 1.925532e-16 -2.262266 3 2755 -34.799999\n",
"3 1.547302 1.894898e-16 -2.234587 3 2718 -34.700001\n",
"4 1.553053 1.901941e-16 -2.251278 2 2735 -34.599998"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import h5py\n",
"\n",
"h5 = h5py.File('Lidar_test.h5', 'r')\n",
"# Dataset.value is deprecated in h5py; [()] reads the whole dataset into memory\n",
"df = pd.DataFrame(h5['run'][()], columns=h5['run'].attrs['Column Names'])\n",
"\n",
"# Around 13M points, approximately the size of the biggest ragged-array dataset\n",
"print(df.count())\n",
"\n",
"# These are the first 5 rows of the dataframe\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.- Having it in pandas is very convenient because the data is queryable"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>X</th>\n",
" <th>Y</th>\n",
" <th>Z</th>\n",
" <th>Intensity</th>\n",
" <th>Distance</th>\n",
" <th>Angle</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.585365</td>\n",
" <td>1.941513e-16</td>\n",
" <td>-2.264136</td>\n",
" <td>2</td>\n",
" <td>2764</td>\n",
" <td>-35.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.553053</td>\n",
" <td>1.901941e-16</td>\n",
" <td>-2.251278</td>\n",
" <td>2</td>\n",
" <td>2735</td>\n",
" <td>-34.599998</td>\n",
" </tr>\n",
" <tr>\n",
" <th>186</th>\n",
" <td>0.605622</td>\n",
" <td>7.416736e-17</td>\n",
" <td>-2.057729</td>\n",
" <td>2</td>\n",
" <td>2145</td>\n",
" <td>-16.400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>187</th>\n",
" <td>0.579296</td>\n",
" <td>7.094331e-17</td>\n",
" <td>-1.981038</td>\n",
" <td>2</td>\n",
" <td>2064</td>\n",
" <td>-16.299999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>190</th>\n",
" <td>0.566710</td>\n",
" <td>6.940201e-17</td>\n",
" <td>-1.976354</td>\n",
" <td>2</td>\n",
" <td>2056</td>\n",
" <td>-16.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" X Y Z Intensity Distance Angle\n",
"0 1.585365 1.941513e-16 -2.264136 2 2764 -35.000000\n",
"4 1.553053 1.901941e-16 -2.251278 2 2735 -34.599998\n",
"186 0.605622 7.416736e-17 -2.057729 2 2145 -16.400000\n",
"187 0.579296 7.094331e-17 -1.981038 2 2064 -16.299999\n",
"190 0.566710 6.940201e-17 -1.976354 2 2056 -16.000000"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Give me all the rows where intensity is equal to 2 (Quite fast considering 13M rows!)\n",
"df2 = df[df['Intensity']==2]\n",
"df2.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.- Pandas provides another way of storing the data, called stores\n",
"I think this format is not compatible with NetCDF at all"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"store = pd.HDFStore('store.h5')\n",
"store['run'] = df\n",
"store.close()  # flush the data to disk and release the file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I reckon this could be a good starting point to test different ways of storing the column data. \n",
"Another approach would be to store each column in its own dataset to make this THREDDS-friendly..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
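
The last notebook cell suggests storing every column in its own dataset. A minimal sketch of that layout with h5py could look like the block below. The file name `columns_test.h5` and the three sample rows are made up for illustration, the group name `run` is borrowed from the notebook, and whether THREDDS would actually serve this layout has not been tested here.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in for the lidar table: three rows, same column names
# as the notebook's DataFrame (values are illustrative only).
column_names = ["X", "Y", "Z", "Intensity", "Distance", "Angle"]
table = np.array([
    [1.585365, 0.0, -2.264136, 2, 2764, -35.0],
    [1.582556, 0.0, -2.268540, 3, 2766, -34.9],
    [1.572316, 0.0, -2.262266, 3, 2755, -34.8],
])

# Write into a temporary directory so the sketch leaves no files behind
# in the working directory.
path = os.path.join(tempfile.mkdtemp(), "columns_test.h5")

# Write one 1-D dataset per column under the 'run' group.
with h5py.File(path, "w") as h5:
    group = h5.create_group("run")
    for i, name in enumerate(column_names):
        group.create_dataset(name, data=table[:, i])

# Read back a single column without loading the others.
with h5py.File(path, "r") as h5:
    intensity = h5["run/Intensity"][()]
    stored_columns = sorted(h5["run"].keys())

print(stored_columns)
print(intensity)
```

With this layout each column is an ordinary one-dimensional array, so a reader only pays for the columns it asks for, at the cost of losing the single-table view that the 2-D `run` dataset gives.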

cet900 commented Nov 3, 2015

Hi Pablo, can you pls change the Lidar_test file to have world read permissions? Ta :)
(I can but better not to use back doors when I can ask you directly!)
