@liyi-1989
Created July 31, 2014 20:00
{
"metadata": {
"name": "",
"signature": "sha256:65aa7ff8ea053a7d34e0d7496f676d92274ad6a7602765834e8e441d7a1d2f7b"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working with HDF5 files in Python\n",
"\n",
"## 1. Introduction\n",
"\n",
"Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of numerical data. It was originally developed at NCSA and comes with an open-source library for reading and writing the files.\n",
"\n",
"In Python, you can use the **h5py** package to read and write HDF5 files. For installation issues, please consult the [build instructions](http://docs.h5py.org/en/2.3/build.html). If you are new to Python, you can install [Anaconda](http://continuum.io/downloads), which contains this package and many other commonly used packages.\n",
"\n",
"\n",
"An HDF5 file works like a file system for data. It has only two kinds of objects: the **group** and the **dataset**. A group is like a folder in a file system, while a dataset stores data, such as a NumPy array.\n",
"\n",
"Datasets are addressed inside the HDF5 file much like paths in a regular file system: `/Folder/SubFolder/DataName`.\n",
"\n",
"## 2. HDF5 in Python\n",
"\n",
"Let us assume that you have already installed h5py on your computer. We will now see how to work with the h5py module."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import h5py"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
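{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can create an HDF5 file object with the `h5py.File()` function. The mode can be \"r\" (read-only), \"w\" (create the file, truncating it if it exists), or \"a\" (read and write, creating the file if needed). In the h5py 2.x releases the default is \"a\"; h5py 3.0 and later default to \"r\", so it is safest to pass the mode explicitly."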
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"myfile = h5py.File(\"ex1.hdf5\", \"a\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
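{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is often safer to open the file in a `with` block and to pass the mode explicitly, so that the file is closed even if an error occurs. The cell below is a minimal sketch rather than part of the original example: the scratch file name `modes_demo.hdf5` is a made-up placeholder, kept separate from `ex1.hdf5`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"\n",
"import h5py\n",
"\n",
"# Hypothetical scratch file, kept separate from ex1.hdf5 used in this tutorial\n",
"path = os.path.join(tempfile.mkdtemp(), 'modes_demo.hdf5')\n",
"\n",
"# 'w' creates the file (truncating any existing one); the with block closes it\n",
"with h5py.File(path, 'w') as f:\n",
"    f.create_group('grp1')\n",
"\n",
"# 'r' opens the existing file read-only\n",
"with h5py.File(path, 'r') as f:\n",
"    names = list(f.keys())\n",
"\n",
"print(names)"
],
"language": "python",
"metadata": {},
"outputs": []
},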
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 Creating groups\n",
"\n",
"So far, we have only created an empty HDF5 file, `myfile`. We need to add some elements to it. For example, we can use the `myfile.create_group()` method to create a new group (or \"folder\")."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"myfile.create_group(\"grp1\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"<HDF5 group \"/grp1\" (0 members)>"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also assign the newly created group to a variable."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"group2=myfile.create_group(\"grp2\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a group object, you can use the `keys()` method to get the names of the objects it contains."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"myfile.keys()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"[u'grp1', u'grp2']"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Moreover, we can create a subgroup by calling the same method on `group2`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1=group2.create_group(\"subgroup1\")\n",
"group2.keys()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"[u'subgroup1']"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Creating data\n",
"\n",
"Now it is time to put some data in the group. We can create a dataset by assignment, just as with a dictionary in Python."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1[\"data1\"]=np.arange(0,10)\n",
"s1[\"data1\"]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"<HDF5 dataset \"data1\": shape (10,), type \"<i4\">"
]
}
],
"prompt_number": 7
},
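{
"cell_type": "markdown",
"metadata": {},
"source": [
"The stored data can be read back as a NumPy array by indexing the dataset with `[...]`. (Older h5py releases also offered a `.value` attribute for this, but it was removed in h5py 3.0.)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1[\"data1\"][...]"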
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the dataset object can be used in calculations directly."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"np.sum(s1[\"data1\"])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"45"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1[\"data1\"][2]==2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"True"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, we can use the `create_dataset()` method to create a new dataset."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1.create_dataset(\"data2\",(3,5),np.int32)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"<HDF5 dataset \"data2\": shape (3, 5), type \"<i4\">"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1[\"data2\"][...]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"array([[0, 0, 0, 0, 0],\n",
" [0, 0, 0, 0, 0],\n",
" [0, 0, 0, 0, 0]])"
]
}
],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1.create_dataset(\"data3\",data=np.arange(15))\n",
"s1[\"data3\"][...]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])"
]
}
],
"prompt_number": 13
},
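{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indexing a dataset with a slice reads only that part from the file and returns a NumPy array, and assigning to a slice writes back to the file. The cell below is a small sketch of this, under assumptions: the scratch file `slice_demo.hdf5` and the dataset name `table` are made-up placeholders that are not part of `ex1.hdf5`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"\n",
"import numpy as np\n",
"import h5py\n",
"\n",
"# Hypothetical scratch file and dataset name, used only for this sketch\n",
"path = os.path.join(tempfile.mkdtemp(), 'slice_demo.hdf5')\n",
"\n",
"with h5py.File(path, 'w') as f:\n",
"    dset = f.create_dataset('table', data=np.arange(15).reshape(3, 5))\n",
"    first_row = dset[0, :]   # reads one row as a NumPy array\n",
"    corner = dset[:2, :2]    # reads a 2x2 block\n",
"    dset[1, :] = -1          # writes into the file through slicing\n",
"    everything = dset[...]   # reads the whole dataset\n",
"\n",
"print(first_row)\n",
"print(corner)\n",
"print(everything)"
],
"language": "python",
"metadata": {},
"outputs": []
},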
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Deleting groups and datasets\n",
"\n",
"You can use the `del` keyword to delete a group or a dataset from its parent group."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1.keys()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"[u'data1', u'data2', u'data3']"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"del s1[\"data3\"]\n",
"s1.keys()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"[u'data1', u'data2']"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Saving as a CSV file\n",
"\n",
"If you want to save a dataset from the HDF5 file as a CSV file, you can use the **csv** module in Python. For example, we create a 5 by 5 matrix under `s1`, and then use `csv.writer()` and `.writerows()` to write it to the CSV file."
]
},
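{
"cell_type": "code",
"collapsed": false,
"input": [
"import csv\n",
"\n",
"s1[\"data4\"]=np.random.rand(5,5)\n",
"\n",
"# newline='' lets the csv module control line endings (Python 3)\n",
"with open('csv_test.csv', 'w', newline='') as csvfile:\n",
"    writer = csv.writer(csvfile)\n",
"    # [...] reads the dataset into a NumPy array, one CSV row per matrix row\n",
"    writer.writerows(s1[\"data4\"][...])"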
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
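{
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy's `np.savetxt` is another way to write a two-dimensional array to a CSV file. The cell below is a sketch under assumptions: the file name `savetxt_demo.csv` is a made-up placeholder, and a small fixed matrix stands in for the array you would read from the dataset."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"\n",
"import numpy as np\n",
"\n",
"# Hypothetical output file, used only for this sketch\n",
"out_path = os.path.join(tempfile.mkdtemp(), 'savetxt_demo.csv')\n",
"\n",
"# Stand-in for an array read from an HDF5 dataset\n",
"matrix = np.arange(6, dtype=float).reshape(2, 3)\n",
"\n",
"# delimiter=',' produces CSV; fmt controls how each number is printed\n",
"np.savetxt(out_path, matrix, delimiter=',', fmt='%.3f')\n",
"\n",
"# Read the file back to check the round trip\n",
"loaded = np.loadtxt(out_path, delimiter=',')\n",
"print(loaded)"
],
"language": "python",
"metadata": {},
"outputs": []
},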
{
"cell_type": "code",
"collapsed": false,
"input": [
"myfile.close()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 17
},
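{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once a file is closed, it can be reopened read-only to check what it contains. The cell below is a hedged sketch that walks a file with `visititems()`; it builds its own scratch file `walk_demo.hdf5` (a made-up name) with a hierarchy similar to the one above, and the helper `record` is a hypothetical callback written for this example."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"\n",
"import numpy as np\n",
"import h5py\n",
"\n",
"# Hypothetical scratch file, used only for this sketch\n",
"path = os.path.join(tempfile.mkdtemp(), 'walk_demo.hdf5')\n",
"\n",
"# Build a small hierarchy similar to the one in this tutorial\n",
"with h5py.File(path, 'w') as f:\n",
"    sub = f.create_group('grp2/subgroup1')  # intermediate groups are created too\n",
"    sub['data1'] = np.arange(10)\n",
"\n",
"found = []\n",
"\n",
"def record(name, obj):\n",
"    # Called once per object, with its path relative to the visited group\n",
"    kind = 'dataset' if isinstance(obj, h5py.Dataset) else 'group'\n",
"    found.append((name, kind))\n",
"\n",
"with h5py.File(path, 'r') as f:\n",
"    f.visititems(record)\n",
"    total = int(f['grp2/subgroup1/data1'][...].sum())\n",
"\n",
"print(found)\n",
"print(total)"
],
"language": "python",
"metadata": {},
"outputs": []
},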
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Reference\n",
"\n",
"- [**h5py.org**](http://docs.h5py.org/en/2.3/index.html)\n",
"\n",
"- [CSV package in Python](https://docs.python.org/2/library/csv.html)"
]
}
],
"metadata": {}
}
]
}