@alexland
Last active August 29, 2015 14:02
Shows how to persist a NumPy array using HDF5 and how to use that on-disk (HDF5) array for out-of-core matrix computation. This is the JSON (with the code embedded) for an IPython notebook; to view it, go to http://nbviewer.ipython.org and type the id for this gist into the textbox.
{
"metadata": {
"name": "snippets, notes"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "to publish an IPython notebook:\n\n* in the notebook, click _File_ in the menu bar, then _Download as_, then _ipynb_\n\n* upload this .ipynb file as a GitHub gist:\n\n    * install the _gist_ ruby gem, which provides the _gist_ command\n    * at a bash prompt, type: $> gist name_of_ipython_notebook.ipynb\n    * a URL is returned, e.g., https://gist.github.com/7998475\n\n* record the _gist_id_ generated (the last path segment of that URL)\n\n* the notebook is now published at the URL: http://nbviewer.ipython.org/gist_id/"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "* to view the published notebook, go to: http://nbviewer.ipython.org\n* in the textbox, type in the gist_id and press _enter_"
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": "to persist NumPy arrays on disk using HDF5 (& python bindings, h5py):"
},
{
"cell_type": "code",
"collapsed": false,
"input": "import numpy as NP\nimport h5py as HDF5",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": "create an h5py file handle in write mode; datasets are added to this file in the cells below"
},
{
"cell_type": "code",
"collapsed": false,
"input": "fh = HDF5.File(\"test.hdf5\", \"w\")",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 56
},
{
"cell_type": "markdown",
"metadata": {},
"source": "mock some data, i.e., create a NumPy array to store in the HDF5 file"
},
{
"cell_type": "code",
"collapsed": false,
"input": "M_in = NP.random.randint(0, 10, 100000*100).reshape(100000, 100)\nM_in = M_in.astype(NP.int32)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 57
},
{
"cell_type": "markdown",
"metadata": {},
"source": "now store this array in the file as a dataset named _test_"
},
{
"cell_type": "code",
"collapsed": false,
"input": "fh[\"test\"] = M_in",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 58
},
{
"cell_type": "code",
"collapsed": false,
"input": "ds = fh[\"test\"]",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 59
},
{
"cell_type": "code",
"collapsed": false,
"input": "ds.name",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 60,
"text": "'/test'"
}
],
"prompt_number": 60
},
{
"cell_type": "code",
"collapsed": false,
"input": "ds",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 61,
"text": "<HDF5 dataset \"test\": shape (100000, 100), type \"<i4\">"
}
],
"prompt_number": 61
},
{
"cell_type": "code",
"collapsed": false,
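"input": "# 'test1' is not created above, so create it from the first 1000 rows of M_in\ndset = fh.create_dataset('test1', data=M_in[:1000])",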
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 36
},
{
"cell_type": "code",
"collapsed": false,
"input": "dset.name",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 37,
"text": "'/test1'"
}
],
"prompt_number": 37
},
{
"cell_type": "code",
"collapsed": false,
"input": "dset",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 39,
"text": "<HDF5 dataset \"test1\": shape (1000, 100), type \"<i4\">"
}
],
"prompt_number": 39
},
{
"cell_type": "code",
"collapsed": false,
"input": "fh.close()",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 40
},
{
"cell_type": "markdown",
"metadata": {},
"source": "now open the original file and read the data out"
},
{
"cell_type": "code",
"collapsed": false,
"input": "with HDF5.File(\"test.hdf5\", \"r\") as fh:\n    # .value was removed in h5py 3.0; [()] reads the whole dataset into memory\n    M_out = fh[\"test\"][()]",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 62
},
{
"cell_type": "code",
"collapsed": false,
"input": "M_out.shape",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 42,
"text": "(100000, 100)"
}
],
"prompt_number": 42
},
{
"cell_type": "code",
"collapsed": false,
"input": "M_in.shape",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 43,
"text": "(100000, 100)"
}
],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": "M_in[:5,:5]",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 63,
"text": "array([[7, 8, 9, 4, 8],\n [6, 0, 1, 0, 7],\n [7, 0, 4, 5, 6],\n [0, 4, 6, 9, 2],\n [8, 2, 0, 6, 5]], dtype=int32)"
}
],
"prompt_number": 63
},
{
"cell_type": "code",
"collapsed": false,
"input": "M_out[:5,:5]",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 64,
"text": "array([[7, 8, 9, 4, 8],\n [6, 0, 1, 0, 7],\n [7, 0, 4, 5, 6],\n [0, 4, 6, 9, 2],\n [8, 2, 0, 6, 5]], dtype=int32)"
}
],
"prompt_number": 64
},
{
"cell_type": "code",
"collapsed": false,
"input": "assert (M_in == M_out).all()",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 66
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
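
The notebook stops after reading the dataset back into memory, so the out-of-core part mentioned in the description is not shown. A minimal sketch of one way it could look follows, assuming the goal is a reduction (here, per-column sums) computed over blocks of rows. The function name `column_sums_out_of_core`, the `block_rows` parameter, the smaller array size, and the temporary file path are illustrative assumptions, not part of the original notebook.

```python
import os
import tempfile

import h5py
import numpy as np


def column_sums_out_of_core(path, dataset_name, block_rows=10000):
    """Sum each column of an on-disk dataset, reading block_rows rows at a time."""
    with h5py.File(path, "r") as fh:
        ds = fh[dataset_name]
        n_rows, n_cols = ds.shape
        totals = np.zeros(n_cols, dtype=np.int64)
        for start in range(0, n_rows, block_rows):
            stop = min(start + block_rows, n_rows)
            # Slicing an h5py dataset reads only the requested rows from disk.
            block = ds[start:stop]
            totals += block.sum(axis=0, dtype=np.int64)
    return totals


# Mock data like the notebook's array (fewer rows, to keep the demo quick).
rng = np.random.default_rng(0)
M_in = rng.integers(0, 10, size=(20000, 100), dtype=np.int32)

# Hypothetical location: a temporary directory rather than the working directory.
path = os.path.join(tempfile.mkdtemp(), "test.hdf5")
with h5py.File(path, "w") as fh:
    fh.create_dataset("test", data=M_in)

sums = column_sums_out_of_core(path, "test", block_rows=3000)
```

Only one block of rows is held in memory at a time, so the same loop should work for a dataset too large to load whole; `block_rows` trades memory use against the number of reads.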