Last active August 29, 2015 14:02
Gist alexland/d011eeeb9724ff072a02: shows how to persist a NumPy array using HDF5 and how to use that on-disk (HDF5) array for out-of-core matrix computation. This is the JSON (with the code embedded) for an IPython notebook; to view it, go to http://nbviewer.ipython.org and, in the textbox, type in the id for this gist (d011eeeb9724ff072a02).
{
 "metadata": {
  "name": "snippets, notes"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "to publish an ipython notebook:\n\n* from an ipython notebook, click _File_ from the menu bar then _Download as_ _ipynb_\n\n* upload this .ipynb file as a github gist:\n\n    * install the _gist_ ruby gem\n    * at bash prompt type: $> gist name_of_ipython_notebook.ipynb\n    * a URL will be returned, eg, https://gist.github.com/7998475\n\n* record the _gist_id_ generated (the last component of that URL)\n\n* the notebook is now published at URL: http://nbviewer.ipython.org/gist_id/"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "* to view the published notebook, go to: http://nbviewer.ipython.org\n* in the textbox, type in the gist_id and press _enter_"
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": "to persist NumPy arrays on disk using HDF5 (& python bindings, h5py):"
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "import numpy as NP\nimport h5py as HDF5",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "create an h5py file handle; opening in \"w\" mode creates the file (and truncates it if it already exists). datasets are added below, either by assigning an array to a key or by calling the handle's _create_dataset_ method"
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "fh = HDF5.File(\"test.hdf5\", \"w\")",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 56
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "mock some data, ie, create a NumPy array to store in the file"
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "M_in = NP.random.randint(0, 10, 100000*100).reshape(100000, 100)\nM_in = M_in.astype(NP.int32)",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 57
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "now store this array in the file: assigning it to a key creates an h5py dataset with the same shape and dtype"
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "fh[\"test\"] = M_in",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 58
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "ds = fh[\"test\"]",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 59
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "ds.name",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 60,
       "text": "'/test'"
      }
     ],
     "prompt_number": 60
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "ds",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 61,
       "text": "<HDF5 dataset \"test\": shape (100000, 100), type \"<i4\">"
      }
     ],
     "prompt_number": 61
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "# create_dataset allocates an empty (zero-filled) dataset of the given shape & dtype\ndset = fh.create_dataset('test1', (1000, 100), dtype='i4')",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 36
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "dset.name",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 37,
       "text": "'/test1'"
      }
     ],
     "prompt_number": 37
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "dset",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 39,
       "text": "<HDF5 dataset \"test1\": shape (1000, 100), type \"<i4\">"
      }
     ],
     "prompt_number": 39
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "fh.close()",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 40
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "now open the original file and read the data out"
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "with HDF5.File(\"test.hdf5\", \"r\") as fh:\n    # [()] reads the whole dataset into memory (Dataset.value was removed in h5py 3.0)\n    M_out = fh[\"test\"][()]",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 62
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "M_out.shape",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 42,
       "text": "(100000, 100)"
      }
     ],
     "prompt_number": 42
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "M_in.shape",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 43,
       "text": "(100000, 100)"
      }
     ],
     "prompt_number": 43
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "M_in[:5,:5]",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 63,
       "text": "array([[7, 8, 9, 4, 8],\n [6, 0, 1, 0, 7],\n [7, 0, 4, 5, 6],\n [0, 4, 6, 9, 2],\n [8, 2, 0, 6, 5]], dtype=int32)"
      }
     ],
     "prompt_number": 63
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "M_out[:5,:5]",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 64,
       "text": "array([[7, 8, 9, 4, 8],\n [6, 0, 1, 0, 7],\n [7, 0, 4, 5, 6],\n [0, 4, 6, 9, 2],\n [8, 2, 0, 6, 5]], dtype=int32)"
      }
     ],
     "prompt_number": 64
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "assert (M_in == M_out).all()",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 66
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "",
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}
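
the notebook above stops at the round trip (write the array, read it all back), so here is a minimal sketch of the out-of-core step mentioned in the description. It is one possible approach, not necessarily the author's: a matrix-vector product computed block by block, so that only `block` rows of the on-disk dataset are in memory at a time. The file name `ooc_demo.hdf5`, the block size, the vector `v`, and the smaller matrix shape are all assumptions chosen for illustration; the `NP` / `HDF5` aliases and the dataset name "test" follow the notebook.

```python
import os
import tempfile

import numpy as NP
import h5py as HDF5

# hypothetical file location, sizes, and block size (illustrative assumptions)
path = os.path.join(tempfile.mkdtemp(), "ooc_demo.hdf5")
n_rows, n_cols, block = 10000, 100, 1000

# mock a matrix and a vector, seeded so the run is repeatable
rng = NP.random.RandomState(0)
M_in = rng.randint(0, 10, size=(n_rows, n_cols)).astype(NP.int32)
v = rng.randint(0, 10, size=n_cols).astype(NP.int64)

# persist the matrix as an HDF5 dataset, as in the notebook
with HDF5.File(path, "w") as fh:
    fh.create_dataset("test", data=M_in)

# out-of-core product: slicing an h5py dataset reads only the requested
# rows from disk and returns them as an ordinary NumPy array
result = NP.empty(n_rows, dtype=NP.int64)
with HDF5.File(path, "r") as fh:
    ds = fh["test"]
    for start in range(0, ds.shape[0], block):
        stop = min(start + block, ds.shape[0])
        rows = ds[start:stop, :]            # at most `block` rows in memory
        result[start:stop] = rows.dot(v)

# check against the in-memory product (only feasible here because the
# mock matrix is small enough to hold in RAM)
assert NP.array_equal(result, M_in.dot(v))
```

the block size trades memory for the number of disk reads; for a dataset created with chunked storage, a block size that lines up with the chunk shape will usually read more efficiently.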