@esc
Created June 23, 2014 14:56
Blaze Quickstart
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Blaze Example"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This quickstart shows some simple ways to get started creating\n",
"and manipulating Blaze arrays. To run these examples, import Blaze as\n",
"follows.\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import blaze\n",
"from blaze import array\n",
"from datashape import dshape"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Blaze Arrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To create simple Blaze arrays, construct them from Python scalars or\n",
"nested lists. Blaze deduces the dimensionality and data type to use.\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"array(3.14)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"array(3.14,\n",
" dshape='float64')"
]
}
],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"array([[1, 2], [3, 4]])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"array([[1, 2],\n",
" [3, 4]],\n",
" dshape='2 * 2 * int32')"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can override the data type by providing the dshape parameter.\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"array([[1, 2], [3, 4]], dshape='float64')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"array([[ 1., 2.],\n",
" [ 3., 4.]],\n",
" dshape='2 * 2 * float64')"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Blaze has a slightly more general data model than NumPy; for example, it\n",
"supports variable-sized arrays.\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"array([[1], [2, 3, 4], [5, 6]])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"array([[ 1],\n",
" [ 2, 3, 4],\n",
" [ 5, 6]],\n",
" dshape='3 * var * int32')"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Its support for strings includes variable-sized strings as well.\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"array([['test', 'one', 'two', 'three'], ['a', 'braca', 'dabra']])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"array([[u'test', u'one', u'two', u'three'],\n",
" [u'a', u'braca', u'dabra']],\n",
" dshape='2 * var * string')"
]
}
],
"prompt_number": 6
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Simple Calculations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Blaze supports ufuncs and arithmetic similarly to NumPy.\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a = array([1, 2, 3])\n",
"blaze.sin(a) + 1"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"array([ 1.84147098, 1.90929743, 1.14112001],\n",
" dshape='3 * float64')"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"blaze.sum(3 * a)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"array(18,\n",
" dshape='int32')"
]
}
],
"prompt_number": 8
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Iterators"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike NumPy's array constructor, Blaze can build arrays directly from\n",
"iterators, automatically deducing the dimensions and type just as it\n",
"does for lists.\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from blaze import array\n",
"alst = [1, 2, 3]\n",
"array(iter(alst))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"array([1, 2, 3],\n",
" dshape='3 * int32')"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"array([j-i for j in range(1,4)] for i in range(1,4))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"array([[ 0, 1, 2],\n",
" [-1, 0, 1],\n",
" [-2, -1, 0]],\n",
" dshape='3 * 3 * int32')"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from random import randrange\n",
"array((randrange(10) for i in range(randrange(5))) for j in range(4))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"array([[ 5],\n",
" [ 1, 4, 5, 1],\n",
" [ 9, 9],\n",
" [ 0, 1, 4, 8]],\n",
" dshape='4 * var * int32')"
]
}
],
"prompt_number": 11
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Disk-Backed Arrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Blaze can currently use the BLZ and HDF5 formats for storing compressed,\n",
"chunked arrays on disk. These are accessed through data descriptors:\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dd = blaze.BLZ_DDesc('foo.blz', mode='w')\n",
"a = blaze.array([[1,2],[3,4]], '2 * 2 * int32', ddesc=dd)\n",
"a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"array([[1, 2],\n",
" [3, 4]],\n",
" dshape='2 * 2 * int32')"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dataset is now stored persistently on disk. Later, in another\n",
"Python session, we can access it again:\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dd = blaze.BLZ_DDesc('foo.blz', mode='r')\n",
"b = blaze.array(dd)\n",
"b"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"array([[1, 2],\n",
" [3, 4]],\n",
" dshape='2 * 2 * int32')"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have completely recovered the contents of the original array.\n",
"Finally, we can delete the array from disk:\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dd.remove()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This removes the dataset from disk, and it cannot be restored\n",
"afterwards, so if you love your data, be careful with this one.\n",
"\n"
]
}
],
"metadata": {}
}
]
}