Created January 5, 2019 14:58
Gist: empet/8e466955c9e30f7471b4fb45c3a0fb21
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hexbin plot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hexagonal binning is a method for visualizing bivariate distributions. It is recommended\n",
"for identifying patterns in large 2D data sets, where an ordinary scatter plot becomes overplotted.\n",
"\n",
"The underlying idea is as follows: a rectangular region containing the data set is tessellated with regular hexagons.\n",
"The number (or proportion) of points falling in each cell is counted and mapped to a colormap.\n",
"The resulting chart is called a hexbin plot.\n"
]
},
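{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the binning step itself, here is a minimal NumPy sketch. It is an illustration, not matplotlib's actual implementation, although it follows the same idea: the hexagon centers form two rectangular lattices shifted by half a cell, and each point is assigned to the nearest center. The helper `hexbin_counts` and the random test data are hypothetical and serve only this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def hexbin_counts(x, y, gridsize=25, mincnt=1):\n",
"    # Hypothetical helper (not part of matplotlib): count points per hexagonal cell\n",
"    x = np.asarray(x, dtype=float)\n",
"    y = np.asarray(y, dtype=float)\n",
"    nx = gridsize\n",
"    ny = max(int(nx / np.sqrt(3)), 1)  # number of rows, chosen so that the cells are roughly regular\n",
"    sx = (x.max() - x.min()) / nx or 1.0  # column spacing (guard against a zero range)\n",
"    sy = (y.max() - y.min()) / ny or 1.0  # row spacing\n",
"    ix = (x - x.min()) / sx  # coordinates in lattice index units\n",
"    iy = (y - y.min()) / sy\n",
"    # lattice 1: centers at integer nodes; lattice 2: shifted by half a cell in both directions\n",
"    ix1, iy1 = np.round(ix), np.round(iy)\n",
"    ix2, iy2 = np.floor(ix), np.floor(iy)\n",
"    # squared distances to the nearest center of each lattice; the factor 3 weights vertical\n",
"    # index steps by sqrt(3), which makes the nearest-center regions regular hexagons\n",
"    d1 = (ix - ix1) ** 2 + 3.0 * (iy - iy1) ** 2\n",
"    d2 = (ix - ix2 - 0.5) ** 2 + 3.0 * (iy - iy2 - 0.5) ** 2\n",
"    first = d1 < d2\n",
"    cx = np.where(first, ix1, ix2 + 0.5) * sx + x.min()\n",
"    cy = np.where(first, iy1, iy2 + 0.5) * sy + y.min()\n",
"    # group identical centers and count the points assigned to each of them\n",
"    centers, counts = np.unique(np.column_stack([cx, cy]), axis=0, return_counts=True)\n",
"    keep = counts >= mincnt\n",
"    return centers[keep], counts[keep]\n",
"\n",
"rng = np.random.RandomState(0)\n",
"xs, ys = rng.normal(size=(2, 5000))\n",
"demo_centers, demo_counts = hexbin_counts(xs, ys, gridsize=15)\n",
"print(len(demo_counts), demo_counts.sum())  # each point falls in exactly one cell, so the sum is 5000"
]
},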
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Matplotlib provides the function [`pyplot.hexbin`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hexbin),\n",
"which returns an instance of `PolyCollection`. We call a few methods of this instance to get the data in a form suitable for a Plotly plot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import numpy as np\n",
"import matplotlib.cm as cm\n",
"import cmocean  # http://matplotlib.org/cmocean/\n",
"\n",
"import plotly.graph_objs as go"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our hexagonal tessellation consists of Plotly [shapes](https://plot.ly/python/reference/#layout-shapes) bounded by regular hexagons. The color of each cell is the matplotlib facecolor of the corresponding hexagon in the `PolyCollection`, converted to a Plotly color by a function defined below (`pl_cell_color`).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read data from a file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"points = np.load('hexbin-data.npy')  # https://github.com/empet/Datasets/blob/master/hexbin-data.npy\n",
"x, y = points.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the matplotlib hexbin function on our data set. Since we only need to create an instance of the `PolyCollection` class, not to display its plot, we set a very small figure size.\n",
"\n",
"For all attributes of this instance to be initialized, it is important to have `%matplotlib inline`, because some attributes (for example the facecolors) are set only when the figure is drawn."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(0.05, 0.05))\n",
"plt.axis('off')\n",
"HB = plt.hexbin(x, y, gridsize=25, cmap=cmocean.cm.algae, mincnt=1)  # cmocean.cm.algae is a cmocean colormap"
]
},
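{
"cell_type": "markdown",
"metadata": {},
"source": [
"Outside a notebook with the inline backend, a possible alternative is to attach an off-screen Agg canvas to a `Figure` and call `fig.canvas.draw()`. This is a sketch that assumes only that matplotlib's Agg backend is available; drawing is what fills in the facecolors of the collection, and before the draw `get_facecolors()` may still hold only a default color. Random data and the `viridis` colormap stand in for `hexbin-data.npy` and `cmocean.cm.algae`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from matplotlib.figure import Figure\n",
"from matplotlib.backends.backend_agg import FigureCanvasAgg\n",
"\n",
"rng = np.random.RandomState(1)\n",
"xs, ys = rng.normal(size=(2, 2000))\n",
"\n",
"demo_fig = Figure(figsize=(2, 2))\n",
"FigureCanvasAgg(demo_fig)  # attach an off-screen canvas; nothing is displayed\n",
"demo_ax = demo_fig.add_subplot(111)\n",
"demo_hb = demo_ax.hexbin(xs, ys, gridsize=25, cmap='viridis', mincnt=1)\n",
"demo_fig.canvas.draw()  # drawing maps the counts to RGBA facecolors\n",
"# after drawing, there is one offset, one facecolor and one count per drawn hexagon\n",
"print(len(demo_hb.get_offsets()), len(demo_hb.get_facecolors()), len(demo_hb.get_array()))"
]
},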
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`gridsize` is the number of hexagons in the x direction. By default it is 100.\n",
"\n",
"`mincnt` is the minimum number of points a hexagon must contain in order to be drawn: only cells containing at least `mincnt` data points are plotted. By default (`mincnt=None`) every cell of the grid is drawn, including the empty ones. Hence, to avoid plotting hexagons with no points, we set `mincnt=1`."
]
},
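{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of `mincnt` can be checked on random data. This is a sketch in which `viridis` stands in for the cmocean colormap. With the default `mincnt`, the returned counts include empty cells; with `mincnt=1` only the nonempty cells remain, and their counts still add up to the number of points."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"rng = np.random.RandomState(2)\n",
"xs, ys = rng.normal(size=(2, 2000))\n",
"\n",
"demo_fig, demo_ax = plt.subplots()\n",
"hb_all = demo_ax.hexbin(xs, ys, gridsize=25, cmap='viridis')  # default mincnt: every cell is kept\n",
"hb_nonempty = demo_ax.hexbin(xs, ys, gridsize=25, cmap='viridis', mincnt=1)\n",
"plt.close(demo_fig)  # only the collections are needed, not the picture\n",
"\n",
"all_counts = np.asarray(hb_all.get_array())\n",
"nonempty_counts = np.asarray(hb_nonempty.get_array())\n",
"print('all cells:', len(all_counts), 'empty:', int((all_counts == 0).sum()))\n",
"print('mincnt=1:', len(nonempty_counts), 'min count:', nonempty_counts.min(), 'total:', nonempty_counts.sum())"
]
},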
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We define below the function `get_hexbin_attributes`, which returns the following attributes of a hexbin-type `PolyCollection` object:\n",
"\n",
"- the list of the 7 vertices $V_0, V_1, V_2, V_3, V_4, V_5, V_0$ (each a point in the plane) of a prototypical hexagon of the tessellation. This hexagon is\n",
"symmetric with respect to the origin, $O(0,0)$, has two vertices on $Oy$, and is scaled such that `gridsize` hexagons fill a row of the tessellation. Translating it to the appropriate positions in the rectangular region of the data yields a hexagonal lattice;\n",
"- the `offsets` of these translations, as a `numpy.array` of shape `(no_hexagons, 2)`;\n",
"- the matplotlib RGBA colors (facecolors) of the hexagons;\n",
"- the array of hexagonal bin counts.\n",
"\n",
"The offsets, facecolors and counts have the same length, equal to the number of hexagons containing at least `mincnt` points.\n",
"\n"
]
},
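{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the role of the offsets concrete, the sketch below uses a hypothetical prototypical hexagon of width 1, with two vertices on the y axis and closed by repeating $V_0$. It translates this hexagon by a few made-up offsets with NumPy broadcasting; each translated copy is one cell of the lattice. The numbers are illustrative and are not taken from `HB`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"r = 1 / np.sqrt(3)  # circumradius of a regular hexagon of width 1\n",
"proto = np.array([[0.5, -r / 2], [0.5, r / 2], [0.0, r],\n",
"                  [-0.5, r / 2], [-0.5, -r / 2], [0.0, -r]])\n",
"proto = np.vstack([proto, proto[:1]])  # V0, ..., V5, V0: shape (7, 2)\n",
"\n",
"# made-up offsets: two neighbours in one row and one cell in the next (shifted) row\n",
"demo_offsets = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.5 * r]])\n",
"\n",
"# broadcasting (n, 1, 2) + (1, 7, 2) -> (n, 7, 2): one closed hexagon per offset\n",
"demo_cells = demo_offsets[:, None, :] + proto[None, :, :]\n",
"print(demo_cells.shape)  # (3, 7, 2)\n",
"print(np.allclose(demo_cells[:, :-1].mean(axis=1), demo_offsets))  # True: each center is its offset"
]
},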
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def get_hexbin_attributes(hexbin):\n",
"    paths = hexbin.get_paths()  # a hexbin collection holds a single path: the prototypical hexagon\n",
"    points_codes = list(paths[0].iter_segments())  # paths[0].iter_segments() is a generator of (vertex, code) pairs\n",
"    prototypical_hexagon = [item[0] for item in points_codes]\n",
"    return prototypical_hexagon, hexbin.get_offsets(), hexbin.get_facecolors(), hexbin.get_array()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following function converts matplotlib facecolors to Plotly color codes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def pl_cell_color(mpl_facecolors):\n",
"    # each facecolor is an RGBA tuple with channels in [0, 1]; keep RGB, scale to 0-255, drop alpha\n",
"    return [f'rgb({int(R*255)}, {int(G*255)}, {int(B*255)})' for (R, G, B, A) in mpl_facecolors]"
]
},
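{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, applied to two made-up RGBA facecolors, `pl_cell_color` keeps the RGB channels and drops the alpha channel. Note that `int()` truncates, so a channel value of 0.5 becomes 127 rather than 128; rounding would be an equally reasonable choice."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"demo_rgba = np.array([[1.0, 0.0, 0.0, 1.0],\n",
"                      [0.5, 0.25, 0.0, 0.8]])\n",
"print(pl_cell_color(demo_rgba))  # ['rgb(255, 0, 0)', 'rgb(127, 63, 0)']"
]
},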
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define a function that, given the prototypical hexagon and an offset, returns a closed hexagonal path filled\n",
"with the corresponding Plotly facecolor, together with the hexagon center:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def make_hexagon(prototypical_hex, offset, fillcolor, linecolor=None):\n",
"    # translate the prototypical hexagon by the offset\n",
"    new_hex_vertices = [vertex + offset for vertex in prototypical_hex]\n",
"    # hexagon center: mean of the 6 distinct vertices (the 7th repeats the first)\n",
"    vertices = np.asarray(new_hex_vertices[:-1])\n",
"    center = np.mean(vertices, axis=0)\n",
"    if linecolor is None:\n",
"        linecolor = fillcolor\n",
"    # SVG-type path 'M x0, y0 Lx1, y1 L...'; it closes because the last vertex equals the first\n",
"    path = 'M ' + ' L'.join(f'{vert[0]}, {vert[1]}' for vert in new_hex_vertices)\n",
"    return dict(type='path',\n",
"                line=dict(color=linecolor, width=0.5),\n",
"                path=path,\n",
"                fillcolor=fillcolor,\n",
"                ), center"
]
},
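{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick check of `make_hexagon` on a hypothetical hexagon of width 1 (not the one computed from `HB`): the SVG path starts with `M` and visits all seven vertices, so it ends where it started. The returned center equals the offset because the prototypical hexagon is symmetric about the origin."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"r = 1 / np.sqrt(3)\n",
"demo_proto = [np.array(v) for v in [(0.5, -r / 2), (0.5, r / 2), (0.0, r), (-0.5, r / 2),\n",
"                                    (-0.5, -r / 2), (0.0, -r), (0.5, -r / 2)]]\n",
"demo_shape, demo_center = make_hexagon(demo_proto, np.array([2.0, 3.0]), 'rgb(255, 0, 0)')\n",
"print(demo_shape['path'])\n",
"print(np.allclose(demo_center, [2.0, 3.0]))  # True\n",
"print(demo_shape['line']['color'])  # rgb(255, 0, 0): the line color defaults to the fill color"
]
},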
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we extract from the hexbin `HB` the data needed to build a Plotly 2D hexagonal histogram:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hexagon_vertices, offsets, mpl_facecolors, counts = get_hexbin_attributes(HB)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The prototypical hexagon has the vertices:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hexagon_vertices[:-1]  # the last vertex coincides with the first one"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cell_color = pl_cell_color(mpl_facecolors)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"shapes = []\n",
"centers = []\n",
"for k in range(len(offsets)):\n",
"    shape, center = make_hexagon(hexagon_vertices, offsets[k], cell_color[k])\n",
"    shapes.append(shape)\n",
"    centers.append(center)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to associate a colorbar with the hexbin plot, we define a dummy `Scatter` trace representing the hexagon centers.\n",
"Its marker `color` attribute is the list of counts, and its colorscale is the Plotly colorscale corresponding to the matplotlib\n",
"colormap passed to `plt.hexbin()` above."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A matplotlib colormap is converted into a Plotly colorscale with N entries by the following function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def mpl_to_plotly(cmap, N):\n",
"    h = 1.0 / (N - 1)  # step between consecutive colorscale levels in [0, 1]\n",
"    pl_colorscale = []\n",
"    for k in range(N):\n",
"        # sample the colormap at level k*h, keep RGB and scale to 0-255\n",
"        C = list(map(np.uint8, np.array(cmap(k*h)[:3])*255))\n",
"        pl_colorscale.append([round(k*h, 2), f'rgb({C[0]}, {C[1]}, {C[2]})'])\n",
"    return pl_colorscale"
]
},
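{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check, consider a hypothetical black-to-white colormap built with matplotlib's `LinearSegmentedColormap.from_list`: the converted Plotly colorscale should run from pure black at level 0 to pure white at level 1, with N equally spaced levels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from matplotlib.colors import LinearSegmentedColormap\n",
"\n",
"gray_cmap = LinearSegmentedColormap.from_list('black_white', ['black', 'white'])\n",
"demo_scale = mpl_to_plotly(gray_cmap, 5)\n",
"print(demo_scale[0], demo_scale[-1])  # [0.0, 'rgb(0, 0, 0)'] [1.0, 'rgb(255, 255, 255)']\n",
"print([level for level, _ in demo_scale])  # [0.0, 0.25, 0.5, 0.75, 1.0]"
]
},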
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pl_algae = mpl_to_plotly(cmocean.cm.algae, 11)\n",
"pl_algae"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get data for the Plotly Scatter trace of hexagon centers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X, Y = zip(*centers)\n",
"\n",
"# define the text displayed when hovering the mouse over the cells\n",
"text = [f'x: {round(X[k], 2)}<br>y: {round(Y[k], 2)}<br>counts: {int(counts[k])}' for k in range(len(X))]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"trace = go.Scatter(\n",
"    x=list(X),\n",
"    y=list(Y),\n",
"    mode='markers',\n",
"    marker=dict(size=0.5,\n",
"                color=counts,\n",
"                colorscale=pl_algae,\n",
"                showscale=True,\n",
"                colorbar=dict(\n",
"                    thickness=20,\n",
"                    ticklen=4\n",
"                )),\n",
"    text=text,\n",
"    hoverinfo='text'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"axis = dict(showgrid=False,\n",
"            showline=False,\n",
"            zeroline=False,\n",
"            ticklen=4\n",
"            )\n",
"\n",
"layout = go.Layout(title='Hexbin plot',\n",
"                   width=530, height=550,\n",
"                   xaxis=axis,\n",
"                   yaxis=axis,\n",
"                   hovermode='closest',\n",
"                   shapes=shapes,\n",
"                   plot_bgcolor='black')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig = go.FigureWidget(data=[trace], layout=layout)\n",
"fig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import HTML\n",
"def css_styling():\n",
"    with open('./custom.css', 'r') as f:\n",
"        styles = f.read()\n",
"    return HTML(styles)\n",
"css_styling()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}