@hhuuggoo
Last active August 29, 2015 14:07
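import datetime as dt

stocks1 = ['AAPL', 'BRCM', 'INTC', 'GOOG']
# assumption: the second select box offers the same tickers as the first
stocks2 = list(stocks1)
time_range = [dt.datetime(2011, 1, 1), dt.datetime(2014, 1, 1)]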
## interact automatically generates widgets for the inputs. We can also allow the user to
## pass in actual widgets later. In a bokeh-server application, we would actually have
## to clone the widgets each time a new user browsed to the page
@interact(stocks1=stocks1, stocks2=stocks2, time_range=time_range)
def create(stocks1, stocks2, time_range, selection=None):
    # `data` is the full blaze table; keep the filtered view under a separate name
    filtered = filter(data, stocks1, stocks2, time_range, selection)
    source = BlazeDataSource(filtered)
    plot = circles('price1', 'price2', source=source)
    stats = Text(text=str(filtered.describe()))
    return HBox(children=[plot, stats])
# The update function is only needed for a bokeh-server application. A javascript-only
# application can't do this, because we use pandas to compute describe(). The update of
# the bokeh plot itself is all done inside javascript
def update(obj, stocks1, stocks2, time_range, selection=None):
    # `data` is the full blaze table; keep the filtered view under a separate name
    filtered = filter(data, stocks1, stocks2, time_range, selection)
    obj.children[-1].text = str(filtered.describe())
## the filter actually sets up a blaze expression graph
def filter(t, stocks1, stocks2, time_range, selection):
    selector = t.stocks1 == stocks1
    selector = selector & (t.stocks2 == stocks2)
    selector = selector & (t.time <= time_range[1])
    selector = selector & (t.time >= time_range[0])
    if selection:
        field = selection['field']
        xmin = selection['data_geometry']['x0']
        xmax = selection['data_geometry']['x1']
        selector = selector & (t[field] >= xmin)
        selector = selector & (t[field] <= xmax)
    return t[selector]

Widgets

The simple mental model I would like to promote for writing bokeh widgets is evaluating functions inside of namespaces:

  • Widgets have names and values. Together, all the widgets in an application define the namespace in which the function is evaluated.
  • Selections on plots can be thought of as widgets. If you name a plot, its selection is passed into the namespace as selection_<plot_name>.
  • Otherwise, we auto-number the plot (plot1, plot2, plot3).
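
A minimal sketch of this model in plain Python, with no bokeh involved. The helper names build_namespace and evaluate are hypothetical, and numbering unnamed plots in order of appearance is an assumption; the notes above only fix the selection_<plot_name> naming.

```python
import inspect


def build_namespace(widgets, selections):
    """Merge widget values and plot selections into one namespace dict.

    widgets: dict mapping widget name -> current value.
    selections: list of (plot_name, selection) pairs; plot_name may be None.
    """
    namespace = dict(widgets)
    auto_index = 0
    for plot_name, selection in selections:
        if plot_name is None:
            # Assumption: unnamed plots are numbered in order of appearance.
            auto_index += 1
            plot_name = "plot%d" % auto_index
        namespace["selection_" + plot_name] = selection
    return namespace


def evaluate(func, namespace):
    """Call func with only the namespace entries its signature asks for."""
    params = inspect.signature(func).parameters
    kwargs = {name: namespace[name] for name in params if name in namespace}
    return func(**kwargs)


widgets = {"stock": "AAPL", "window": 30}
selections = [("scatter", {"x0": 1.0, "x1": 2.0}), (None, None)]
namespace = build_namespace(widgets, selections)


def describe(stock, selection_scatter, selection_plot1=None):
    return stock, selection_scatter, selection_plot1


result = evaluate(describe, namespace)
```

The function only declares the names it cares about, so adding a widget to the application does not force every function to change.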

In designing an application, it is helpful to have both a creation function and an update function. In most applications, the creation function constructs all the plots, and the update function manipulates plot attributes or the data fed into the plots. In the most trivial case (think IPython interact), the update function and the creation function are the same: you simply recreate all the plots each time. This approach has downsides. Besides the extra latency from reconstructing plots, you also lose UI state. For example, if I'm zoomed in to a plot when I adjust a widget, it would be nice if the plot redrew with the new data without being reconstructed, so that I don't lose my zoom position.
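
The difference between the two approaches can be sketched with a stand-in Plot class. Everything here (Plot, its zoom attribute, the threshold widget) is hypothetical and only illustrates why a separate update function preserves UI state.

```python
class Plot:
    """Stand-in for a plot: holds data plus UI state such as the zoom range."""

    def __init__(self, data):
        self.data = data
        self.zoom = None  # None means "default view"


def select_values(threshold):
    """Pretend data query driven by a widget value."""
    return [v for v in range(10) if v >= threshold]


def create(threshold):
    """Build a brand new plot; any previous UI state is gone."""
    return Plot(select_values(threshold))


def update(plot, threshold):
    """Swap the data on an existing plot and leave its UI state alone."""
    plot.data = select_values(threshold)
    return plot


plot = create(threshold=2)
plot.zoom = (3, 5)  # the user zooms in

recreated = create(threshold=4)        # trivial approach: zoom is reset
updated = update(plot, threshold=4)    # in-place approach: zoom survives
```

Both calls end up with the same data; only the in-place update still remembers the zoom range.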

Selections, and non-column data sources

Non-column data sources are sources which don't have all their data in memory. The server data source (which we use for abstract rendering) is one of them. Another that has been proposed is the blaze data source, which takes a blaze URI, a namespace, and an expression; the expression is evaluated and the result is returned to the javascript client. The most obvious additional one is a naive AJAX data source which expects either a dict of arrays or an array of dicts, and sets that as the data.
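
As a rough illustration of the naive AJAX case, the sketch below normalizes both payload shapes into a dict of columns. The function name to_columns and the equal-length check are assumptions, not part of any existing bokeh API.

```python
import json


def to_columns(payload):
    """Normalize a decoded JSON payload into a dict of equal-length columns.

    Accepts either a dict of arrays ({"x": [...], "y": [...]}) or an
    array of dicts ([{"x": 1, "y": 2}, ...]).
    """
    if isinstance(payload, dict):
        columns = {name: list(values) for name, values in payload.items()}
    else:
        columns = {}
        for row in payload:
            for name, value in row.items():
                columns.setdefault(name, []).append(value)
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns have unequal lengths: %r" % sorted(lengths))
    return columns


from_dict = to_columns(json.loads('{"x": [1, 2], "y": [3, 4]}'))
from_rows = to_columns(json.loads('[{"x": 1, "y": 3}, {"x": 2, "y": 4}]'))
```

Both calls produce the same column dict, so the rest of the pipeline never needs to know which shape the server sent.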

The current code base attempts to solve this problem by allowing you to pass a server data source object into plotting.py, in which case a mirroring column data source is created and is what actually gets passed into the plot. This is nice because it saves us from having to alter the current code path. However, it makes writing python APIs cumbersome, because each python API you write has to handle this weird indirection. I would now prefer to have all remote data sources extend ColumnDataSource, start with an empty or initial set of data, and then submit a request and populate the values. This way all our python APIs should work seamlessly with any data source object. The one exception is that APIs which read the data in order to decide what to do (bokeh.charts) might end up with weirdly sized plots, but at least this is the start of enabling everything to use remote data sources.
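
A sketch of that inheritance idea, using a stand-in ColumnDataSource rather than bokeh's real class. RemoteDataSource, its fetch callable, and refresh are hypothetical names chosen for illustration.

```python
class ColumnDataSource:
    """Minimal stand-in (not bokeh's class): a dict of equal-length columns."""

    def __init__(self, data=None):
        self.data = dict(data or {})


class RemoteDataSource(ColumnDataSource):
    """Starts with empty columns, then fills them from a fetch callable."""

    def __init__(self, fetch, columns):
        super().__init__({name: [] for name in columns})
        self._fetch = fetch

    def refresh(self, **params):
        # In a real application this would be a request to the server.
        self.data = self._fetch(**params)


def column_lengths(source):
    """An API written against ColumnDataSource only; no special casing."""
    return {name: len(values) for name, values in source.data.items()}


def fake_fetch(limit):
    xs = list(range(limit))
    return {"x": xs, "y": [v * v for v in xs]}


source = RemoteDataSource(fake_fetch, columns=["x", "y"])
before = column_lengths(source)   # columns exist but are empty
source.refresh(limit=3)
after = column_lengths(source)    # same API call, now with data
```

The empty result before refresh is the sizing caveat mentioned above: code that inspects the data up front sees no rows yet.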

Linked selections become a bit trickier. The naive approach (pass the same data source in everywhere and you magically get linking) is nice, but I don't think it makes sense for these remote data sources. In addition, it is nice to be able to link plots together after they are created (I think the same argument applies to linked ranges for panning). Finally, you would like to be able to link plots that use separate data sources, even in the in-memory column data source case. Consider a scatter plot of some data and a histogram of the same data. The data sources are definitely separate, because one is a transformed version of the other, but it should be possible to convert range selections on the X-axis of the histogram into simple filtering operations on the scatter plot. This is where I think the most uncertainty in this proposal lies. I think we should lift the selection manager up so that it can be shared between N plots and coordinate linked selections among them.
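
One way the shared selection manager could look, sketched in plain Python. SelectionManager, ScatterView, and their methods are hypothetical; the selection dict reuses the 'field' and 'data_geometry' keys from the pseudo-code, and a range picked on a histogram axis is simply applied as a filter on the scatter rows.

```python
class SelectionManager:
    """Shared by N plots; broadcasts each selection to every linked plot."""

    def __init__(self):
        self._listeners = []
        self.selection = None

    def link(self, callback):
        # Plots can be linked at any time, including after creation.
        self._listeners.append(callback)

    def select(self, field, x0, x1):
        self.selection = {"field": field, "data_geometry": {"x0": x0, "x1": x1}}
        for callback in self._listeners:
            callback(self.selection)


class ScatterView:
    """Stand-in for a scatter plot backed by its own rows."""

    def __init__(self, rows):
        self.rows = rows
        self.selected = list(rows)

    def on_selection(self, selection):
        field = selection["field"]
        x0 = selection["data_geometry"]["x0"]
        x1 = selection["data_geometry"]["x1"]
        self.selected = [row for row in self.rows if x0 <= row[field] <= x1]


rows = [
    {"price1": 1.0, "price2": 10.0},
    {"price1": 2.0, "price2": 20.0},
    {"price1": 3.0, "price2": 30.0},
    {"price1": 4.0, "price2": 40.0},
]
scatter = ScatterView(rows)
manager = SelectionManager()
manager.link(scatter.on_selection)   # linked after the plot exists

# A range selection on the histogram's X-axis becomes a filter on the scatter.
manager.select("price1", 2.0, 3.0)
```

Because plots subscribe to the manager instead of sharing a data source, the histogram and the scatter plot can keep separate data.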

Examples

Please treat the example code as pseudo-code, especially because my blaze is not that good. It is a simple stock example: a date range slider, a select box for each of the two stocks, a scatter plot of the two stocks, and a text field with the output of pandas describe() for the selected data.
