@bnaul
Last active July 11, 2016 22:37

cesium: Open-Source Inference Platform for Time-Series Data

Brett Naul


General intro

Outline

Who we are


Scientific motivation

Light curve classification

EEG classification

Earthquake early warning

Sensor readings


Scientific approach

Main idea: project time-series data into (lower-dimensional) feature space

Steps: Featurize, build model, predict, interpret/infer, repeat

Some features are obvious/intuitive

Others are more complicated/come from domain expertise

Example: Lomb-Scargle from astronomy

Support for unevenly-spaced time series

Support for measurement errors (as inputs to feature functions)
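A minimal sketch of the featurization idea, assuming synthetic data and using only numpy and scipy (not cesium's own API): an unevenly sampled series with per-point measurement errors is reduced to a few features, including a period estimate from a Lomb-Scargle periodogram. The function name featurize, the feature names, and the period grid are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)

# Synthetic, unevenly spaced observations of a sinusoid (hypothetical data).
true_period = 7.0
times = np.sort(rng.uniform(0.0, 100.0, size=300))
errors = np.full(times.shape, 0.1)  # per-point measurement errors
values = np.sin(2 * np.pi * times / true_period) + rng.normal(0.0, errors)


def featurize(times, values, errors):
    """Project one time series onto a small feature vector."""
    # Inverse-variance weights: measurement errors enter as feature inputs.
    weights = 1.0 / errors**2
    weighted_mean = np.sum(weights * values) / np.sum(weights)

    # Lomb-Scargle handles uneven sampling; scipy expects angular frequencies.
    periods = np.linspace(2.0, 20.0, 2000)
    angular_freqs = 2 * np.pi / periods
    power = lombscargle(times, values - weighted_mean, angular_freqs)

    return {
        "weighted_mean": weighted_mean,
        "amplitude": 0.5 * (values.max() - values.min()),
        "ls_period": periods[np.argmax(power)],
    }


features = featurize(times, values, errors)
print(features)
```

The amplitude and weighted mean are the "obvious" kind of feature; the Lomb-Scargle period is the kind that comes from domain expertise.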


Technology overview

Two components: Python library, web front end (uses library)

Python library: target users

A domain scientist is comfortable with programming but unfamiliar with time-series analysis or machine learning.

A scientist who is experienced with time series analysis but is looking for new features that can better capture patterns within their data.

A user of the cesium web front end who realizes they require additional functionality and wishes to add additional stages to their workflow.

Web front end: target users

A scientist needs to perform time series analysis, but is unfamiliar with programming and library usage.

A group of scientists want to collaboratively explore different methods for time-series analysis.

A scientist is unfamiliar with time-series analysis, and wants to learn how to apply various methods to their data, using industry best practices.


Reproducibility

Share workflows with collaborators

Export to Jupyter notebook


Python library / API details

Contains many built-in feature extraction functions

Can use custom feature functions as well

Utilities for storing the full set of features for a time-series dataset, training models on it, and generating predictions from the feature set
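A rough sketch of that workflow, written with pandas and scikit-learn directly rather than cesium's own functions: custom feature functions are applied to each series, the results are collected into a feature table, and a classifier is trained and used for prediction. The (times, values, errors) function signature, the feature names, and the synthetic two-class dataset are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


# Hypothetical custom feature functions; each takes (times, values, errors).
def amplitude(t, m, e):
    return 0.5 * (np.max(m) - np.min(m))


def std(t, m, e):
    return np.std(m)


def median_abs_dev(t, m, e):
    return np.median(np.abs(m - np.median(m)))


FEATURES = {"amplitude": amplitude, "std": std, "median_abs_dev": median_abs_dev}


def featurize_dataset(dataset):
    """Build a feature table with one row per time series."""
    rows = [{name: f(t, m, e) for name, f in FEATURES.items()} for t, m, e in dataset]
    return pd.DataFrame(rows)


def make_series(rng, label):
    """Synthetic unevenly sampled series: a sinusoid or pure noise."""
    t = np.sort(rng.uniform(0.0, 10.0, size=50))
    e = np.full(50, 0.1)
    signal = np.sin(2 * np.pi * t) if label == "periodic" else np.zeros(50)
    return t, signal + rng.normal(0.0, e), e


rng = np.random.default_rng(0)
labels = np.array(["periodic", "noise"] * 30)
dataset = [make_series(rng, label) for label in labels]

feature_table = featurize_dataset(dataset)

# Train on the first 40 series, predict on the remaining 20.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(feature_table.iloc[:40], labels[:40])
predictions = model.predict(feature_table.iloc[40:])
accuracy = np.mean(predictions == labels[40:])
print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping the feature table separate from the model means the same features can be reused across several models without re-featurizing the raw series.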


Notebook example?


Web example?


Technologies used

Obvious

numpy, scipy, scikit-learn, pandas

Past

Disco (-> Celery -> ?)

D3 (-> Plotly)

Drone (-> Travis)

AJAX Polling (-> WebSockets)

Present

Celery + RabbitMQ

Dask

Docker

xarray / netCDF4

Flask

PostgreSQL + Peewee
