@GenevieveBuckley
Last active May 11, 2021 04:02
Talk proposal SciPy 2021

TITLE:

Scaling Science: leveraging Dask for life sciences

SHORT ABSTRACT:

Big data in the life sciences is difficult to manage, and modern biology and neuroscience increasingly depend on scalable scientific computing. Dask is a Python library for parallel and distributed computation. In this talk, we'll look at several case studies where Dask is used to scale up data processing for the life sciences, with examples from statistical genetics, single-cell analysis, and image visualization and analysis. You will leave with a better understanding of how to extend your own code with Dask to scale your analysis.

DESCRIPTION:

Advances in modern biology research bring an increasing demand for computational resources. We need ways to scale scientific computing to meet the demands of big data in biology and neuroscience. This talk gives an overview of how Dask can be used for more effective computing, and how it integrates with other tools in the scientific Python ecosystem.

We will walk through several case studies drawn from a diverse range of biology and neuroscience applications. These include examples from:

  • statistical genetics
  • single-cell analysis
  • image analysis

Dask is an open source project for parallel and distributed computing in Python. In addition to the main Dask library, there are a number of specialized Dask repositories of interest to biologists and neuroscientists, including but not limited to dask.distributed, dask-ml, and dask-image. The Dask organization can be found on GitHub at https://github.com/dask and documentation is available at https://dask.org/
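
As a rough illustration of the programming model (a minimal sketch, not material from the talk itself; the array size and chunk shape are arbitrary assumptions), a Dask array splits a large array into chunks and only runs the computation when asked:

```python
import dask.array as da

# Build a lazy 2000 x 2000 array of ones, split into 500 x 500 chunks.
# Nothing is computed yet; Dask only records a task graph.
x = da.ones((2000, 2000), chunks=(500, 500))

# Each chunk can be processed independently, so this reduction can run
# in parallel (local threads by default, or a distributed cluster).
column_means = (x * 2).mean(axis=0)

# .compute() executes the task graph and returns a NumPy array.
result = column_means.compute()

print(x.numblocks)   # number of chunks along each axis
print(result.shape)  # shape of the computed NumPy result
```

The same code works whether the chunks fit in memory or not, which is the property the case studies in this talk rely on.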

In addition, we touch on a number of other packages in the scientific Python ecosystem:

  • xarray, a package for labelled multi-dimensional arrays
  • napari, a Python-based image viewer that supports out-of-core visualization
  • sgkit, a statistical genetics toolkit
  • scanpy, a single-cell analysis toolkit

After this talk you'll be aware of the range of approaches for scaling up analysis in the life sciences, and be better equipped to apply some of them in your own work.

Additional Material Presenter speaking samples:

KEYWORDS:

  • big data
  • distributed computing
  • life science