(A copy of this application is available at https://bit.ly/eglen-fd2020)
Author: Stephen Eglen email@example.com
Name of organisation: University of Cambridge
Title: Analysis of spontaneous activity patterns in developing neural circuits using frictionless data tools.
Problem: We plan to explore how storing our data in frictionless data containers can help in the analysis of electrophysiological data sets. Our specific application in Neuroscience is understanding the patterns of spontaneous activity in developing neural circuits. We have recently been able to switch from animal models to neurons derived from human stem cells (Cotterill et al 2016). Understanding these activity patterns can help us understand the developmental progression of human diseases. One key limitation we currently face is that different computational researchers use different environments for storing and analysing data. We would like to evaluate the utility of frictionless data packages in improving our workflows, allowing us to use the best computational methods regardless of the programming environment (R/Python/Matlab/Julia).
The Frictionless Data project has developed considerably since I contributed to the case study examining how retinal mosaics (an anatomical dataset) could be stored and processed. In particular, packages are now available in all the computing languages used in Neuroscience.
Goals of the project
1. Convert our existing datasets (Eglen et al 2014; currently stored in HDF5) into frictionless data containers.
2. Compare the relative merits of the containers vs HDF5 for storing "medium-sized" (megabytes, rather than gigabytes) data files. Aspects to consider will include portability, efficiency and ease of access.
3. Develop a case study for analysing spontaneous activity patterns with a generative approach to model the underlying neuronal networks. This code has been developed by colleagues at Cambridge in Matlab, but has yet to be tested on our spontaneous activity patterns.
4. Write up our findings for publication in a peer-reviewed journal.
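To make the first two goals concrete, the following is a minimal sketch of what a converted recording might look like. It is an illustration under stated assumptions, not the project's final design: the function names, the package name, the one-row-per-spike CSV layout and the column names `electrode` and `time` are all hypothetical. It uses only the Python standard library and writes the `datapackage.json` descriptor by hand; in practice the spike times would first be read from the existing HDF5 files, and the Frictionless Data libraries would likely replace the hand-written parts.

```python
import csv
import json
from pathlib import Path


def write_spike_package(spikes, out_dir, name="spontaneous-activity-example"):
    """Write spike trains as a CSV file plus a datapackage.json descriptor.

    `spikes` maps an electrode label to a list of spike times in seconds.
    The layout (one row per spike) is an assumption made for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Long format: one row per spike, so the table can be read from any language.
    with open(out_dir / "spikes.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["electrode", "time"])
        for electrode, times in sorted(spikes.items()):
            for t in times:
                writer.writerow([electrode, t])

    # Descriptor with a single tabular resource and its column schema.
    descriptor = {
        "name": name,
        "resources": [
            {
                "name": "spikes",
                "path": "spikes.csv",
                "format": "csv",
                "schema": {
                    "fields": [
                        {"name": "electrode", "type": "string"},
                        {
                            "name": "time",
                            "type": "number",
                            "description": "spike time in seconds",
                        },
                    ]
                },
            }
        ],
    }
    (out_dir / "datapackage.json").write_text(json.dumps(descriptor, indent=2))
    return descriptor


def read_spike_package(pkg_dir):
    """Read the spike trains back, using the descriptor to locate the CSV file."""
    pkg_dir = Path(pkg_dir)
    descriptor = json.loads((pkg_dir / "datapackage.json").read_text())
    resource = descriptor["resources"][0]
    spikes = {}
    with open(pkg_dir / resource["path"], newline="") as f:
        for row in csv.DictReader(f):
            spikes.setdefault(row["electrode"], []).append(float(row["time"]))
    return spikes
```

One limitation of this layout is that an electrode with no spikes leaves no rows and so is lost on reading; a second resource listing the electrodes and their positions would be one way to address that. Trade-offs of this kind, between a plain-text layout and the array-oriented HDF5 layout, are what the comparison in goal 2 is meant to examine.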
We hope that our work will introduce the Frictionless Data containers to a community of neuroscientists who share data. Neuroscientists have historically been reluctant to share data, but this attitude is changing, so having smooth tools to allow data sharing is important.
Work done to date
We already have the data sets for goal 1 and the Matlab algorithm for goal 3; we now need to learn how to use your tools to achieve the aims of the project.
My research group has a longstanding interest in open science and making our tools and datasets available. I recently passed a "10 year reproducibility challenge" for a paper written in 2008 (https://github.com/ReScience/ten-years/issues/1#issuecomment-621473377), and a 2015 paper of mine was successfully reproduced just this week at a ReproHack meeting. The work in this project will be conducted with the same view of long-term availability and reproducibility. One option is to put some of the core code for accessing the electrophysiological data into our R package, meaRtools (Gelfman et al. 2018), and to use continuous integration tools during package development.
References
Betzel RF, Avena-Koenigsberger A, Goñi J, He Y, de Reus MA, Griffa A, Vértes PE, Mišic B, Thiran J-P, Hagmann P, van den Heuvel M, Zuo X-N, Bullmore ET, Sporns O (2016) Generative models of the human connectome. Neuroimage 124:1054–1064 http://dx.doi.org/10.1016/j.neuroimage.2015.09.041
Cotterill E, Charlesworth P, Thomas CW, Paulsen O, Eglen SJ (2016) A comparison of computational methods for detecting bursts in neuronal spike trains and their application to human stem cell-derived neuronal networks. J Neurophysiol 116:306–321 http://dx.doi.org/10.1152/jn.00093.2016
Eglen SJ, Weeks M, Jessop M, Simonotto J, Jackson T, Sernagor E (2014) A data repository and analysis framework for spontaneous neural activity recordings in developing retina. Gigascience 3:3 http://dx.doi.org/10.1186/2047-217X-3-3
Gelfman S, Wang Q, Lu Y-F, Hall D, Bostick CD, Dhindsa R, Halvorsen M, McSweeney KM, Cotterill E, Edinburgh T, Beaumont MA, Frankel WN, Petrovski S, Allen AS, Boland MJ, Goldstein DB, Eglen SJ (2018) meaRtools: An R package for the analysis of neuronal networks recorded on microelectrode arrays. PLoS Comput Biol 14:e1006506 http://dx.doi.org/10.1371/journal.pcbi.1006506
Your Github/Gitlab profile
Why should we choose to work with you?
I have a longstanding interest in open science and reproducible research, themes that are shared with the OKFN. I had a good experience working with your team in our pilot study on retinal mosaics several years ago and wish to continue our working relationship.
What do you think are the advantages of the Frictionless Data approach towards removing the friction in working with data?
The idea of putting data into containers is very appealing and matches what has been done for computable workflows, e.g. in Docker. A key adage in computational biology is that "80% of a researcher's time is spent shuffling data from one place to another". Frictionless data should reduce that time demand significantly, and also hopefully lead to reduced errors when data from different sources are integrated.
When are you able to start and complete the work?
I plan to hire a Cambridge undergraduate (who has already been identified and would like to work on this project) to start working on this over the summer. We are working on the assumption that most of the work will be done remotely due to COVID-19, even though the student lives in Cambridge. The bulk of the computational work should be done by mid-September, which we hope will leave time to write up our findings for potential publication by the end of the year.
Is there anything else you would like to let us know?
I estimate that at least 3000 USD of the budget will be used to pay the student at standard rates for undergraduate interns at the Faculty of Mathematics at Cambridge. I would like to reserve 1000 USD to cover the article processing charge at a journal like F1000 Research. Any remaining funds would be used either to hire the student for slightly longer or to support compute access, a Matlab licence, and long-term storage of datasets.