@tschaume
Last active August 29, 2015 14:16
MP User Contribution Framework (MPContribs)
Title

"Development of the Materials Project's open-source framework enabling seamless integration of generic user-contributed data for Computational Materials Design"

Authors

Patrick Huck, Anubhav Jain, Dan Gunter, Kristin Persson (LBNL)

Abstract

The Materials Project uses HPC resources to compute energetic and structural information for over 50,000 inorganic compounds by means of high-throughput ab-initio calculations. The continually growing supply of new and more advanced experimental and theoretical materials data produced by its user community makes it increasingly important for scientific platforms like the Materials Project to enable community-driven submissions. In this paper, we describe computing and software infrastructure to integrate and organize contributions of computed or measured materials data from users. The presented community solution supports a wide range of user data complexities by extending the ubiquitous CSV file format with arbitrarily nested trees of key-value pairs and by decomposing contributions into suitable "atomic" records that are stored in our MongoDB database. Interactive data analysis interfaces allow contributors to share analyses and graphs of arbitrary tabular data, such as measured XAS/XMCD spectra to be compared with FEFF calculations, or computed diffusivities for different temperatures and solutes. A RESTful API provides mechanisms for bookkeeping, retrieval and (re-)aggregation of submitted entries, as well as persistent URIs and DOIs that can be used to reference the data in publications. Our approach isolates the contributed data from the project's quality-controlled core data while still allowing metrics to be analyzed across the entire dataset, either programmatically or through user-specific web apps. The resulting framework is expected to enhance user collaborations on materials properties and maximize the impact of each contributor's dataset on the community. The immediate outcome is more efficient research, owing to the centralized exchange of data, techniques and best practices.
In the long term, this framework is a significant step towards making the Materials Project an institutional, and thus community-wide, memory for computed and experimental materials science.
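
To make the idea of a CSV file extended with nested key-value trees and decomposed into "atomic" records more concrete, here is a minimal Python sketch. It is not the MPContribs input format or implementation, which this abstract does not specify. The `>` section markers, the `key: value` syntax, the `rows` key, the record fields, and the names `parse_contribution`, `flatten` and `decompose` are all assumptions made for illustration, as are the identifier `mp-1234` and the sample values.

```python
import csv
import io

# Hypothetical input: '>' markers open nested sections (one '>' per level),
# 'key: value' lines add pairs, and any other line is treated as a CSV row.
SAMPLE = """\
>mp-1234
formula: Fe2O3
>>diffusivity
solute: Cu
T [K],D [cm^2/s]
300,1.2e-09
400,3.4e-08
"""


def parse_contribution(text):
    """Parse the assumed extended-CSV syntax into a nested dict."""
    root = {}
    stack = [root]  # stack[d] is the section currently open at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            depth = len(line) - len(line.lstrip(">"))
            name = line[depth:].strip()
            if depth > len(stack):
                raise ValueError(f"section {name!r} skips a nesting level")
            del stack[depth:]  # close sections at this depth or deeper
            section = {}
            stack[-1][name] = section
            stack.append(section)
        elif ":" in line:
            # Simplification: any line containing ':' is a key-value pair.
            key, value = (part.strip() for part in line.split(":", 1))
            stack[-1][key] = value
        else:
            row = next(csv.reader(io.StringIO(line)))
            stack[-1].setdefault("rows", []).append(row)
    return root


def flatten(tree, prefix=""):
    """Collapse a nested dict into a single level with dotted key paths."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def decompose(tree, contributor):
    """Build one flat record per top-level section of the contribution."""
    return [
        {"contributor": contributor, "identifier": name, "data": flatten(body)}
        for name, body in tree.items()
        if isinstance(body, dict)
    ]


tree = parse_contribution(SAMPLE)
records = decompose(tree, contributor="jdoe")
```

Each record here is a plain dictionary that is independent of the others, which is one possible reading of "atomic". Storing such records in a database and exposing them through an API are outside the scope of this sketch.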

Paper (Protected)

see https://github.com/tschaume/mp-docs/blob/master/eScience15/eScience15.pdf
