Collaborative sciences source lexicons from disparate domains. Collaborative research cannot be only machine readable; it must also be human readable and accessible with minimal transaction cost. This document outlines the use of YAML (a human-readable data format), Jekyll (a static site generator with templating), and GitHub Pages (a generous and cheap web host provided by GitHub) to deploy pre-competitive research results.
A data or markup language structures data for use in a presentation layer. Template languages can take similar or dissimilar information and present it to the user as a document, webpage, or flat text file.
YAML Ain't Markup --> Jekyll Templating --> GitHub Pages
YAML files are flat files that can be deserialized into machine-readable data structures and serialized back again. (units.yml is a practical application of YAML for units and their metadata. Humans can read YAML, while Matlab and Python can deserialize it.)
An example YAML file: _data/Chang-2006-53-Spun.yaml
Annealing Temp, C: 100.0
Annealing Time, hr: 10.0
Author: Chang
Channel Length: 20 µm
Channel Width: 10mm
DOI: 10.1103/PhysRevB.74.115318
Film Thickness: 20 - 50 nm
Hansen Radius: 13.17499999999998
Method: Spun
Mobility Environment: N2
Molecular Weight (kDa) Mn: 15.4
OFET config.: BGBC
OFET regime: saturation
PDI: 1.5
Processing Environment: N2
RT Mobility (cm^2/V/s): 0.0076
Regioregularity: high
Solution Concentration, mg/mL: 10.0
Solution Treatment: filtered 0.45 µm
Solution aging temp, C: 70.0
Solution aging time, min: 30.0
Solvent 1: CHCl3
Spin Rate, rpm: 1500.0
Substrate Treatment: HMDS
Year: 2006
url: http://link.aps.org/doi/10.1103/PhysRevB.74.115318
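As a minimal sketch of the deserialization step, the snippet below loads a few fields copied from the record above. It assumes the third-party PyYAML package is installed (`pip install pyyaml`); the variable names are only illustrative.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

# A few fields copied from the example record above.
text = """\
Annealing Temp, C: 100.0
Author: Chang
Film Thickness: 20 - 50 nm
Method: Spun
RT Mobility (cm^2/V/s): 0.0076
Year: 2006
"""

# YAML text -> plain Python dict.
record = yaml.safe_load(text)

# Scalars arrive already typed: numbers become float/int, the rest stay strings.
print(type(record["Annealing Temp, C"]).__name__)
print(type(record["Year"]).__name__)
print(record["Film Thickness"])

# Python dict -> YAML text; sort_keys=False keeps the original key order.
round_trip = yaml.safe_dump(record, sort_keys=False)
assert yaml.safe_load(round_trip) == record
```

Values such as `20 - 50 nm` stay plain strings, so units and ranges still need a human (or a later parsing step) to interpret them.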
GitHub Pages uses Jekyll to interpret templated .html and .markdown files using easy-to-learn Liquid syntax.
The following screenshot uses index.html to template the above YAML.
The template converts the YAML data into an HTML document for users to see. The template provides enormous control over the presentation format of information. GitHub provides an easy way to deploy YAML + templates.
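To make the data-to-presentation step concrete, here is a rough Python analogue of what a template does. This is not Jekyll or Liquid, only a sketch of the same idea using the standard library; `render_record` is a hypothetical helper, and the record holds a few fields from the example above.

```python
from html import escape


def render_record(title, record):
    """Render a flat mapping as a minimal HTML page, one table row per key."""
    rows = "\n".join(
        f"    <tr><th>{escape(str(key))}</th><td>{escape(str(value))}</td></tr>"
        for key, value in record.items()
    )
    return (
        "<html>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <table>\n{rows}\n  </table>\n"
        "</body>\n</html>\n"
    )


# A few fields from the example record; any flat mapping works the same way.
record = {"Author": "Chang", "Method": "Spun", "Year": 2006}
page = render_record("Chang-2006-53-Spun", record)
print(page)
```

The function never names a specific key, so records with different metadata fields render without changing the template logic.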
Much of the information generated in the research process is superfluous, and transporting and communicating unnecessary information makes research less efficient. Metadata can communicate the important features of research data at a reduced cost.
Metadata is difficult to define without a context. Pre-competitive research data evolves rapidly, with many agents contributing information and interpretations. It is nearly impossible to impose a structure on metadata keys in a multi-agent contribution ecosystem. Consequently, a field like materials science will have to skip formats that are only machine readable.
The diverse lexicon of materials science requires that machine-readable content can be annotated, which excludes JSON because it has no comment syntax. Contributors' inconsistent experience with software tools excludes a complicated language like XML. Closed databases will not support pre-competitive research data.
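The annotation point can be checked directly. The sketch below assumes PyYAML is installed; the comment text and the JSON snippet are made up for illustration. A YAML loader ignores `#` comments, while Python's standard `json` module rejects a commented document.

```python
import json

import yaml  # PyYAML, a third-party package (assumed to be installed)

# YAML allows free-form annotations as comments.
annotated_yaml = """\
# Illustrative annotation: a note left by a contributor.
RT Mobility (cm^2/V/s): 0.0076  # room temperature
OFET regime: saturation
"""
record = yaml.safe_load(annotated_yaml)  # comments are dropped by the loader
print(record)

# The same kind of annotation in JSON is a syntax error.
annotated_json = """\
{
  // room temperature
  "RT Mobility (cm^2/V/s)": 0.0076
}
"""
try:
    json.loads(annotated_json)
    json_accepts_comments = True
except json.JSONDecodeError:
    json_accepts_comments = False

print(json_accepts_comments)
```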
Enter human-readable content. YAML Ain’t Markup Language (YAML) and Tom's Obvious, Minimal Language (TOML) both provide ease of use in data entry and a completely extensible means to provide serialized data. I'm going to use YAML because GitHub Pages uses YAML.
Template languages convert data into information; this can be thought of as separating the program logic of a project from its presentation layer. Django, Jinja, Jekyll, and Sphinx are all examples of templating tools that turn structured information into presentation. This discussion focuses on making web-facing documentation for research data.
GitHub Pages offers an easy-to-use static site generator called Jekyll, which applies templates to user-supplied data.
Jekyll is a simple, blog-aware, static site generator. It takes a template directory containing raw text files in various formats, runs it through Markdown (or Textile) and Liquid converters, and spits out a complete, ready-to-publish static website suitable for serving with your favorite web server. Jekyll also happens to be the engine behind GitHub Pages, which means you can use Jekyll to host your project’s page, blog, or website from GitHub’s servers for free.
Data files are collections of YAML files stored in a folder called _data. Jekyll loads the data so that the HTML/Markdown templates can use it as site data.
Jekyll templates can access the site data. The site data is then structured using an HTML template.
- Convert a data structure in your favorite programming environment to a YAML file.
- Commit the YAML file to _data on the gh-pages branch.
- Interpret the site data using an existing template or make your own.
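The first step above might look like the following in Python. This is a sketch assuming PyYAML is installed; `write_record` is a hypothetical helper, and the temporary directory stands in for a local checkout of the site repository.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML, a third-party package (assumed to be installed)


def write_record(record, name, site_root="."):
    """Serialize `record` to <site_root>/_data/<name>.yaml and return the path."""
    data_dir = Path(site_root) / "_data"
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / f"{name}.yaml"
    with path.open("w", encoding="utf-8") as handle:
        # sort_keys=False keeps the order in which the keys were entered.
        yaml.safe_dump(record, handle, sort_keys=False, allow_unicode=True)
    return path


# A temporary directory stands in for the repository checkout here.
with tempfile.TemporaryDirectory() as site_root:
    record = {"Author": "Chang", "Method": "Spun", "Year": 2006}
    path = write_record(record, "Chang-2006-53-Spun", site_root)
    print(path.read_text(encoding="utf-8"))
```

In a real checkout, the new file would then be committed and pushed to the gh-pages branch with the usual `git add`, `git commit`, and `git push`.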
Intermediate research publication provides context to metadata. Research data is generated frequently, and some of it needs to be shared. YAML files can be (de)serialized by Matlab, Python, Ruby, etc.; most importantly, humans can read, annotate, and append to them! This legacy of radness can then be made publicly digestible on GitHub Pages.
- YAML files can then be scraped and made amenable to databases.
- Changes in metadata content are versioned.
- Generally unopinionated.
- Templates can be made to access external content distribution services (e.g. Dropbox, Flickr). One could provide share links to public Dropbox files, access thumbnails of image data from Flickr, or integrate Twitter/Instagram/LinkedIn feeds into a template. (An example of Plot.ly integration)
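As one possible reading of "made amenable to databases", the sketch below stores already-loaded records in SQLite as (source, field, value) rows, so files with different metadata keys can share one table. The table layout, the `load_into_sqlite` helper, and the second record are assumptions for illustration only; in practice the records would come from deserializing the files in _data.

```python
import sqlite3


def load_into_sqlite(records, connection):
    """Store each (source, record) pair as rows of (source, field, value)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS metadata (source TEXT, field TEXT, value TEXT)"
    )
    rows = [
        (source, str(field), str(value))
        for source, record in records
        for field, value in record.items()
    ]
    connection.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


records = [
    # A few fields from the example record above.
    ("Chang-2006-53-Spun", {"Author": "Chang", "Method": "Spun", "Year": 2006}),
    # Placeholder record with a different key set (hypothetical, not real data).
    ("hypothetical-record", {"Author": "Example", "Solvent 1": "CHCl3"}),
]

connection = sqlite3.connect(":memory:")
load_into_sqlite(records, connection)

# Query one field across all sources, whatever other keys each file has.
authors = connection.execute(
    "SELECT source, value FROM metadata WHERE field = 'Author' ORDER BY source"
).fetchall()
print(authors)
```

The long (source, field, value) layout imposes no schema on the keys, which fits the unopinionated style of the YAML files; values are stored as text and would need casting for numeric queries.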
Let your colleagues tell you. Tweak your design and presentation. Learn what metadata is important.
Thanks for this. I now understand what you're trying to do. The Plot.ly integration looks really good. The question now is how to make your idea a standard and how to get tools that produce, read and analyze to start using this.