@tonyfast
Last active August 29, 2015 14:03
A GitHub Stack for Templating Pre-competitive Research Metadata

Abstract

Collaborative science draws its lexicon from disparate domains. Collaborative research data cannot be machine readable only; it must also be human readable and accessible with minimal transaction cost. This document outlines the use of YAML (a human-readable data serialization language), Jekyll (a static site generator with Liquid templating), and GitHub Pages (a generous and cheap web server provided by GitHub) to deploy pre-competitive research results.

Templating + Markup Language

A markup language structures data for use in a presentation layer. Template languages can take similar or dissimilar information and present it to the user as a document, web page, or flat text file.

Example

YAML Ain't Markup Language --> Jekyll Templating --> GitHub Pages

YAML

YAML files are flat text files that can be deserialized into machine-readable data structures and serialized back again. (units.yml is a practical application of YAML for units and their metadata. Humans can read YAML, while Matlab and Python can deserialize it.)

An example YAML file: _data/Chang-2006-53-Spun.yaml

Annealing Temp, C: 100.0
Annealing Time, hr: 10.0
Author: Chang
Channel Length: 20 m
Channel Width: 10mm
DOI: 10.1103/PhysRevB.74.115318
Film Thickness: 20 - 50 nm
Hansen Radius: 13.17499999999998
Method: Spun
Mobility Environment: N2
Molecular Weight (kDa) Mn: 15.4
OFET config.: BGBC
OFET regime: saturation
PDI: 1.5
Processing Environment: N2
RT Mobility (cm^2/V/s): 0.0076
Regioregularity: high
Solution Concentration, mg/mL: 10.0
Solution Treatment: filtered 0.45m
Solution aging temp, C: 70.0
Solution aging time, min: 30.0
Solvent 1: CHCl3
Spin Rate, rpm: 1500.0
Substrate Treatment: HMDS
Year: 2006
url: http://link.aps.org/doi/10.1103/PhysRevB.74.115318
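
As a rough illustration of the deserialization claim, the sketch below reads a few lines of the record above into a Python dictionary. It assumes flat `key: value` lines only and uses a hand-written splitter rather than a real YAML parser; in practice a YAML library (for example PyYAML's `yaml.safe_load`) would be the usual choice. The function name `parse_flat_record` is hypothetical.

```python
def parse_flat_record(text):
    """Parse flat `key: value` lines into a dict (not a full YAML parser)."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ": " so values such as URLs keep their colons.
        key, _, value = line.partition(": ")
        # Convert numeric-looking values; leave everything else as text.
        try:
            record[key] = float(value) if "." in value else int(value)
        except ValueError:
            record[key] = value
    return record


# A few lines copied from the example record above.
sample = """\
Author: Chang
Annealing Temp, C: 100.0
Year: 2006
DOI: 10.1103/PhysRevB.74.115318
url: http://link.aps.org/doi/10.1103/PhysRevB.74.115318
"""

record = parse_flat_record(sample)
print(record["Author"], record["Year"], record["Annealing Temp, C"])
```

Values that only look partly numeric, such as the DOI, stay as strings because the numeric conversion fails and falls back to text.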

Templated YAML

GitHub Pages uses Jekyll to render templated .html and .markdown files written in the easy-to-learn Liquid syntax.

The page linked below uses index.html to template the above YAML.

http://tonyfast.com/Organic-Field-Effect-Transistor/

The template converts YAML into an HTML document for users to see. The template provides enormous control over the presentation format of information. GitHub provides an easy way to deploy YAML + templates.

Metadata

Much of the information generated in the research process is superfluous, and transporting and communicating unnecessary information makes research inefficient. Metadata can be used to communicate the important features of research data at a reduced cost.

important

Metadata is difficult to define without a context. Pre-competitive research data evolves rapidly, with many agents contributing information and interpretations. It is nearly impossible to impose a structure on metadata keys in a multi-agent contribution ecosystem. Consequently, a field like materials science will need to look beyond data formats that are only machine readable.

Human-Readable Content

The diverse lexicon of materials science requires machine-readable content that can be annotated, which excludes JSON because it has no comment syntax. Inconsistent experience with software tools excludes a complicated language like XML. Closed databases will not support pre-competitive research data.

Enter human-readable content. YAML Ain’t Markup Language (YAML) and Tom's Obvious, Minimal Language (TOML) both provide ease of use in data entry and a completely extensible means of serializing data. I'm going to use YAML because GitHub Pages uses YAML.

Templating Language

Templating languages convert data into information; this can be thought of as separating the program logic of a project from its presentation layer. Django, Jinja, Jekyll, and Sphinx are all examples of tools that use templates to turn structured information into presentation. This discussion focuses on making web-facing documentation for research data.

GitHub Pages offers Jekyll, an easy-to-use templating system built on Liquid, for user-supplied data and templates.

Jekyll Interpreter

Jekyll is a simple, blog-aware, static site generator. It takes a template directory containing raw text files in various formats, runs it through Markdown (or Textile) and Liquid converters, and spits out a complete, ready-to-publish static website suitable for serving with your favorite web server. Jekyll also happens to be the engine behind GitHub Pages, which means you can use Jekyll to host your project’s page, blog, or website from GitHub’s servers for free.

Jekyll Site Data

Data files are collections of YAML files stored in a folder called _data. Jekyll loads (deserializes) this data so that the HTML/Markdown templates can use it as site data.

Jekyll Templates

Jekyll templates can access the site data. The site data is then structured using an HTML template.
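
To make the separation of data and presentation concrete, here is a minimal sketch that renders a record as an HTML table. It uses Python's standard-library `string.Template` purely as a stand-in for Liquid; it is not how Jekyll works internally, and the names `ROW`, `PAGE`, and `render_record` are hypothetical.

```python
from html import escape
from string import Template

# Presentation layer: two small templates, independent of any one record.
ROW = Template("<tr><th>$key</th><td>$value</td></tr>")
PAGE = Template("<h1>$title</h1>\n<table>\n$rows\n</table>")


def render_record(title, record):
    """Fill the templates with one record, escaping HTML special characters."""
    rows = "\n".join(
        ROW.substitute(key=escape(str(k)), value=escape(str(v)))
        for k, v in record.items()
    )
    return PAGE.substitute(title=escape(title), rows=rows)


# Data layer: a few fields from the example record.
record = {"Author": "Chang", "Method": "Spun", "Year": 2006}
html_page = render_record("Chang-2006-53-Spun", record)
print(html_page)
```

Changing the look of the page means editing only `ROW` and `PAGE`; the record itself is untouched, which is the same division of labor Jekyll offers between _data files and templates.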

What is a Use Case?

  1. Convert a data structure in your favorite programming environment to a YAML file.
  2. Commit the YAML file to _data on the gh-pages branch.
  3. Interpret the site data using an existing template or make your own.
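
Step 1 above might look like the following sketch, which writes a flat Python dictionary into a _data folder. To stay within the standard library it emits every key and value as a JSON scalar, which YAML also accepts, so strings come out double-quoted; a YAML library would normally produce cleaner output. The helper name `write_flat_yaml` and the temporary directory standing in for a gh-pages checkout are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_flat_yaml(record, path):
    """Write a flat dict as `key: value` lines using JSON scalars."""
    lines = [f"{json.dumps(str(k))}: {json.dumps(v)}" for k, v in record.items()]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# A few fields from the example record.
record = {"Author": "Chang", "Method": "Spun", "Year": 2006, "PDI": 1.5}

# Temporary directory used here in place of a real gh-pages working copy.
site_root = Path(tempfile.mkdtemp())
out = write_flat_yaml(record, site_root / "_data" / "Chang-2006-53-Spun.yaml")
print(out.read_text(encoding="utf-8"))
```

The resulting file would then be committed to the gh-pages branch as in step 2.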

Why is this Powerful

Intermediate research publication provides context to metadata. Research data is generated frequently, and some of it needs to be shared. YAML files can be (de)serialized by Matlab, Python, Ruby, etc.; most importantly, humans can read, annotate, and append to them! This legacy of radness can then be made publicly digestible on GitHub Pages.

Broader Impact

  • YAML files then can be scraped and made amenable to databases.
  • Changes in metadata content are versioned.
  • Generally unopinionated.
  • Templates can be made to access external content distribution services (e.g., Dropbox, Flickr). One could provide share links to public Dropbox files, access thumbnails of image data from Flickr, or integrate Twitter/Instagram/LinkedIn feeds into a template. (An example of Plot.ly integration)
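
The first bullet, making scraped records amenable to databases, could be sketched as below. Because metadata keys are not fixed in advance, the sketch stores each record as (source, field, value) rows in SQLite instead of assuming a fixed set of columns. The table name `metadata`, the function `load_records`, and the assumption that records are already parsed into dictionaries are all illustrative.

```python
import sqlite3


def load_records(conn, records):
    """Store {filename: {field: value}} records as one row per field."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (source TEXT, field TEXT, value TEXT)"
    )
    rows = [
        (source, field, str(value))
        for source, record in records.items()
        for field, value in record.items()
    ]
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()


# One record, with a few fields from the example file, keyed by filename.
records = {
    "Chang-2006-53-Spun.yaml": {"Author": "Chang", "Method": "Spun", "Year": 2006},
}

conn = sqlite3.connect(":memory:")
load_records(conn, records)
methods = conn.execute(
    "SELECT source, value FROM metadata WHERE field = ?", ("Method",)
).fetchall()
print(methods)
```

A long, three-column layout like this tolerates new or inconsistent keys from different contributors, at the cost of storing every value as text.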

So What Metadata is Correct

Let your colleagues tell you. Tweak your design and presentation. Learn what metadata is important.

wd15 commented Jul 22, 2014

Thanks for this. I now understand what you're trying to do. The Plot.ly integration looks really good. The question now is how to make your idea a standard and how to get tools that produce, read and analyze to start using this.