

@mr-c
Last active August 29, 2015 14:23
Portable workflow and tool descriptions with the CWL


Moved to the Auditorium, Room 1005

Come to the Genome and Biomedical Sciences Facility @ UC Davis, Room 1005, at 09:00 on Friday, June 26th, 2015, to hear Michael R. Crusoe give a practice run of his talk for the Galaxy Community Conference and the Bioinformatics Open Source Conference.

Michael is the software engineer for C. Titus Brown's Lab for Data Intensive Biology, which recently relocated to UC Davis from Michigan State University. https://impactstory.org/MichaelRCrusoe

Bioinformatics workflow platforms provide provenance tracking, execution and data management, repeatability, and an environment for data exploration and visualization. Examples of F/OSS bioinformatics workflow platforms include Arvados, Galaxy, Mobyle, the iPlant Discovery Environment, Apache Taverna, and Yabi. Each one currently represents workflows using its own vocabulary and format, and adding a new tool requires a different procedure for each system.

Neither the workflow descriptions nor the descriptions of the tools that power them are usable outside the platforms they were written for. This duplicates effort, reduces reusability, and impedes collaboration.

Three engineers (Peter Amstutz, John Chilton, and Nebojsa Tijanic) from leading bioinformatics platform teams (Curoverse, Galaxy Team, and Seven Bridges Genomics) and a tool author (Michael R. Crusoe / khmer project / then at Michigan State University) started working together at the BOSC 2014 Codefest. Their initial focus was a portable way to represent, share, and invoke command-line tools, which then became the basis for portable workflow descriptions. The group placed high value on re-using existing formats and ontologies, and it governed itself through lazy consensus and do-ocracy.

On March 31st, 2015, the group released the second draft of the Common Workflow Language specification. The serialized form is a YAML document that is validated by an Apache Avro schema and can be interpreted as an RDF graph using JSON-LD. After a simple transformation, the documents are also valid Wf4Ever 'wfdesc' descriptions. Future drafts will use the EDAM ontology to describe tools, enabling their discovery via the ELIXIR tool registry.

Seven Bridges Genomics, the Galaxy Project, and Curoverse (the company behind Arvados) have started to implement support for the Common Workflow Language, and other projects and organizations such as Apache Taverna, BioDatomics, and the Broad Institute have expressed interest. Developers on the Galaxy Team are exploring support for CWL tool descriptions, with plans to support CWL workflow descriptions as well. Tool authors and other community members will benefit because they will only have to describe their tool and workflow interfaces once. This will let scientists, researchers, and other analysts share their workflows and pipelines in a form that is interoperable yet human-readable.
