Facing compute platform portability challenges with scientific workflows - experiences from Common Workflow Language
Stian Soiland-Reyes, The University of Manchester; Apache Software Foundation
- Talk abstract submitted to PASC 2018 minisymposium, Advances in Automation and Efficiency for the Exascale Era - Experiences from the Biomolecular Sciences.
Scientific workflow systems are well established for computational analysis in all science domains, following the rapid development of workflow technology and community practices over the past two decades, the eScience era. Workflow systems have gained traction in the era of Big Data Science due to their “ASAP” properties: Automation of repetitive pipelines and simulation sweep campaigns; Scaling across computational infrastructure to handle large data; Abstraction to shield users and programs from complexity and incompatibilities; and Provenance to automatically document execution logs and data lineage for future analysis.
A major hindrance to wider adoption and reuse of workflows, even when they are open source, is that they are written for specific workflow systems or infrastructures. Common Workflow Language (CWL) has emerged as a community initiative with support across a range of existing workflow engines, using a language specification that focuses on the common denominator of command line tools exchanging files. Support for CWL on HPC has expanded in recent months, for example through IBM's CWLEXEC on LSF and Toil with Singularity.
In this talk we will present the challenges of moving CWL workflows towards Exascale, while retaining key features of workflows such as reproducibility, interoperability, usability and provenance.