@mwvaughn
Last active September 7, 2017 16:17
SD2E ETL and Processing Pipelines

Packaging Code for SD2E (30,000 ft view)

  1. Containerize the application/script codebase using Docker, or work with TACC to install it directly on a targeted HPC system
  2. Provide a small data set and example invocation of the code so we can validate it
  3. Document at least one protocol (command sequence) for running the application to accomplish a specific set of objectives.
    • Input file(s), any reference data files (genome sequences, structured database files, etc)
    • Output file(s)
    • Parameters, including types and validation rules
    • An example invocation of the application that combines inputs, parameters, and leads to specific outputs
    • Approximate resource requirements for running the code (CPU architecture, CPU cores, RAM, free disk, requires MPI, etc)
  4. The information gathered in step 3 is represented in a specialized JSON document and registered with one of TACC's application web services
  5. The application can be invoked via a call to a TACC task execution web service
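
As a rough illustration of steps 3 through 5, the sketch below captures one protocol as a JSON-serializable description, checks supplied parameters against its validation rules, and assembles a request body for a task execution call. Every field name here (`inputs`, `parameters`, `resources`, `min`, `max`, and so on), the application name `example-app`, and the helper functions are hypothetical and chosen only for illustration; the actual schema is whatever TACC's application web service defines.

```python
import json

# Hypothetical application description. Field names are assumptions for
# illustration only and do not reflect the real TACC schema.
app_description = {
    "name": "example-app",
    "version": "1.00",
    "inputs": [{"id": "input_file", "required": True}],
    "outputs": [{"id": "output_file"}],
    "parameters": [
        {"id": "iterations", "type": "int", "min": 1, "max": 100, "default": 10},
    ],
    "resources": {"cpu_cores": 4, "ram_gb": 8, "disk_gb": 20, "requires_mpi": False},
}

# Maps the type names used in the description to Python types.
_TYPES = {"int": int, "float": float, "string": str}


def validate_parameters(description, supplied):
    """Return parameters with defaults applied, or raise ValueError."""
    known = {spec["id"] for spec in description["parameters"]}
    unknown = set(supplied) - known
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    resolved = {}
    for spec in description["parameters"]:
        value = supplied.get(spec["id"], spec.get("default"))
        if value is None:
            raise ValueError(f"missing parameter: {spec['id']}")
        if not isinstance(value, _TYPES[spec["type"]]):
            raise ValueError(f"{spec['id']} must be of type {spec['type']}")
        if "min" in spec and value < spec["min"]:
            raise ValueError(f"{spec['id']} is below the minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            raise ValueError(f"{spec['id']} is above the maximum {spec['max']}")
        resolved[spec["id"]] = value
    return resolved


def build_job_request(description, inputs, parameters):
    """Assemble a request body for a (hypothetical) task execution service."""
    return {
        "app": f"{description['name']}-{description['version']}",
        "inputs": inputs,
        "parameters": validate_parameters(description, parameters),
    }


if __name__ == "__main__":
    job = build_job_request(
        app_description,
        inputs={"input_file": "data/sample.txt"},
        parameters={"iterations": 5},
    )
    print(json.dumps(job, indent=2))
```

Keeping the types and validation rules in the description itself means an invalid request can be rejected before any compute time is spent on it.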

For interactive applications, we'll support some alternate approaches like hosted Jupyter notebooks. Where there are 3rd party services or web applications involved, we will assist with integration.

Notes

The general approach is to take code packages that may be quite complex and define repeatable protocols that each do one thing well. For instance, if you have a package TRIATHLON v1.01 that can do run, swim, and bike tasks, we'd represent it as a set of components, TRIATHLON-run-1.00, TRIATHLON-swim-1.00, and TRIATHLON-bike-1.00, each of which can be invoked separately. On the backend, the code assets are shared among the run, swim, and bike components, and we can add variations to each task or even add new tasks.
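
The TRIATHLON decomposition can be sketched in a few lines. This is a minimal sketch, not how SD2E actually stores components: the `Component` class, the `split_package` helper, and the `triathlon:1.01` image tag are all hypothetical names introduced for illustration. The point it shows is that each task gets its own separately invocable identifier while all of them point at the same code asset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One single-purpose protocol carved out of a larger package."""
    package: str
    task: str
    version: str
    code_asset: str  # shared container image or install path (hypothetical)

    @property
    def component_id(self) -> str:
        # Follows the PACKAGE-task-version naming used in the example above.
        return f"{self.package}-{self.task}-{self.version}"


def split_package(package, tasks, version, code_asset):
    """Create one component per task, all backed by the same code asset."""
    return [Component(package, task, version, code_asset) for task in tasks]


# "triathlon:1.01" is an assumed image tag standing in for the shared codebase.
components = split_package(
    "TRIATHLON", ["run", "swim", "bike"], "1.00", "triathlon:1.01"
)

if __name__ == "__main__":
    for component in components:
        print(component.component_id, "->", component.code_asset)
```

Adding a new task or a variation of an existing one then amounts to registering another component against the same asset, without touching the others.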
