Response to "Science Drivers Requiring Capable Exascale High Performance Computing" RFI

We can piggyback on the software development community.

Many good things are happening in open source and in industry, and we face many of the same issues they do. For example, GitHub has provided enormous value to science, both by filling a need and through direct engagement, and it has become almost unbelievably popular in the computational life sciences. However, other tools, such as continuous integration (e.g., Travis CI) and containers (e.g., Docker), have gained less traction despite what they could offer the scientific community.

We need a strategy to fight bit-rot.

"Bit-rot" refers to software/pipelines that become unusable because the underlying dependencies have changed. Sometimes the old versions disappear completely, meaning that old pipelines cannot be reconstructed without digging through the internet archive. Reproducibility is fundamental to science, and thus this problem is acute.

There is a clear antidote to this problem: software containers. These are lightweight, virtual-machine-like environments that can run on a variety of platforms. Docker is the best known, and the community (including Docker) is coalescing around a standard; see https://www.opencontainers.org/ for details. Using containers, all dependencies are captured along with the pipeline, which can then be run reliably into the future.
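As a minimal sketch of the idea (the base image, package names, and version strings below are placeholders, not recommendations), a container recipe records every dependency explicitly, so the same environment can be rebuilt and rerun long after the upstream packages have moved on:

```Dockerfile
# Hypothetical container recipe: the image tag and version strings are
# placeholders; the point is that every dependency is fixed in the recipe.
FROM ubuntu:14.04

# Pinning package versions (placeholder values) guards against upstream
# changes silently breaking the pipeline later.
RUN apt-get update && \
    apt-get install -y python=2.7.5-5ubuntu3 samtools=0.1.19-1

# Ship the analysis code inside the image so the pipeline and its
# dependencies travel together.
COPY pipeline.py /opt/pipeline/pipeline.py
ENTRYPOINT ["python", "/opt/pipeline/pipeline.py"]
```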

For NIH computational strategies, we need to make sure that any proposed expansion of computing capacity can run software containers. This is not an entirely trivial technical consideration.

The iPlant Collaborative is already doing a visionary job.

As you are no doubt aware, the iPlant Collaborative environment (http://www.iplantcollaborative.org/) and TACC, under the leadership of Dan Stanzione, are a remarkable example of smart people, given a chunk of NSF cash, producing an excellent product. Their Agave API (http://agaveapi.co/) points the way to the future.

Parallel architectures require parallel algorithms.

This is self-explanatory. In my field of Bayesian phylogenetics, for example, all algorithms in common use rely on Markov chain Monte Carlo, which is an inherently serial algorithm. If we are to use large-scale architectures, we are going to need algorithms appropriate for them.
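To make the serial dependence concrete, here is a minimal Metropolis-Hastings sketch in Python, with a made-up one-dimensional target density standing in for a phylogenetic posterior. Each proposal is centered on the current state, so step t+1 cannot begin before step t finishes; parallelism has to come from elsewhere, e.g., independent chains or parallelizing the likelihood evaluation within a step.

```python
import math
import random

def log_target(x):
    # Stand-in log posterior; in Bayesian phylogenetics this would be an
    # expensive tree likelihood plus prior.
    return -0.5 * x * x

def metropolis(n_steps, step_size=1.0, seed=1):
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        # The proposal depends on the current state x, so the iterations
        # form a chain that cannot be naively split across processors.
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(10000)
```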

Algorithms can give >100-fold improvements without additional infrastructure.

As we have seen recently with the development of kallisto by Bray et al., algorithms can change problems from requiring a cluster to being quite doable on a laptop. [Note: I understand that running kallisto is not the same as doing a full analysis with Cufflinks, etc., but for many common applications it appears to do a fine job.] Thus, I hope that novel algorithm development for core computational problems will be part of any investment in computing infrastructure.
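As a toy illustration of the flavor of such algorithmic shortcuts (this is not kallisto's algorithm; the sequences, k-mer length, and function names are invented for the example): rather than aligning each read base-by-base against every transcript, one can precompute a map from k-mers to the transcripts containing them and assign a read by intersecting the transcript sets of its k-mers, replacing expensive alignment with cheap hash lookups.

```python
from collections import defaultdict

K = 5  # toy k-mer length; real tools use a much larger k

# Invented transcript sequences, purely for illustration.
transcripts = {
    "tx1": "ACGTACGTGGA",
    "tx2": "TTGACGTACGA",
}

# Build the index once: each k-mer maps to the set of transcripts containing it.
index = defaultdict(set)
for name, seq in transcripts.items():
    for i in range(len(seq) - K + 1):
        index[seq[i:i + K]].add(name)

def compatible_transcripts(read):
    # Intersect the transcript sets of the read's k-mers; hash lookups
    # replace base-by-base alignment, which is where the speed comes from.
    candidates = None
    for i in range(len(read) - K + 1):
        hits = index.get(read[i:i + K], set())
        candidates = hits if candidates is None else candidates & hits
        if not candidates:
            break
    return candidates or set()

print(compatible_transcripts("ACGTACGT"))  # -> {'tx1'} with these toy data
```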
