I, Mohd Arshul Mansoori, got the opportunity to work with CROSS under the GSoC program. I worked on Popper, a CLI-based workflow execution engine.
My primary tasks were to:
- create reusable actions in the catalogue of actions.
- create example workflows to demonstrate how Popper can be useful in different domains.
- port pipelines from v1.x to workflows in v2.x.
I started contributing by creating actions for Zenodo, which included actions to Download, Create, Upload, and Publish to Zenodo. (#1, #2, #3)
After that, I worked on similar actions for Figshare. (#1, #3)
I also made some minor adjustments to the Spack workflow. (#1)
Then I added the ability to define undefined secrets at runtime to the main Popper project and added ShellCheck for the scripts. This deepened my understanding of the Popper codebase. (#572, #573)
Then I ported the pgbench pipeline from v1.x to a pgbench workflow. (#27)
This phase gave me an idea of what GitHub Actions are, how actions are combined to create workflows, and how Popper works.
I started by working on the spark-bench workflow. It was a relatively large workflow and involved multiple technologies, including Docker, Terraform, Ansible, Apache Spark, and Spark-Bench. All of them were new to me, so I picked them up gradually, and with the help of my mentors I was able to tie everything together and get this workflow running. (#26)
Alongside spark-bench, I worked in parallel on improving the spack action (#2), and made a crucial improvement to the Zenodo-Download action: downloading a dataset without an API token. (#5)
I also created sweepj2, a CLI-based parameter sweep tool built on the Jinja2 templating engine, and packaged it as a pip package. (#6, #10)
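The core idea behind such a tool can be sketched in a few lines of Python. This is a minimal illustration of Jinja2-based parameter sweeping, not sweepj2's actual API; the template string and parameter names are made up for the example:

```python
from itertools import product

from jinja2 import Template

# A hypothetical command template; sweepj2's real templates differ.
template = Template("./pgbench --clients {{ clients }} --scale {{ scale }}")

# The parameter space to sweep over.
space = {"clients": [1, 2], "scale": [10, 100]}

# Render one command per point in the Cartesian product of the space.
for values in product(*space.values()):
    params = dict(zip(space.keys(), values))
    print(template.render(**params))
```

Each rendered line is one run of the swept application, which is what makes the approach pair naturally with a workflow engine: the template stays fixed while the parameter space varies.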
After all this, we created the HPC Proxy App workflow, which reused the spack action and sweepj2 to run the workflow.
This phase was filled with lots of learning and new experiences.
In this phase, we worked on example workflows, ported pipelines to workflows, and created new workflows.
First, we ported the docker-data-science pipeline to a workflow. (#33)
Then we added a new workflow that popperized the pypet-BRIAN2 example (#38), and ported the Genomics pipeline to a workflow. (#40)
Meanwhile, I added detailed documentation for the spark-bench workflow so that users can understand it better. (#39)
After that, I started working on the SeisSol workflows, which used the SeisSol framework to execute earthquake simulations.
I started by wrapping the compilation of SeisSol in a containerized environment and then running the first example, tpv33. (#1)
Then I worked on creating a workflow for the SC18 Student Cluster Competition challenge in a containerless environment. (#5)
I also added MPI support to the spack action (#3), changed the HPC Proxy App to use MPI as well (#41), and then added functionality to generate executables in sweepj2. (#13)
I also added the ability to inject pre- and post-workflows to the Popper project. (#712)
That was the phase in which I realized how important documentation is for the end user, so I tried my best to document everything I did. I also learned about communication and coordination when we faced an issue with the SeisSol dataset.
In this phase we worked on the NormalModes workflows. This included building the NormalModes application with the OpenBLAS library (#6). Then I added support for compilation with Intel's MKL library (#12). This wrapped up the containerized workflow.
Similarly, we created another workflow for a containerless environment, so that the workflow can be executed without depending on Docker or Singularity. (#7)
Then I worked on another example workflow that popperized a visualization demo using SPPARKS (#49). This required visualization with ParaView Web, so I had to create a dockerized application to visualize the .vtk files generated by the workflow. I built Docker-ParaViewWeb for that purpose.
Meanwhile, we had a discussion with js1019, the creator of the NormalModes application, and we decided to extend the workflow further with validation and visualization. So I added the validate action (#16). Visualization required running the MATLAB scripts in the PlanetaryModels repository, for which we used docker-octave, and then we visualized the generated .vtk files using Docker-ParaViewWeb.
This was all I could do in this short period of time. I want to thank my mentors, thanks to whom I was able to learn so much and had a great working experience.
Running a workflow from any of the repositories above is rather simple; you just need a container runtime installed (Docker and Singularity are currently supported) and Popper itself:
```
pip install popper
```
Once installed, you can execute any workflow using Popper. For instance, to run the NormalModes workflow:
```
git clone --recursive https://github.com/popperized/normalmodes-workflows
cd normalmodes-workflows/workflows/containerized
popper run
```