I, Mohd Arshul Mansoori, got the opportunity to work with CROSS under the GSoC program. I worked on Popper, a CLI-based workflow execution engine.
My primary tasks were to:
- create reusable actions in the catalogue of actions.
- create example workflows to demonstrate how Popper can be useful in different domains.
- port pipelines from v1.x to workflows in v2.x.
I started contributing by creating actions for Zenodo, which included actions to Download, Create, Upload, and Publish to Zenodo. (#1, #2, #3)
After that, I worked on similar actions for Figshare. (#1, #3)
I also made some minor adjustments to the Spack workflow. (#1)
Then I added the ability to define undefined secrets at runtime to the main Popper project and added ShellCheck for the scripts. This deepened my understanding of the Popper codebase. (#572, #573)
Then I ported the pgbench pipeline from v1.x to a pgbench workflow. (#27)
This phase gave me an idea of what GitHub Actions are, how actions are combined to create workflows, and how Popper works.
I started by working on the spark-bench workflow. It was a relatively large workflow and involved multiple technologies, including Docker, Terraform, Ansible, Apache Spark, and Spark-Bench. All of them were new to me, so I picked them up gradually, and with the help of my mentors I was able to tie everything together and get this workflow running. (#26)
Alongside spark-bench, I worked in parallel on improving the spack action (#2), and made a crucial improvement to the Zenodo-Download action: downloading a dataset without an API token. (#5)
I also created sweepj2, a CLI-based parameter sweep tool built on the Jinja2 templating engine, and packaged it as a pip package. (#6, #10)
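The core idea behind such a tool can be sketched in a few lines of Python. This is a minimal illustration of Jinja2-based parameter sweeping, not sweepj2's actual API; the template string and parameter names are made up for the example:

```python
from itertools import product

from jinja2 import Template

# A hypothetical command template; sweepj2's real templates differ.
template = Template("./pgbench --clients {{ clients }} --scale {{ scale }}")

# The parameter space to sweep over.
space = {"clients": [1, 2], "scale": [10, 100]}

# Render one command per point in the Cartesian product of the space.
for values in product(*space.values()):
    params = dict(zip(space.keys(), values))
    print(template.render(**params))
```

Each rendered line is one run of the swept application, which is what makes the approach pair naturally with a workflow engine: the template stays fixed while the parameter space varies.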
After all this, we created the HPC Proxy App workflow, which reused the spack action and sweepj2 to run the workflow.
This phase was filled with lots of learning and new experiences.
In this phase, we worked on example workflows, ported pipelines to workflows, and created new workflows.
First, we ported the docker-data-science pipeline to a workflow. (#33)
Then we added a new workflow that popperized the pypet-BRIAN2 example (#38), and ported the Genomics pipeline to a workflow. (#40)
Meanwhile, I added detailed documentation for the spark-bench workflow so that users can understand it better. (#39)
After that, I started working on the SeisSol workflows, which used the SeisSol framework to execute earthquake simulations.
I started by wrapping the compilation of SeisSol in a containerized environment and then running the first example, tpv33. (#1)
Then I worked on creating a workflow for the SC18 Student Cluster Competition challenge in a containerless environment. (#5)
I also added MPI support to the spack action (#3), changed the HPC Proxy App to use MPI as well (#41), and then added functionality to generate executables in sweepj2. (#13)
I also added the ability to inject pre- and post-workflows to the Popper project. (#712)
That was the phase in which I realized how important documentation is for the end user, so I tried my best to document everything I did. I also learned about communication and coordination when we faced an issue with the SeisSol dataset.
In this phase we worked on the NormalModes workflows. This included building the NormalModes application with the OpenBLAS library (#6). Then I added support for compilation with Intel's MKL library (#12). This wrapped up the containerized workflow.
Similarly, we created another workflow for a containerless environment, so that the workflow can be executed without depending on Docker or Singularity. (#7)
Then I worked on another example workflow that popperized a visualization demo using SPPARKS (#49). This required visualization with ParaView Web, so I had to create a dockerized application to visualize the .vtk files generated by the workflow. I built Docker-ParaViewWeb for that purpose.
Meanwhile, we had a discussion with js1019, the creator of the NormalModes application, and we decided to extend the workflow further with validation and visualization. So I added the validate action (#16). Visualization required running the MATLAB scripts in the PlanetaryModels repository, for which we used docker-octave, and then we visualized the generated .vtk files using Docker-ParaViewWeb.
This was all I could do in this short period of time. I want to thank my mentors, thanks to whom I was able to learn so much and had a great working experience.
Running a workflow from any of the repositories above is rather simple; you just need a container runtime installed (Docker and Singularity are currently supported) and Popper itself:
```
pip install popper
```
Once installed, you can execute any workflow using Popper. For instance, to run the NormalModes workflow:
```
git clone --recursive https://github.com/popperized/normalmodes-workflows
cd normalmodes-workflows/workflows/containerized
popper run
```