I am proposing a new module that provides a foreach macro. The foreach terminology is used frequently in the Hadoop/Spark/MapReduce pipeline world, so the concept should translate quickly for those who already have experience with it.
The goal of the 'foreach' macro is to let users generate tasks dynamically from a provided list of parameters.
The foreach macro takes kwargs of the form foreach(param=[values..], param2=[values...], ..), which provide the sets of parameters to generate splits from. Then, for each combination of parameters in the splits, we generate models using closures: the closure function receives the parameters for a given split, and the actual model (such as a task) is defined in the nested function.
An example dag, foreach_dag.py, is provided to showcase the foreach decorator; the actual usage is at line 31.
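To make the proposed syntax more concrete, here is a minimal, self-contained sketch of the idea. This is not the actual foreach_dag.py or macro.py: plain dicts stand in for the real Argo task models produced by the DSL's @task decorator, and the __foreach_tasks__ attribute name is just a placeholder I picked for illustration.

```python
import itertools

def foreach(**param_lists):
    """Illustrative stub of the proposed foreach macro.

    Takes kwargs of the form foreach(param=[...], param2=[...]) and calls the
    decorated closure once per combination in the cartesian product of the
    parameter lists, collecting whatever task/model objects the closure returns.
    """
    def decorator(closure):
        names = list(param_lists)
        generated = []
        for combo in itertools.product(*(param_lists[n] for n in names)):
            generated.append(closure(**dict(zip(names, combo))))
        # Stash the generated models on the closure so a later stage (e.g. a
        # workflow compile pass) can pick them up; the attribute name is a
        # placeholder, not part of the real DSL.
        closure.__foreach_tasks__ = generated
        return closure
    return decorator

# Usage in the spirit of foreach_dag.py (dicts stand in for real Argo models):
@foreach(region=["us-east-1", "eu-west-1"], shard=[0, 1])
def train(region, shard):
    return {"name": f"train-{region}-{shard}",
            "args": {"region": region, "shard": shard}}

print([t["name"] for t in train.__foreach_tasks__])
# ['train-us-east-1-0', 'train-us-east-1-1', 'train-eu-west-1-0', 'train-eu-west-1-1']
```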
The benefit of this syntax is that users can consistently specify how tasks/models are generated while preserving the way tasks are specified in the Argo DSL.
In summary, the foreach macro and closure should only relay the parameter splits and populate the multiple Argo models generated by the nested function.
In addition, I am also thinking about a design that lets users join tasks by specifying dynamic dependencies against the Argo models generated by foreach macros. The syntax looks like @foreach.dependencies(name=closure_function_name); see line 58 of foreach_dag.py.
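Continuing the illustrative stub above, a hypothetical foreach.dependencies could expand to the names of every model generated by the named closure, so a join task depends on all of them (again, dicts stand in for the real Argo models):

```python
# Hypothetical foreach.dependencies decorator: the join task's dependencies are
# filled in dynamically from the models generated by the referenced closure.
def _dependencies(name):
    def decorator(join_closure):
        task = join_closure()
        # One dependency entry per model generated by the foreach closure.
        task["dependencies"] = [t["name"] for t in name.__foreach_tasks__]
        return task
    return decorator

foreach.dependencies = _dependencies  # attach to the macro for the proposed syntax

@foreach.dependencies(name=train)
def merge():
    return {"name": "merge"}

print(merge["dependencies"])
# ['train-us-east-1-0', 'train-us-east-1-1', 'train-eu-west-1-0', 'train-eu-west-1-1']
```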
I am still in the process of understanding the metaprogramming models implemented in the Argo DSL.
A work-in-progress, proof-of-concept implementation is in macro.py. I am struggling to figure out how to relay the "models" generated by the nested @task-decorated functions to the parent Workflow spec. Maybe setting __props__ on the closure function would let them get picked up by _compile in the Workflow class? I am also not sure how that could be done at the moment.
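For discussion, one possible way to relay the generated models is to keep them stashed on the closure (as in the stub above) and have a compile pass scan the workflow class body for them. This is only a sketch of the idea and does not use the real Workflow._compile/__props__ machinery, which I have not worked out yet:

```python
# Sketch: collect all models generated by foreach-decorated closures that are
# defined as attributes of a workflow class (ForeachDag is a stand-in, not the
# real Workflow base class).
def collect_foreach_tasks(workflow_cls):
    tasks = []
    for attr in vars(workflow_cls).values():
        generated = getattr(attr, "__foreach_tasks__", None)
        if generated is not None:
            tasks.extend(generated)
    return tasks

class ForeachDag:      # stand-in for a Workflow subclass
    train = train      # the foreach-decorated closure from the stub above

print([t["name"] for t in collect_foreach_tasks(ForeachDag)])
```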
Please let me know whether this is a feasible design (whether I am on the right path). I'd like to get your feedback and help before going further. Thank you!!