This is a template subflow for parallelizing the execution of a function or process on the provided input data (Single Program Multiple Data pattern). The subflow gets the initial message and chunks it down based on the msg.payload.value.splitsize value. Each batch of input rows is forwared as input to one of three means of execution (configured in the msg.payload.value.execution as one of "faas","local_multithread","local_multiprocess"):
- an Openwhisk action (whose name is included in the msg.payload.value.action field)
- a local thread or
- a newly spawned local process (name of script to execute in msg.payload.value.shellscript)
Therefore if an array of 1000 rows (included in the msg.payload.value.values) is inserted and the splitsize is set to 10, it will create 100 inner calls of the msg.payload.value.action if the execution is set to "faas". In this case the Openwhisk action node needs also to be configured with the credentials and location of the available Openwhisk instance.
The solution of the msg.payload.value.splitsize was chosen since node properties of the Split node allow only environment variables to be used for setting the split size. These can be configured only at startup and thus can not be used for dynamic management of the pattern during runtime. Furthermore, with this dynamic setting we can also pass the splitsize as an argument when we incorporate this subflow around an HTTP endpoint and the splitsize can be an input argument.
However it needs to be noted that now the functions used in the inner parallelized action need to be able to process arrays as inputs.
The flow includes also a rate limiter, in order to be aligned with any Openwhisk options regarding maximum invocations per minute. By setting the rate limiter to the according OW limit, we can avoid failures of action invocations due to this limit. The parameter can be set in the millisecond interarrival time (msg.paylod.value.maxOWmillisecinterarrival).
For the multiprocess execution mode (local_multiprocess option of execution), each splitted message results in a separate process of that script. The script needs to be adapted to the way the multiple values are passed. A randomized local file is used for storing the values for each execution, while the format of that storage appears in the debug window.
The data format of the msg.payload is specifically designed so that it is compatible with the way Openwhisk passes arguments so that this flow can be directly combined with a relevant wrapper and executed as a function.
The configuration parameters can also be set through the node menu, however the values in the incoming message will override any set values during node initialization.
This flow has been created in the context of the H2020 PHYSICS Project (GA 101017047)