This is a flow to be run as a service in order to perform request aggregation towards a target endpoint. The purpose is to reduce the load on the back-end system by lowering the arrival rate of requests and the number of serving threads or containers needed (in the case of FaaS systems), thus also reducing costs and improving performance. It is mostly beneficial in cases where a request needs to set up a rather heavyweight environment (thread or container) to perform a relatively small computation (e.g. model inference, simulation, etc.).
The RA takes as input the target endpoint, the method to be invoked, and the input data to be used as the payload of the aggregated call. It holds incoming requests until a threshold of messages is reached (set via the setBatchSize endpoint), at which point it collates the inputs of the held calls into one array request towards the target endpoint. It then decomposes the output and responds to each individual caller.
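The mechanism can be illustrated with a short Python sketch. This is not the flow's actual implementation (which lives in Node-RED); the names and the reply-callback shape are hypothetical and only meant to clarify the hold/collate/decompose pattern:

```python
import requests

BATCH_SIZE = 4   # placeholder; in the flow this is set via POST /setBatchSize
pending = []     # (reply_callback, input_array) for each held caller

def handle_request(reply, target_endpoint, method, input_array):
    """Hold each caller until the batch threshold is reached, then forward
    one aggregated array call and fan the results back out."""
    pending.append((reply, input_array))
    if len(pending) < BATCH_SIZE:
        return                           # keep holding until the batch fills
    batch = list(pending)
    pending.clear()
    # One aggregated call carrying all held inputs as a single array
    merged = [item for _, arr in batch for item in arr]
    resp = requests.request(method, target_endpoint, json={"input": merged})
    outputs = resp.json()["output"]      # assumed: one result per input element
    # Decompose: each caller gets back the slice matching its own inputs
    offset = 0
    for reply_cb, arr in batch:
        reply_cb(outputs[offset:offset + len(arr)])
        offset += len(arr)
```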
The body of the POST call to the RA should include:
- msg.payload.targetEndpoint // the target endpoint whose calls need reducing
- msg.payload.method // the method of the target endpoint to be invoked
- msg.payload.input // the body of the call towards the target endpoint, in array form
- msg.payload.creds // in the form user:pwd, for basic authentication, if needed by the target endpoint
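A hypothetical example of such a call, assuming the RA is exposed at http://localhost:1880/ra (the actual path and values depend on the flow's configuration):

```python
import requests

resp = requests.post(
    "http://localhost:1880/ra",          # assumed RA endpoint path
    json={
        "targetEndpoint": "http://backend.example.com/infer",  # assumed target
        "method": "POST",
        "input": [{"x": 1}, {"x": 2}],   # array-form body for the target endpoint
        "creds": "user:pwd",             # only if the target needs basic auth
    },
)
print(resp.json()["output"])             # this caller's share of the results
```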
The input data should be in array form inside the msg.payload.input JSON object; the target endpoint must therefore be able to process arrays arriving as JSON values. The target service should return an array of JSON objects in the msg.payload.output field. The response msg.payload also includes the target input of each call, so that the correct return of responses to each caller can be verified during testing.

The message batch size can be set through a POST /setBatchSize method, which takes the number from the payload and sets the corresponding flow variable. In future versions, the batch size will be linked to a model-based setting, in which case the configuration should also include aspects such as the model ID to be used and the location of the model inference service.

The flow also includes local testing through the testRA2 endpoint, as well as a set of flows for initialization, batch size setting, and getting info on the current state of the batch (current count of messages and target batch size).
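For instance, a hypothetical call setting the batch size to 10, assuming the flow reads the number directly from the request payload (host, port, and payload shape are assumptions):

```python
import requests

# /setBatchSize is named in the flow description above; the flow is assumed
# to parse the request body as a plain JSON number.
requests.post("http://localhost:1880/setBatchSize", json=10)
```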
At the moment the flow supports a single instance of request aggregation, i.e. one endpoint for which calls are aggregated. Support for multiple concurrent endpoints will be considered in future versions.
A relevant paper that investigates the behavior of this flow can be found below: G. Kousiouris, “A self-adaptive batch request aggregation pattern for improving resource management, response time and costs in microservice and serverless environments,” in 40th IEEE International Performance Computing and Communications Conference (IPCCC 2021), IEEE, 2021.