My work for this Google Summer of Code project is to extend Aggregate Engine support in UIMA C++. After exploring the codebase and discussing it with my project mentor, Dr. Pablo Duboue, I identified a core feature missing from the current implementation: Aggregate CAS Multipliers. Previously, the framework only supported Engines that do not produce any new CASes [1].
- Test cases for the missing feature:
This was one of the first things I worked on. After figuring out how CAS Multipliers are supposed to work in the Java codebase, I added two tests. The first has a single CAS Multiplier that segments the input into multiple outputs. The second has two CAS Multipliers: one segments the input and the other combines the outputs of the segmenter.
- CAS Multiplier implementation (Pending Approval):
After the test cases were in place, the bulk of the work was in the implementation of the various features that facilitate the use of CAS Multipliers. These features include the FlowController class [2], the Step class and Flow interface [3]. These components dictate the flow of CASes within the Aggregate Engine. A large amount of time was spent studying the Java version to understand how these components fit inside the engine and their effects on how the CASes are treated as they move around.
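To make the roles of these components concrete, here is a minimal sketch of how a flow controller can hand out a flow that yields one step per delegate. All type and function names below (`Step`, `Flow`, `FixedFlow`, `FixedFlowController`, `visitedKeys`) are simplified, hypothetical stand-ins chosen for illustration; they are not the declarations used in the UIMA C++ codebase or in my PRs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT the actual UIMA C++ declarations.

// A Step tells the engine what to do with a CAS next.
struct Step {
    enum class Kind { Simple, Final };
    Kind kind;
    std::string engineKey;  // empty for Kind::Final

    // Route the CAS to the delegate identified by `key`.
    static Step simple(std::string key) {
        assert(!key.empty());
        return Step{Kind::Simple, std::move(key)};
    }
    // The CAS has finished its route through the aggregate.
    static Step done() { return Step{Kind::Final, std::string()}; }
};

// A Flow is the routing state for one CAS.
class Flow {
public:
    virtual ~Flow() = default;
    virtual Step next() = 0;
};

// Visits the delegates in a fixed order, then reports a final step.
class FixedFlow : public Flow {
public:
    explicit FixedFlow(std::vector<std::string> order) : order_(std::move(order)) {}

    Step next() override {
        if (position_ < order_.size()) {
            return Step::simple(order_[position_++]);
        }
        return Step::done();
    }

private:
    std::vector<std::string> order_;
    std::size_t position_ = 0;
};

// Creates a fresh Flow for every CAS that enters the aggregate.
class FixedFlowController {
public:
    explicit FixedFlowController(std::vector<std::string> order) : order_(std::move(order)) {}

    std::unique_ptr<Flow> computeFlow() const {
        return std::make_unique<FixedFlow>(order_);
    }

private:
    std::vector<std::string> order_;
};

// Helper for illustration: collect the delegate keys a flow visits.
std::vector<std::string> visitedKeys(Flow& flow) {
    std::vector<std::string> keys;
    for (Step step = flow.next(); step.kind == Step::Kind::Simple; step = flow.next()) {
        keys.push_back(step.engineKey);
    }
    return keys;
}
```

In this sketch each CAS owns its own `Flow` object, so the engine only has to keep asking `next()` until it receives a final step. The real components carry more responsibilities than this, so treat it as an outline of the idea only.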
After adapting these features into the C++ codebase with appropriate modifications, I implemented the algorithm that handles CAS Multipliers in the engine. With that in place, the test cases passed and the engine gained the ability to handle Aggregate CAS Multipliers. This effort required digging into and modifying the core of the UIMA engine. It also introduced significant changes to the user-facing API, including the APIs for the new features mentioned above.
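The general idea of routing CASes through an aggregate that contains a CAS Multiplier can be sketched with a toy model. This is not the algorithm from my PRs: a "CAS" here is just a string, every name (`Cas`, `Delegate`, `segment`, `upperCase`, `WorkItem`, `runAggregate`) is hypothetical, and the policy of dropping a parent CAS once it has produced children is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <deque>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy model, NOT the UIMA C++ API: a "CAS" is just a string.
using Cas = std::string;

// A delegate may edit the CAS in place and returns any new CASes it produced.
// Ordinary annotators return an empty vector; CAS Multipliers return children.
using Delegate = std::function<std::vector<Cas>(Cas&)>;

// CAS Multiplier: produces one child CAS per whitespace-separated token.
std::vector<Cas> segment(Cas& cas) {
    std::istringstream in(cas);
    std::vector<Cas> children;
    for (std::string token; in >> token;) {
        children.push_back(token);
    }
    return children;
}

// Ordinary annotator: edits the CAS and produces nothing new.
std::vector<Cas> upperCase(Cas& cas) {
    for (char& c : cas) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return {};
}

struct WorkItem {
    Cas cas;
    std::size_t nextDelegate;  // index of the next delegate this CAS visits
};

// Runs the delegates in order. A child CAS starts at the delegate right after
// its producer. Assumption: a parent that produced children is dropped.
std::vector<Cas> runAggregate(const std::vector<Delegate>& delegates, const Cas& input) {
    std::vector<Cas> outputs;
    std::deque<WorkItem> pending;
    pending.push_back({input, 0});

    while (!pending.empty()) {
        WorkItem item = pending.front();
        pending.pop_front();

        bool dropped = false;
        while (!dropped && item.nextDelegate < delegates.size()) {
            const Delegate& delegate = delegates[item.nextDelegate++];
            assert(delegate && "every delegate slot must hold a callable");

            std::vector<Cas> children = delegate(item.cas);
            for (Cas& child : children) {
                pending.push_back({std::move(child), item.nextDelegate});
            }
            dropped = !children.empty();
        }
        if (!dropped) {
            outputs.push_back(item.cas);
        }
    }
    return outputs;
}
```

Calling `runAggregate` with the delegates `segment` followed by `upperCase` mirrors the shape of the first test case: a segmenter turns one input into several CASes, and each of them continues through the rest of the pipeline on its own. A merger, as in the second test case, would additionally need a way to know when the last segment has arrived, which this sketch leaves out.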
More information on the specific details of my implementation can be found in the corresponding PRs as well as the documentation of design decisions and issues that I compiled when working on this [4].
Looking into the two codebases, I found a major difference in how they implement parameter overrides [5], which allow for customization of Aggregate Engines. Specifically, the C++ version implements an older specification that is no longer supported in Java. This mismatch in behavior limits the compatibility of the two implementations.
Even though CAS Multipliers are now supported in Aggregate Engines, they still lack the flexibility provided by FlowController descriptor files [6]. The framework should be extended to support them, and eventually custom FlowControllers, allowing users to customize the behavior of the engine even further.
Beyond that, the C++ version should also be updated to use the newer spec for parameter overrides, as described above. Implementing it would be another step toward bridging the gap between the two implementations.
In terms of technical skills, I've picked up quite a few tricks in areas ranging from my development environment/tools to the programming language that I use. However, in hindsight, I should have been more proactive in my communications, especially with my project mentor.
I would like to thank my mentor Dr. Pablo Duboue for his guidance and the community for their support.
This write-up is good and will get the job done if needed. The guidelines say:
It seems the write-up is a little too Spartan to capture your work. You might want to add a short summary for each of the changes linked (so the person does not have to go to them).
Some things the write-up does not yet discuss: