@mac-op
Last active February 8, 2025 16:32
GSoC 2024 Final Report by Huy T

Supporting Aggregate Engines in UIMACPP

Overview

My work for this Google Summer of Code project was to extend Aggregate Engine support in UIMA C++. After exploring the codebase and discussing it with the project mentor, Dr. Pablo Duboue, I identified Aggregate CAS Multipliers as a core feature missing from the current implementation. The framework had previously supported only Engines that do not produce any new CASes [1].

What has been done

Pull Requests

  • Test cases for the missing feature:
    This was one of the first things I worked on. After figuring out how CAS Multipliers are supposed to work in the Java codebase, I added two tests. The first has a single CAS Multiplier that segments the input into multiple outputs. The second has two CAS Multipliers: one that segments the input and another that combines the outputs of the segmenter.

  • CAS Multiplier implementation (Pending Approval):
    After the test cases were in place, the bulk of the work was in the implementation of the various features that facilitate the use of CAS Multipliers. These features include the FlowController class [2], the Step class and Flow interface [3]. These components dictate the flow of CASes within the Aggregate Engine. A large amount of time was spent studying the Java version to understand how these components fit inside the engine and their effects on how the CASes are treated as they move around.
    After adapting these features into the C++ codebase with appropriate modifications, I implemented the algorithm to handle CAS Multipliers in the engine. With that in place, the test cases passed and the engine gained the ability to handle Aggregate CAS Multipliers. This effort required digging into and modifying the core of the UIMA engine. It also introduced significant changes to the user-facing API, including additions for the new features mentioned above.

More information on the specific details of my implementation can be found in the corresponding PRs as well as the documentation of design decisions and issues that I compiled when working on this [4].
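
To illustrate how a FlowController-style design routes CASes through an aggregate that contains a CAS Multiplier, here is a minimal, self-contained sketch. It is not the UIMACPP API: `Cas`, `Step`, `Flow`, `Engine`, `Segmenter`, `Upper` and `runAggregate` are all hypothetical, simplified stand-ins invented for this example, and the sketch assumes a fixed linear flow in which the original CAS keeps moving through the remaining delegates after a multiplier has produced new CASes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <deque>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CAS: it only carries a document text.
struct Cas {
    std::string text;
};

// A Step tells the aggregate what to do next with one CAS.
struct Step {
    enum class Kind { Simple, Final };
    Kind kind;
    std::size_t engineIndex;  // only meaningful when kind == Kind::Simple
};

// A Flow tracks the route of a single CAS through a fixed list of delegates.
class Flow {
public:
    Flow(std::size_t start, std::size_t count) : next_(start), count_(count) {}

    // Returns the next delegate to visit, or a Final step when the route ends.
    Step next() {
        if (next_ >= count_) return {Step::Kind::Final, 0};
        return {Step::Kind::Simple, next_++};
    }

    // Assumption: a CAS produced by delegate `producer` starts at the delegate after it.
    Flow newCasProduced(std::size_t producer) const {
        return Flow(producer + 1, count_);
    }

private:
    std::size_t next_;
    std::size_t count_;
};

// Delegate interface: process a CAS and return any new CASes it produced.
struct Engine {
    virtual ~Engine() = default;
    virtual std::vector<Cas> process(Cas& cas) = 0;
};

// A CAS Multiplier that emits one new CAS per whitespace-separated token.
struct Segmenter : Engine {
    std::vector<Cas> process(Cas& cas) override {
        std::vector<Cas> out;
        std::istringstream in(cas.text);
        std::string word;
        while (in >> word) out.push_back(Cas{word});
        return out;
    }
};

// An ordinary annotator that upper-cases the text in place and produces no new CAS.
struct Upper : Engine {
    std::vector<Cas> process(Cas& cas) override {
        for (char& c : cas.text) {
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        return {};
    }
};

// Drives every CAS to the end of its own Flow; new CASes are queued with a fresh Flow.
std::vector<std::string> runAggregate(std::vector<std::unique_ptr<Engine>>& delegates,
                                      Cas input) {
    std::vector<std::string> finished;
    std::deque<std::pair<Cas, Flow>> pending;
    pending.emplace_back(std::move(input), Flow(0, delegates.size()));

    while (!pending.empty()) {
        Cas cas = std::move(pending.front().first);
        Flow flow = pending.front().second;
        pending.pop_front();

        for (Step step = flow.next(); step.kind == Step::Kind::Simple; step = flow.next()) {
            assert(step.engineIndex < delegates.size());
            for (Cas& child : delegates[step.engineIndex]->process(cas)) {
                pending.emplace_back(std::move(child),
                                     flow.newCasProduced(step.engineIndex));
            }
        }
        finished.push_back(cas.text);
    }
    return finished;
}
```

With a `Segmenter` followed by an `Upper` as delegates, calling `runAggregate` on the text "hello uima world" returns four texts: the upper-cased original, then "HELLO", "UIMA" and "WORLD". Giving each CAS its own `Flow` object keeps the routing decision separate from the engine loop, which is the separation of concerns the FlowController, Flow and Step components provide.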

Other findings

Looking into the two codebases, I found a major difference in how they implement parameter overrides [5], the mechanism that allows for customization of Aggregate Engines. Specifically, the C++ version implements an older specification that is no longer supported in Java. This mismatch in behavior limits the compatibility of the two implementations.

What is left

Even though CAS Multipliers are now supported in Aggregate Engines, they still lack the flexibility provided by FlowController descriptor files [6]. The framework should be extended to support them, and eventually custom FlowControllers, allowing users to customize the behavior of the engine even further.

Beyond that, the C++ version should also be updated to use the newer spec for parameter overrides, as described above. Doing so would be another step toward bridging the gap between the two implementations.

What I've learned

In terms of technical skills, I've picked up quite a few tricks in areas ranging from my development environment/tools to the programming language that I use. However, in hindsight, I should have been more proactive in my communications, especially with my project mentor.

Acknowledgements

I would like to thank my mentor Dr. Pablo Duboue for his guidance and the community for their support.

@DrDub

DrDub commented Aug 22, 2024

This write-up is good and will get the job done if needed. The guidelines say:

When someone goes to the provided URL it should be clear what work you did without requiring them to do significant additional digging.

It seems the write-up is a little too Spartan to capture your work. You might want to add a short summary for each of the changes linked (so the person does not have to go to them).

Some things the write-up does not yet discuss:

  1. You were touching core framework code. Your project was not a cosmetic change and the magnitude of the changes is not reflected in the current write-up.
  2. You spent significant time studying the Java version of the framework. This is not mentioned either.

@DrDub

DrDub commented Aug 22, 2024

You also discovered mismatching behaviour / legacy bugs in the existing code (w.r.t. the way overrides are handled). That's also an immense contribution.
