- Project Title: Extending the capabilities of OpenTRIM with a user-customizable tally
- Organization: Open Technologies Alliance - Gfoss
- Contributor: Iridanos L.
- Mentors: gapost, psaxioti, elmitsi
- Link to Code: OpenTRIM GSoC2025 branch (See note)
Goal of the Project:
The goal of this project was to extend OpenTRIM with a customizable tallying mechanism (`user_tally`) that lets users define their own binning schemes, select specific events, and choose coordinate systems. While the existing `tally` provides fixed outputs (such as depth profiles, damage, and sputtering), `user_tally` enables researchers to:
- Define custom bins (e.g., spatial coordinates, atom IDs).
- Track a subset of simulation events (`IonStop`, `IonExit`, etc.).
- Save results in the HDF5 output file in a structured and descriptive way.
The new tallying system integrates with OpenTRIM's existing output pipeline and stores results in HDF5 files, enabling flexible data analysis and reproducibility. This makes OpenTRIM more useful for advanced research where the predefined tallies are insufficient.
- Implemented a new `user_tally` class as a customizable alternative to the existing `tally`.
  - Similar to `tally`, but user-driven.
  - Stores data in a multi-dimensional array (the `user_tally::data` variable).
  - Handles binning, coordinate systems, and events.
  - Exposes simple public functions for HDF5 serialization.
- Added support for a user-defined binning system, including:
  - `atom_id` bins
  - Coordinate bins (`x`, `y`, `z`, cylindrical, etc.)
- Integrated event handling (`IonStop`, `IonExit`, etc.) into `user_tally`.
- Developed new accessor functions:
  - `bin_names(names, desc)`: returns bin names and descriptions
  - `event_name(ev, name, desc)`: maps events to descriptive strings
  - `coordinates_name(coord, name, desc)`: maps coordinate systems to descriptive strings
  - `bin_edges(bin_id)`: exposes bin edges for serialization
- User input specification: JSON parsing
  - Extended the input handling with a JSON-based user specification for configuring `user_tally`.
  - Implemented the necessary changes in `parse_json.cpp` to allow users to define:
    - Which events to track
    - Binning parameters (number of bins, ranges)
    - Coordinate system preferences
  - This addition ensures that tallies can be fully configured at runtime without modifying source code.
- Extended HDF5 serialization (`h5serialize.cpp`):
  - Saves `bin_names` with descriptions.
  - Saves the bin edges of each binning variable as datasets (`bin_0`, `bin_1`, …).
  - Saves `event` and `coordinates` metadata (with descriptive text).
  - Stores tally results in a `"data"` matrix dataset.
  - `dump_array`: saves tally data
  - `dump_vector`: saves the z-axis, `xzvec`, origin, and bin edges
- Ensured `user_tally` runs from JSON input (`usertally_test.json`) and integrates into the simulation workflow.
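The binning described above can be illustrated with a minimal C++ sketch of an N-dimensional tally. This is not the OpenTRIM implementation: the class `NdTally`, its methods, and the half-open `[lo, hi)` bin convention are assumptions made for illustration. Each axis holds user-supplied, strictly increasing edges, and the per-axis bin indices are combined into a single row-major index into a flat `data` vector, which is one common way to store a multi-dimensional array such as `user_tally::data`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch of an N-dimensional tally (not the OpenTRIM class).
// Each axis has strictly increasing edges; n edges define n-1 half-open bins [lo, hi).
class NdTally {
public:
    explicit NdTally(std::vector<std::vector<double>> edges)
        : edges_(std::move(edges)) {
        std::size_t total = 1;
        for (const auto& e : edges_) total *= e.size() - 1;
        data_.assign(total, 0.0);
    }

    // Bin index of value v on axis dim, or std::nullopt if v is out of range.
    std::optional<std::size_t> bin_index(std::size_t dim, double v) const {
        const auto& e = edges_[dim];
        if (v < e.front() || v >= e.back()) return std::nullopt;
        // upper_bound finds the first edge > v; the bin starts one edge earlier.
        auto it = std::upper_bound(e.begin(), e.end(), v);
        return static_cast<std::size_t>(it - e.begin()) - 1;
    }

    // Add weight w at point p (one coordinate per axis).
    // Returns false, and scores nothing, if any coordinate is out of range.
    bool score(const std::vector<double>& p, double w = 1.0) {
        assert(p.size() == edges_.size());
        std::size_t flat = 0;
        for (std::size_t d = 0; d < edges_.size(); ++d) {
            auto i = bin_index(d, p[d]);
            if (!i) return false;
            flat = flat * (edges_[d].size() - 1) + *i;  // row-major flattening
        }
        data_[flat] += w;
        return true;
    }

    // Value stored in the bin with per-axis indices idx.
    double at(const std::vector<std::size_t>& idx) const {
        assert(idx.size() == edges_.size());
        std::size_t flat = 0;
        for (std::size_t d = 0; d < edges_.size(); ++d)
            flat = flat * (edges_[d].size() - 1) + idx[d];
        return data_[flat];
    }

    const std::vector<double>& data() const { return data_; }

private:
    std::vector<std::vector<double>> edges_;
    std::vector<double> data_;
};
```

Looking up each coordinate with `std::upper_bound` costs O(log n) per axis, so non-uniform edges (for example, unit-spaced `atom_id` edges next to finer spatial bins) need no special handling. In this sketch, points outside any axis range are not scored.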
- `user_tally` is functional and tested with JSON configuration files.
- HDF5 output includes:
  - `data` (bin values and statistics)
  - `Bin_Names` (e.g., `"x"`, `"y"`, `"atom_id"`) with descriptions
  - Coordinate system definitions (`zaxis`, `xzvec`, `origin`)
  - Bin edges per dimension
- Atom ID binning works as expected.
- The new features compile and run cleanly after a full rebuild, with no segmentation faults.
- Broader testing with complex simulations and multi-event tallies.
- Performance benchmarking with large ion counts to ensure scalability.
- Improved user-facing documentation:
  - How to configure `user_tally` via JSON.
  - Example workflows and expected HDF5 output structure.
- The code was merged into the OpenTRIM main branch (last commit 5244cc1).
- Key modified files:
  - `src/user_tally.h` / `src/user_tally.cpp`: new tally implementation
  - `src/h5serialize.cpp`: extended serialization logic
  - `parse_json.cpp`: user input specification (JSON parsing)
  - `test/user_tally`: configuration for testing
During the development period, we encountered conflicts in the initially created GSoC2025 branch that made it difficult to maintain a clean and consistent commit record. To resolve this, we created a new dedicated branch (GSoC2025_clean) and cherry-picked all relevant contributions there. As a result, the commit timestamps do not fully reflect the chronological progress of the work throughout the GSoC period. Nevertheless, the branch now provides a coherent, conflict-free history that contains the final implementation and ensures the reproducibility of the results.
Working on extending OpenTRIM with a customizable user_tally revealed a range of technical and conceptual challenges. These shaped both the design of the implementation and my growth as a developer:
- Navigating a large legacy codebase: OpenTRIM is a research-oriented codebase with interdependent modules (`mcdriver`, `h5serialize`, `tally`). Understanding where to integrate `user_tally` without breaking existing functionality required careful reading of the code flow, tracing from `main.cpp` through the Monte Carlo driver, and experimenting with debugging breakpoints.
- Balancing flexibility and consistency: `user_tally` needed to be user-customizable while remaining compatible with the existing `tally` infrastructure. This required creating new enums (e.g., `coordinate_t`) for type safety while keeping a consistent interface with functions like `bin_names()` and `event_name()`.
- HDF5 serialization complexities: Saving multidimensional simulation data to HDF5 posed subtle issues. For example:
  - Choosing the right serialization function (`dump_array` vs. `dump_vector`) depending on whether error estimates were needed.
  - Ensuring consistent descriptions and metadata so that results were self-documenting.
  - Debugging type mismatches (e.g., `vector3` vs. `std::vector<float>`).
- Hidden side effects and state management: Functions like `clear()` were called automatically at stages I did not expect, which left data structures empty during testing. This reinforced the importance of understanding object lifecycles in C++ and carefully checking constructor/destructor behavior.
- Switching from "quick fixes" to maintainable solutions: Early debugging relied on inserting breakpoints and hardcoding test values in `init()`. Moving from this style to robust, maintainable solutions (e.g., structured `push_bins` logic, reusable HDF5 functions, descriptive enums) was a key learning step.
- Collaboration and mentorship: Regular feedback from mentors helped me refine design choices, for example moving from a simple `bin_names()` return value to a dual-parameter version that outputs both names and descriptions, which makes the output more useful for postprocessing.
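The enum-plus-description accessor pattern mentioned above can be sketched as one `switch` per enum that fills two output strings. The enum `event_t`, all enum members, the `bool` return value, and the description texts below are hypothetical; only the function names `event_name` and `coordinates_name`, the enum name `coordinate_t`, and the `(value, name, desc)` parameter order come from this report.

```cpp
#include <string>

// Hypothetical enums: member names are illustrative, not OpenTRIM's declarations.
enum class event_t { IonStop, IonExit };
enum class coordinate_t { xyz, cyl };  // "cyl" is an assumed label for cylindrical

// Fills a short name and a human-readable description for an event.
// Returns false (and clears both strings) for values without a mapping.
bool event_name(event_t ev, std::string& name, std::string& desc) {
    switch (ev) {
    case event_t::IonStop:
        name = "IonStop";
        desc = "Ion comes to rest inside the target";  // illustrative text
        return true;
    case event_t::IonExit:
        name = "IonExit";
        desc = "Ion leaves the simulation volume";  // illustrative text
        return true;
    }
    name.clear();
    desc.clear();
    return false;
}

// Same pattern for coordinate systems.
bool coordinates_name(coordinate_t c, std::string& name, std::string& desc) {
    switch (c) {
    case coordinate_t::xyz:
        name = "xyz";
        desc = "Cartesian coordinates (x, y, z)";
        return true;
    case coordinate_t::cyl:
        name = "cyl";
        desc = "Cylindrical coordinates (rho, phi, z)";
        return true;
    }
    name.clear();
    desc.clear();
    return false;
}
```

Because each `switch` lists every enumerator without a `default`, GCC and Clang can warn (`-Wswitch`, enabled by `-Wall`) when a new event or coordinate system is added without a name and description.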
- Trace execution flow first: knowing which function is called when is crucial in large scientific codes.
- Type safety prevents future bugs: enums for coordinates and events made the design more robust.
- Write for the user of your code, not just yourself: HDF5 outputs should be self-describing.
- Mentorship accelerates good design decisions: small suggestions (like switching `event_name` to return a name and a description) can lead to large improvements.
- Clean builds solve mysterious errors: I learned to clean and rebuild frequently after modifying headers or serialization code, to avoid symbol lookup errors.
Overall, the project taught me not just how to extend OpenTRIM, but also how to think critically about design tradeoffs, maintainability, and reproducibility in scientific software.
To validate the functionality of the new user_tally, we performed test runs using the JSON configuration files in test/usertally. These simulations produced HDF5 output files containing the expected binning information (e.g., atom_id, spatial coordinates, and custom variables).
The figures below illustrate typical results:
These results verify that the user_tally correctly saves bin definitions (bin_edges), names, and data matrices, making the output self-describing and ready for postprocessing.
The `user_tally` is an extension of the standard `tally` class that allows users to select events and define their own coordinate systems and binning schemes. Results are automatically saved to the HDF5 output file.
```json
{
  "UserTally": {
    "id": "TestTally",
    "event": "IonStop",
    "coordinates": "xyz",
    "atom_id": [0, 1, 2, 3],
    "x": [0.0, 10.0, 20.0, 30.0],
    "y": [0.0, 10.0, 20.0, 30.0],
    "zaxis": [0.0, 0.0, 1.0],
    "xzvec": [1.0, 0.0, 0.0],
    "org": [0.0, 0.0, 0.0]
  }
}
```

- id: tally identifier (used to name the tally's group in the HDF5 output).
- event: which simulation event to tally (`IonStop`, `IonExit`, etc.).
- coordinates: which coordinate system to use (Cartesian, cylindrical, etc.).
- atom_id: bin edges for atom identifiers.
- x, y: bin edges for spatial coordinates.
- zaxis, xzvec, org: define the orientation and origin of the coordinate system.
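The `zaxis`, `xzvec`, and `org` fields suggest a local frame built by orthogonalization. The sketch below assumes that convention: the local z-axis is `zaxis` normalized, the local x-axis is the part of `xzvec` perpendicular to it (Gram-Schmidt), and y completes a right-handed frame; positions are then shifted by `org` and projected onto the three axes. How OpenTRIM actually interprets these fields is an assumption here, and `Frame` and its helper functions are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <stdexcept>

using vec3 = std::array<double, 3>;

double dot(const vec3& a, const vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
vec3 cross(const vec3& a, const vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
vec3 sub(const vec3& a, const vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
vec3 scale(const vec3& a, double s) { return {a[0] * s, a[1] * s, a[2] * s}; }
vec3 normalize(const vec3& a) {
    const double n = std::sqrt(dot(a, a));
    if (n == 0.0) throw std::invalid_argument("zero-length vector");
    return scale(a, 1.0 / n);
}

// Hypothetical local frame; the construction below is an assumed convention.
struct Frame {
    vec3 ex{}, ey{}, ez{}, org{};

    Frame(const vec3& zaxis, const vec3& xzvec, const vec3& origin) : org(origin) {
        ez = normalize(zaxis);
        // Gram-Schmidt: drop the component of xzvec along ez, then normalize.
        // Throws if xzvec is parallel to zaxis (nothing left to normalize).
        ex = normalize(sub(xzvec, scale(ez, dot(xzvec, ez))));
        ey = cross(ez, ex);  // completes a right-handed frame (ex x ey = ez)
        assert(std::abs(dot(ex, ez)) < 1e-12);  // sanity check: axes orthogonal
    }

    // Cartesian coordinates of global point p in the local frame.
    vec3 to_local(const vec3& p) const {
        const vec3 d = sub(p, org);
        return {dot(d, ex), dot(d, ey), dot(d, ez)};
    }

    // Cylindrical (rho, phi, z) about the local z-axis; phi in (-pi, pi].
    vec3 to_cylindrical(const vec3& p) const {
        const vec3 l = to_local(p);
        return {std::hypot(l[0], l[1]), std::atan2(l[1], l[0]), l[2]};
    }
};
```

With the values from the example (`zaxis` = [0, 0, 1], `xzvec` = [1, 0, 0], `org` = [0, 0, 0]), this local frame coincides with the global Cartesian axes, so `to_local` returns positions unchanged.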
```
| - user_tally/
|   - (tally id)/
|     - bin_names (vector of strings containing the names of the binning variables)
|     - bin_descriptions (vector of strings containing the descriptions of the binning variables)
|     - bins/
|       - 0
|       - 1
|       - ...
|     - coordinates (coordinate system)
|     - data (N-dimensional bin data array)
|     - data_sem (N-dimensional array of the standard error of the mean of data)
|     - event (event type)
|     - org (coordinate system origin)
|     - xzvec (vector on the xz-plane)
|     - zaxis (vector parallel to the z-axis direction)
```
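The output stores both `data` and `data_sem`. A common way to estimate a standard error of the mean in Monte Carlo tallies is to accumulate, per bin, the sum and the sum of squares of each ion history's contribution, then compute `sqrt((<x^2> - <x>^2) / (N - 1))` over N histories. The sketch below shows that estimator; whether OpenTRIM computes `data_sem` exactly this way is an assumption, and `BinStats` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-bin accumulator (not OpenTRIM's implementation).
// Scores from one ion history are buffered, then folded into running
// sums of x and x^2 when the history ends.
class BinStats {
public:
    explicit BinStats(std::size_t nbins)
        : sum_(nbins, 0.0), sum2_(nbins, 0.0), buf_(nbins, 0.0) {}

    // Add weight w to bin i for the current ion history.
    void score(std::size_t i, double w) {
        assert(i < buf_.size());
        buf_[i] += w;
    }

    // Close the current history: x_i is the total scored in bin i by this ion.
    void end_history() {
        for (std::size_t i = 0; i < buf_.size(); ++i) {
            sum_[i] += buf_[i];
            sum2_[i] += buf_[i] * buf_[i];
            buf_[i] = 0.0;
        }
        ++n_;
    }

    // Mean contribution per history, <x>.
    double mean(std::size_t i) const {
        return n_ == 0 ? 0.0 : sum_[i] / static_cast<double>(n_);
    }

    // Standard error of the mean: sqrt((<x^2> - <x>^2) / (N - 1)).
    double sem(std::size_t i) const {
        if (n_ < 2) return 0.0;
        const double n = static_cast<double>(n_);
        const double m = sum_[i] / n;
        const double var = sum2_[i] / n - m * m;            // population variance
        return std::sqrt(std::max(var, 0.0) / (n - 1.0));  // clamp rounding noise
    }

private:
    std::vector<double> sum_, sum2_, buf_;
    std::size_t n_ = 0;
};
```

For bins whose mean is large compared with their spread, the one-pass sum/sum-of-squares form can lose precision to cancellation; Welford's online update is the usual remedy.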
- Bin edges are defined by the user in the JSON.
- Events and coordinates are mapped to names and descriptions automatically (via `event_name` and `coordinates_name`).
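Because bin edges come straight from the JSON input, checking them before allocating the data array is useful. The sketch below (hypothetical helper names `validate_edges` and `data_shape`) rejects axes with fewer than two edges or with edges that are not strictly increasing, and derives the array shape assuming each pair of consecutive edges defines one bin. For the example configuration above, `atom_id`, `x`, and `y` each have four edges, which gives a 3 × 3 × 3 array.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical check for one axis of user-supplied bin edges:
// at least two edges, all strictly increasing.
void validate_edges(const std::string& axis, const std::vector<double>& e) {
    if (e.size() < 2)
        throw std::invalid_argument(axis + ": need at least 2 bin edges");
    for (std::size_t i = 1; i < e.size(); ++i)
        if (!(e[i] > e[i - 1]))  // also rejects NaN edges
            throw std::invalid_argument(axis + ": bin edges must be strictly increasing");
}

// Shape of the N-dimensional data array: (n_edges - 1) bins per axis.
std::vector<std::size_t> data_shape(const std::vector<std::vector<double>>& edges) {
    std::vector<std::size_t> shape;
    shape.reserve(edges.size());
    for (const auto& e : edges) {
        assert(e.size() >= 2);  // call validate_edges first
        shape.push_back(e.size() - 1);
    }
    return shape;
}
```

Failing early with the offending axis name in the message points the user at the exact JSON field to fix, instead of surfacing later as an out-of-range index during the simulation.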