@Eridanous
Last active November 28, 2025 14:11
GSoC2025gist

GSoC 2025 Final Report & User Guide

📄 Final Report

1. Project Overview

  • Project Title: Extending the capabilities of OpenTRIM with a user-customizable tally
  • Organization: Open Technologies Alliance - Gfoss
  • Contributor: Iridanos L.
  • Mentors: gapost, psaxioti, elmitsi
  • Link to Code: OpenTRIM GSoC2025 branch (See note)

Goal of the Project: The goal of this project was to extend OpenTRIM with a customizable tallying mechanism (user_tally) that lets users define their own binning schemes, select specific events, and choose coordinate systems. Whereas the built-in tally provides fixed outputs (such as depth profiles, damage, and sputtering), user_tally enables researchers to:

  • Define custom bins (e.g. spatial coordinates, atom IDs).
  • Track a subset of simulation events (IonStop, IonExit, etc.).
  • Save results in the HDF5 output file in a structured and descriptive way.

The new tallying system integrates with OpenTRIM’s existing output pipeline and stores its results in the HDF5 output file, which supports flexible data analysis and reproducibility. This makes OpenTRIM usable for advanced research where the predefined tallies are insufficient.


2. Deliverables

  • Implemented a new user_tally class as a customizable alternative to the existing tally.

    • Similar to tally, but user-driven.
    • Stores data in multi-dimensional array format (user_tally::data variable).
    • Handles binning, coordinate systems, and events.
    • Exposes simple public functions for HDF5 serialization.
  • Added support for a user-defined binning system, including:

    • atom_id bins
    • Coordinate bins (x, y, z, cylindrical, etc.)
  • Integrated event handling (IonStop, IonExit, etc.) into user_tally.

  • Developed new accessor functions:

    • bin_names(names, desc) → returns bin names and descriptions
    • event_name(ev, name, desc) → maps events to descriptive strings
    • coordinates_name(coord, name, desc) → maps coordinate systems to descriptive strings
    • bin_edges(bin_id) → exposes bin edges for serialization
  • User Input Specification – JSON Parsing

    • Extended the input handling by introducing a JSON-based user specification for configuring user_tally.
    • Implemented the necessary changes in parse_json.cpp to allow users to define:
      • Which events to track
      • Binning parameters (number of bins, ranges)
      • Coordinate system preferences
    • This addition ensures that tallies can be fully configured at runtime without modifying source code.
  • Extended HDF5 serialization (h5serialize.cpp):

    • Saves bin_names with descriptions.
    • Saves per-bin edges as datasets (bin_0, bin_1, …).
    • Saves event and coordinates metadata (with descriptive text).
    • Stores tally results in a "data" matrix dataset.
    • dump_array → saves tally data
    • dump_vector → saves z-axis, xzvec, origin, and bin edges
  • Ensured user_tally runs from JSON input (usertally_test.json) and integrates into the simulation workflow.
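The accessor style listed above, where a function fills both a name and a description through output parameters, can be sketched as follows. This is a minimal illustration, not OpenTRIM's actual code: the `Event` enum, the `bool` return value, and the description strings are assumptions made for the example. Only the event names IonStop and IonExit and the `event_name(ev, name, desc)` call shape come from this report.

```cpp
#include <string>

// Hypothetical event enum; only IonStop and IonExit are named in this report.
enum class Event { IonStop, IonExit };

// Fills `name` and `desc` for the given event, mirroring the
// event_name(ev, name, desc) output-parameter style described above.
// Returns false (and clears both strings) if the event is not recognized.
inline bool event_name(Event ev, std::string &name, std::string &desc)
{
    switch (ev) {
    case Event::IonStop:
        name = "IonStop";
        desc = "Ion comes to rest inside the target"; // illustrative wording
        return true;
    case Event::IonExit:
        name = "IonExit";
        desc = "Ion leaves the simulation volume"; // illustrative wording
        return true;
    }
    name.clear();
    desc.clear();
    return false;
}
```

Returning the description alongside the name lets the serialization code write both to the output file without maintaining a second lookup table.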


3. Current State

  • user_tally is functional and tested with JSON configuration files.

  • HDF5 output includes:

    • data (bin values and statistics)
    • bin_names (e.g., "x", "y", "atom_id") with descriptions
    • Coordinate system definitions (zaxis, xzvec, origin)
    • Bin edges per dimension
  • Atom ID binning works as expected.

  • The new features compile and run cleanly after a full rebuild, with no segmentation faults.
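To relate the per-dimension bin edges to the shape of the data array, the sketch below derives the number of bins in each dimension and computes the flat offset of a multi-dimensional bin index. It assumes a row-major layout (last dimension varies fastest); the layout actually used by user_tally::data may differ, and the helper names `bin_counts` and `flat_index` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Number of bins along each dimension, given the edges per dimension
// (n edges delimit n - 1 bins).
inline std::vector<std::size_t>
bin_counts(const std::vector<std::vector<double>> &edges)
{
    std::vector<std::size_t> shape;
    for (const auto &e : edges) {
        if (e.size() < 2)
            throw std::invalid_argument("need at least two edges per dimension");
        shape.push_back(e.size() - 1);
    }
    return shape;
}

// Row-major offset of a multi-dimensional bin index
// (assumed layout: the last dimension varies fastest).
inline std::size_t flat_index(const std::vector<std::size_t> &shape,
                              const std::vector<std::size_t> &idx)
{
    if (shape.size() != idx.size())
        throw std::invalid_argument("rank mismatch");
    std::size_t offset = 0;
    for (std::size_t d = 0; d < shape.size(); ++d) {
        if (idx[d] >= shape[d])
            throw std::out_of_range("bin index out of range");
        offset = offset * shape[d] + idx[d];
    }
    return offset;
}
```

With three dimensions of four edges each, the shape is 3 x 3 x 3, and the bin index (1, 2, 0) maps to offset 15.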


4. Remaining Work

  • Broader testing with complex simulations and multi-event tallies.

  • Performance benchmarking with large ion counts to ensure scalability.

  • Improved user-facing documentation:

    • How to configure user_tally via JSON.
    • Example workflows and expected HDF5 output structure.

5. Merged Code

Note on Commit History

During the development period, we encountered conflicts in the initially created GSoC2025 branch that made it difficult to maintain a clean and consistent commit record. To resolve this, we created a new dedicated branch (GSoC2025_clean) and cherry-picked all relevant contributions there. As a result, the commit timestamps do not fully reflect the chronological progress of the work throughout the GSoC period. Nevertheless, the branch now provides a coherent, conflict-free history that contains the final implementation and ensures the reproducibility of the results.


6. Challenges & Lessons Learned

Working on extending OpenTRIM with a customizable user_tally revealed a range of technical and conceptual challenges. These shaped both the design of the implementation and my growth as a developer:

  • Navigating a large legacy codebase
    OpenTRIM is a research-oriented codebase with interdependent modules (mcdriver, h5serialize, tally). Understanding where to integrate the user_tally without breaking existing functionality required careful reading of the code flow, tracing from main.cpp through the Monte Carlo driver, and experimenting with debugging breakpoints.

  • Balancing flexibility and consistency
    The user_tally needed to be user-customizable while still compatible with the existing tally infrastructure. This required creating new enums (e.g., coordinate_t) for type safety, while maintaining a consistent interface with functions like bin_names() and event_name().

  • HDF5 serialization complexities
    Saving multidimensional simulation data to HDF5 posed subtle issues. For example:

    • Choosing the right serialization function (dump_array vs. dump_vector) depending on whether error estimates were needed.
    • Ensuring consistent descriptions and metadata so results were self-documenting.
    • Debugging type mismatches (e.g., vector3 vs. std::vector<float>).
  • Hidden side effects and state management
    Functions like clear() were automatically called at stages I did not expect, leading to empty data structures during testing. This reinforced the importance of understanding object lifecycles in C++ and carefully checking constructor/destructor behavior.

  • Switching from "quick fixes" to maintainable solutions
    Early debugging relied on inserting breakpoints and hardcoding test values in init(). Transitioning from this style to robust, maintainable solutions (e.g., structured push_bins logic, reusable HDF5 functions, descriptive enums) was a key learning step.

  • Collaboration and mentorship
    Regular feedback from mentors helped me refine design choices. For example, moving from a simple bin_names() return to a dual-parameter version that outputs both names and descriptions made the output more useful for postprocessing.

Key Lessons

  • Trace execution flow first: knowing which function is called when is crucial in large scientific codes.
  • Type safety prevents future bugs: enums for coordinates and events made the design more robust.
  • Write for the user of your code, not just yourself: HDF5 outputs should be self-describing.
  • Mentorship accelerates good design decisions: small suggestions (like switching event_name to return name + description) lead to large improvements.
  • Clean builds solve mysterious errors: I learned to clean and rebuild frequently after modifying headers or serialization code to avoid symbol lookup errors.

Overall, the project taught me not just how to extend OpenTRIM, but also how to think critically about design tradeoffs, maintainability, and reproducibility in scientific software.

7. Results & Validation

To validate the functionality of the new user_tally, we performed test runs using the JSON configuration files in test/usertally. These simulations produced HDF5 output files containing the expected binning information (e.g., atom_id, spatial coordinates, and custom variables).

The figures below illustrate typical results:

[Figure: usertally_test_fig1]
[Figure: usertally_test_fig2]

These results verify that the user_tally correctly saves bin definitions (bin_edges), names, and data matrices, making the output self-describing and ready for postprocessing.


📊 Using user_tally

The user_tally is an extension of the standard tally class that allows users to select which events to tally and to define their own coordinate systems and binning. Results are automatically saved to the HDF5 output file.

Example JSON Configuration

{
  "UserTally": {
    "id": "TestTally",
    "event": "IonStop",
    "coordinates": "xyz",
    "atom_id": [0, 1, 2, 3],
    "x": [0.0, 10.0, 20.0, 30.0],
    "y": [0.0, 10.0, 20.0, 30.0],
    "zaxis": [0.0, 0.0, 1.0],
    "xzvec": [1.0, 0.0, 0.0],
    "org": [0.0, 0.0, 0.0]
  }
}
  • event → which simulation event to tally (IonStop, IonExit, etc.).
  • coordinates → which coordinate system to use (Cartesian, given as "xyz" in the example above; cylindrical; etc.).
  • atom_id → bin edges for atom identifiers.
  • x, y → bin edges for spatial coordinates.
  • zaxis, xzvec, org → define the orientation and origin of the coordinate system.
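Since each binning variable is specified by its edges, a value is assigned to a bin by locating the pair of neighbouring edges that enclose it. The sketch below shows one way to do this with a binary search. It assumes ascending edges and half-open intervals [edge_i, edge_i+1); the interval convention used inside OpenTRIM is not stated in this report, and `find_bin` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

// Returns the index i such that edges[i] <= value < edges[i + 1],
// or -1 if value lies outside [edges.front(), edges.back()) or is NaN.
// Assumes `edges` is sorted in ascending order.
inline int find_bin(const std::vector<double> &edges, double value)
{
    if (edges.size() < 2)
        return -1;
    // Written with negations so that NaN fails both range checks.
    if (!(value >= edges.front()) || !(value < edges.back()))
        return -1;
    // First edge strictly greater than value; the bin starts one edge earlier.
    auto it = std::upper_bound(edges.begin(), edges.end(), value);
    return static_cast<int>(it - edges.begin()) - 1;
}
```

With the x edges from the example configuration, [0.0, 10.0, 20.0, 30.0], a value of 5.0 falls in bin 0, a value of 10.0 falls in bin 1, and a value of 30.0 falls outside the tally.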

Example structure of HDF5 output file

| - user_tally/
    | - (tally id)/
        | - bin_names         (vector of strings containing the names of the binning variables)
        | - bin_descriptions  (vector of strings containing the descriptions of the binning variables)
        | - bins/
            | - 0
            | - 1
            | - ...        
        | - coordinates       (coordinate system)
        | - data              (N-dimensional bin data array)
        | - data_sem          (N-dimensional bin data array error)
        | - event             (event type)
        | - org               (coordinate system origin)
        | - xzvec             (vector on the xz-plane)
        | - zaxis             (vector parallel to the z-axis direction)
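One plausible reading of the zaxis, xzvec, and org entries is that they define a local right-handed frame: the z direction is given directly, the y direction is perpendicular to both zaxis and xzvec, and the x direction completes the frame. The sketch below builds such a frame and transforms a point into it. The construction, the sign convention, and all names (`Vec3`, `Frame`, `make_frame`, `to_local`) are assumptions for illustration and may not match OpenTRIM's implementation.

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

using Vec3 = std::array<double, 3>;

inline Vec3 cross(const Vec3 &a, const Vec3 &b)
{
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

inline double dot(const Vec3 &a, const Vec3 &b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

inline Vec3 normalized(const Vec3 &v)
{
    const double n = std::sqrt(dot(v, v));
    if (n == 0.0)
        throw std::invalid_argument("zero-length vector");
    return {v[0] / n, v[1] / n, v[2] / n};
}

// Orthonormal axes plus origin of the local coordinate system.
struct Frame {
    Vec3 ex, ey, ez, org;
};

// Assumed convention: ez is along zaxis, ey is along zaxis x xzvec,
// and ex = ey x ez, so xzvec has a positive local x component.
inline Frame make_frame(const Vec3 &zaxis, const Vec3 &xzvec, const Vec3 &org)
{
    const Vec3 ez = normalized(zaxis);
    const Vec3 ey = normalized(cross(ez, xzvec)); // throws if xzvec is parallel to zaxis
    const Vec3 ex = cross(ey, ez);
    return {ex, ey, ez, org};
}

// Coordinates of point p expressed in the local frame.
inline Vec3 to_local(const Frame &f, const Vec3 &p)
{
    const Vec3 d{p[0] - f.org[0], p[1] - f.org[1], p[2] - f.org[2]};
    return {dot(d, f.ex), dot(d, f.ey), dot(d, f.ez)};
}
```

With the values from the example configuration (zaxis along z, xzvec along x, org at the origin), this frame coincides with the simulation axes and `to_local` leaves points unchanged.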
        

Notes

  • Bin edges are defined by the user in the JSON.
  • Events and coordinates are mapped to names & descriptions automatically (via event_name and coordinates_name).
