@a-seskunas
Last active December 16, 2025 10:32
ReportBuilder Final Report


Final Report - GSoC - Implement ReportBuilder in C++

Author - Adam Seskunas
Project - LibreOffice - The Document Foundation
Mentors - Hossein Nourikhah, Michael Weghorn


Project Proposal Page - Implement ReportBuilder in C++

The Report Builder is part of LibreOffice Base. Its function is to produce a report, in the form of a LibreOffice Writer document, from a LibreOffice Base document. As currently implemented, the Report Builder uses a combination of Java and the Pentaho Report Designer library, which makes its code harder to maintain and creates issues when packaging LibreOffice for different distributions. This project aims to solve these issues by implementing Report Builder in C++ and removing the Java and Pentaho dependencies from the LibreOffice code base.


Summary of Work Done

  1. Investigate and discover how ReportBuilder works.

The first step in implementing ReportBuilder in C++ was to learn how and where Pentaho was being used in the current code base. This phase of the project took quite some time but was crucial to carrying out the implementation in the next steps. Documentation was produced so that it could be referenced later in the project and also to aid discussions with my mentors about how the project should proceed. The investigation produced the following results:

  • Pentaho uses the output from xmlExport.cxx as input.
  • Pentaho processes this input, adding the database information, other information such as formula results, and formatting such as page breaks.
  • The xml produced by xmlExport.cxx is used to save and load a Report. This is important because it means that in order to preserve backwards compatibility, xmlExport must not change its output.
  • Not all of the features of ReportBuilder are performed by Pentaho. In fact, enough features are implemented in xmlExport.cxx that it may be possible to extend xmlExport.cxx to perform the features that Pentaho is currently performing.

The last point was crucial because the project proposal had been written with the idea of using an XSLT filter to parse the output of xmlExport.cxx and produce the output of the Pentaho Report Engine. After talking over what I had learned with my mentors, we decided that the best way forward was to subclass xmlExport to create a second filter that would then replace the Pentaho Report Engine.

The information learned in this phase of the project can be found through the two links below.
ReportBuilder Internal Model (Appendix A)
Output Paths For ReportBuilder (Appendix B)

  2. Subclass xmlExport

More investigation was done in this step as well, to identify where and how the xml filter was being called and how xmlExport actually functioned. A lot of work was done to determine which member functions in xmlExport could be reused by xmlExport2 and which member functions would have to be overridden in order to replace the Pentaho functionality.

For example, all of the positioning of the elements in a Report is performed in xmlExport, so this code doesn’t need to be modified and can be reused in xmlExport2. By carefully examining and comparing the output of xmlExport and the xml produced by the Pentaho Report Engine, a rough draft of the xmlExport2 class was produced.
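The reuse-versus-override split described above can be illustrated with a small standalone sketch. The class names, member functions and XML element names below are invented for illustration and are not the actual LibreOffice classes or report markup. The sketch only shows the general pattern: a subclass inherits the shared positioning code and overrides the part that the report engine used to resolve.

```cpp
#include <string>

// Hypothetical stand-in for the existing export filter (not the real class).
class BaseExport
{
public:
    virtual ~BaseExport() = default;

    // The overall structure of the output is shared by both filters.
    std::string exportReport(const std::string& fieldName) const
    {
        return exportPosition() + exportField(fieldName);
    }

protected:
    // Positioning is the same for both outputs, so the subclass reuses it as-is.
    std::string exportPosition() const { return "<position x=\"0\" y=\"0\"/>"; }

    // The base filter writes a placeholder for a report engine to resolve later.
    virtual std::string exportField(const std::string& fieldName) const
    {
        return "<field-placeholder name=\"" + fieldName + "\"/>";
    }
};

// Hypothetical stand-in for the second filter, which writes final content itself.
class ExecuteExport : public BaseExport
{
protected:
    // Overridden: emit the resolved value instead of a placeholder.
    std::string exportField(const std::string& fieldName) const override
    {
        return "<text>" + lookupValue(fieldName) + "</text>";
    }

private:
    // Dummy lookup; a real filter would read the value from the database.
    static std::string lookupValue(const std::string& fieldName)
    {
        return "value-of-" + fieldName;
    }
};
```

Because only the virtual function differs, the base class output stays untouched, which is the property needed to keep saved Reports backwards compatible.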

At this point, in order to test if the idea was going to work, the original filter that referenced xmlExport was just replaced in the code with the new filter that referenced xmlExport2. The code for this can be seen in the commit below. Note, this commit has since been squashed into a larger commit that covers all of the changes to xmlExport2.

https://gerrit.libreoffice.org/c/core/+/188898/1

  3. Allow Reports to be Saved and Executed with C++

As mentioned above, the new filter was implemented in the code as a replacement for the original filter. This would break a saved Report as it would be stored using different xml that the input filter would not be expecting.

After testing the new xmlExport2 filter and determining that it seemed feasible to replace the output of the Pentaho Report Engine with it, the next step was to implement a Save path for a Report that would use the original filter, and an Execute path that would use the new filter.

This step also involved lots of detective work to determine all of the steps involved with saving, loading and exporting a Report Document. The implementation for this can be found in the commit linked below. Having this implemented sped up the development process because now, a Report could be saved and then re-opened with the new filter to check its output allowing for manual testing of the output of xmlExport2.

https://gerrit.libreoffice.org/c/core/+/188899
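As a minimal sketch of the idea behind the two paths, the choice of filter can be thought of as a function of the requested output path. The enum and the filter identifiers below are hypothetical and do not correspond to the names used in the commit. The sketch only shows that Save keeps using the original filter while Execute uses the new one.

```cpp
#include <string>

// The two output paths described in this report.
enum class OutputPath { Save, Execute };

// Illustrative filter identifiers; the real filter names may differ.
inline std::string chooseExportFilter(OutputPath path)
{
    switch (path)
    {
        case OutputPath::Save:
            // Saved Reports must stay readable by the existing import filter.
            return "original-export-filter";
        case OutputPath::Execute:
            // Executed Reports are rendered by the new filter.
            return "new-export-filter";
    }
    return "original-export-filter"; // not reached; keeps compilers quiet
}
```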

  4. Extend xmlExport2 Functionality

At this point in the project it was time to start implementing the functionality of Pentaho in C++. The first step in this process was to retrieve the information from the database, as the basic functionality of a Report is to display database information.

Next, Labels and Text Boxes were addressed. Text Boxes can also contain Formulas, and these were addressed here as well. The implementation of Formulas was tricky. It turns out that the Formulas used in a Report are OpenFormula, as stated in the Base Guide pdf.

Writing a parser and interpreter for these Formulas was a non-starter: it would take quite a bit of work and lots of testing, which would be out of scope for this project. After sleuthing around through the LibreOffice codebase, a solution was found: Calc already supports parsing and executing OpenFormula, so this code was re-used here.

To help with the Formula implementation, the ReportFormula class was extended with some additional member functions, which can be seen in the following commit:

https://gerrit.libreoffice.org/c/core/+/188897

A Report could now display:

  • A Field (Database information)
  • A TextBox
  • Output from Formulas that use database information as input
  • Vertical/Horizontal lines
  • Fonts/Font modifications (colors, bold, etc.)
  • Labels

  5. Implement Testing

The ReportBuilder directory contained no modern unit tests. After talking it over with my mentors we decided it would be useful to set up at least some form of testing that could then be used and extended at a later date.

There was an old ReportBuilder Java test in the qa directory. This test was ported over to C++, along with the basic testing infrastructure, so that more extensive tests could be written down the road.

https://gerrit.libreoffice.org/c/core/+/189436

  6. Implement Grouping Functionality

In the discovery phase of the project it was determined that Pentaho performed the Grouping functionality in a Report. This seemed to be a major reason why Pentaho was used here.

The Grouping functionality allows the user to order the data in a report by criteria such as Alphabetical order, Date or Nominal order and many other options. Having mapped and investigated all the ways that the data can be sorted and grouped, work on the implementation commenced. A major concern was that it would be complex enough to implement that it would take too much time in relation to the total time allotted for the project.

In the end the implementation was successful, but did take quite a bit of time and thus delayed the potential completion of the project. The commits for this can be seen below. Note, these commits have been squashed into the larger commit that covers all of the changes in xmlExport2.

Implement Sorting
https://gerrit.libreoffice.org/c/core/+/189144/3
Implement Grouping
https://gerrit.libreoffice.org/c/core/+/191045/3
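The general sort-then-group approach can be sketched independently of the LibreOffice code. The following is not the implementation from the commits above. The `Row` and `Group` types, the column names and the choice of grouping on the first n characters of a text column are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record and group types for this sketch.
struct Row { std::string city; int amount; };
struct Group { std::string key; std::vector<Row> rows; };

// Group key: the first n characters of the value (the whole value if shorter).
inline std::string prefixKey(const std::string& value, std::size_t n)
{
    return value.substr(0, n);
}

// Sort rows by the grouping column, then start a new group whenever the key changes.
inline std::vector<Group> sortAndGroup(std::vector<Row> rows, std::size_t prefixLength)
{
    std::stable_sort(rows.begin(), rows.end(),
                     [](const Row& a, const Row& b) { return a.city < b.city; });

    std::vector<Group> groups;
    for (Row& row : rows)
    {
        std::string key = prefixKey(row.city, prefixLength);
        if (groups.empty() || groups.back().key != key)
            groups.push_back(Group{ key, {} });
        groups.back().rows.push_back(std::move(row));
    }
    return groups;
}
```

Each resulting `Group` marks where a group header would be written before its rows and a group footer after them. Sorting first means a single pass is enough to find the group boundaries.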

  7. Implement switch for C++ or Pentaho ReportBuilder

At this point in the project, much of the functionality of ReportBuilder had been implemented in C++. With the project’s allotted time winding down, it was decided that a nice feature to have would be a way for the user to switch between the C++ version of ReportBuilder and the Pentaho version. Then, when the feature is ready to be rolled out to users, it can be experimental at first, allowing testing before it is included in the next major release. This is implemented in the following commit:

https://gerrit.libreoffice.org/c/core/+/188896
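The commit title listed under Current State of the Code describes this switch as an environment variable. A minimal sketch of such a switch is shown below. The variable name `EXAMPLE_USE_CPP_REPORTBUILDER`, the value "1" and the enum are made up for this example and are not taken from the commit.

```cpp
#include <cstdlib>
#include <string>

enum class ReportEngine { Pentaho, Cpp };

// Decide which engine to use from the value of an environment variable.
// A null pointer means the variable is not set.
inline ReportEngine engineFromValue(const char* value)
{
    if (value != nullptr && std::string(value) == "1")
        return ReportEngine::Cpp;
    return ReportEngine::Pentaho; // default: keep the existing behaviour
}

// EXAMPLE_USE_CPP_REPORTBUILDER is a hypothetical name used only in this sketch.
inline ReportEngine selectReportEngine()
{
    return engineFromValue(std::getenv("EXAMPLE_USE_CPP_REPORTBUILDER"));
}
```

Defaulting to the existing engine when the variable is unset keeps the new code path opt-in, which fits the goal of shipping it as experimental first.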


List of Features and Their Implementation State

Many of the features that are currently implemented with Pentaho have been implemented in C++. In order to demonstrate what has been done, and what remains to be done, please see the following table that lists each feature and its implementation state. Note, every effort has been made to make this table as complete as possible, but there could be some features not listed below. See Appendix C below for an explanation of where these features are in the ReportBuilder interface.

| Feature | Implemented | Notes |
|---|---|---|
| Labels | Yes | xmlExport2; ORptExport2::handleTextElement |
| TextBox - Text | Yes | xmlExport2; ORptExport2::handleTextElement |
| TextBox - Field | Yes | xmlExport2; ORptExport2::exportGroup2, ORptExport2::handleTextElement |
| TextBox - Formula | Partially | xmlExport2; ORptExport2::handleTextElement, ORptExport2::exportFormula |
| TextBox - Function | No | Next to be implemented |
| TextBox - Counter | No | To be implemented with Function |
| TextBox - User Defined Functions | Yes | Shortcut for previously used functions |
| Image | No | Needs investigation as to how it functions, works when data is set to field |
| Chart | No | |
| Horizontal Line | Yes | xmlExport |
| Vertical Line | Yes | xmlExport |
| Field | Yes | xmlExport2; ORptExport2::handleTextElement |
| Sorting & Grouping - Sorting | Yes | xmlExport2; ORptExport2::getResultSet |
| Sorting & Grouping - Group Header/Footer | Yes | xmlExport2; ORptExport2::exportGroup2 |
| Sorting & Grouping - GroupOn | Yes | xmlExport2; ORptExport2::findSubGroups |
| Sorting & Grouping - Group Interval | Yes | xmlExport2; ORptExport2::findSubGroups |
| Sorting & Grouping - Keep Together | No | Not sure |
| Execute Report | Yes | Addressed in https://gerrit.libreoffice.org/c/core/+/188899 |
| Font Type/Size | Yes | xmlExport |
| Bold, Italic, Underline | Yes | xmlExport |
| Align - Left/Right, Justify | Yes | xmlExport |
| Font Color | Yes | xmlExport |
| Background Color | Yes | xmlExport |
| (Group) Header/Footer/Detail - Force New Page | No | |
| (Group) Header/Footer/Detail - Keep Together | No | |
| (Group) Header/Footer/Detail - Repeat Section | No | What is this? |
| (Group) Header/Footer/Detail - Visible | No | |
| (Group) Header/Footer/Detail - Height | Yes | xmlExport |
| (Group) Header/Footer/Detail - Conditional Print Expression | No | What is this? |
| (Group) Header/Footer - Background | Yes | xmlExport |
| Text Direction | No | |
| Visible | No | |
| Position X/Y | Yes | xmlExport |
| Width/Height | Yes | xmlExport |
| Auto Grow | No? | What is this? |
| Conditional Print Expression | No | What is this? |
| Print Repeated Values | No | What is this? |
| Background | Yes | xmlExport |
| Font | Yes | xmlExport |
| Horz/Vert Alignment | No | |
| Formatting | No | i.e. Date/Currency, etc. |

Current State of the Code

The work performed for this project can be found in the following related commits

Extend ReportFormula [WIP]
https://gerrit.libreoffice.org/c/core/+/188897

Subclass ORptExport in xmlExport2 [WIP]
https://gerrit.libreoffice.org/c/core/+/188898

Add Env variable to switch between Pentaho/C++ ReportBuilder [WIP]
https://gerrit.libreoffice.org/c/core/+/188896

Implement Report Save path and Execute path [WIP]
https://gerrit.libreoffice.org/c/core/+/188899

Some notes

  • The commits are marked as Work In Progress. They are not ready for code review yet since not all of the features of ReportBuilder have been implemented in C++ and the implementation of new features may change some of the code.
  • The commits, excluding Subclass ORptExport in xmlExport2 (188898), are complete, but they depend on the implementation of ReportBuilder in C++. Since these commits depend on xmlExport2, they will be merged when xmlExport2 is ready.
  • Building these commits locally results in a working Report Builder that runs end to end but doesn’t support some features. For instance, dates are currently shown in epoch time, i.e. they are not formatted correctly. For a list of which features have and have not been implemented, see the table in List of Features and Their Implementation State.
  • Most of the remaining work should be done in xmlExport2. Any work done to xmlExport2 should be amended to https://gerrit.libreoffice.org/c/core/+/188898.

Prioritized List of Next Steps

  1. Finish the Group Functions: Minimum, Maximum, Count, Accumulate. This is the piece of code that needs to query the database. The rest of the features that need to be implemented concern formatting of the .odt document.
  2. Compile with clang and address any issues in the code produced up until this point.
  3. Some parts of the code will be ready for preliminary review at this point.
  4. Write tests for the features that have been implemented at this point. This will ensure
  5. Implement the features listed as not implemented in the table above. As mentioned in 1., these mainly concern formatting the .odt document.
  6. Begin Code Review.
  7. Improve testing so that it proves the C++ version of ReportBuilder output matches the output of the Pentaho version.
  8. Merge :-)
  9. There is an option in ReportBuilder to output a Report as a Calc document. This project did not look into how Pentaho is involved in this process. In order to finally remove Pentaho from ReportBuilder, this needs to be addressed.
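To make the group functions named in step 1 of this list concrete, the sketch below computes them over the values of a single group. It is a standalone illustration with a hypothetical `GroupSummary` type, not code from xmlExport2. It assumes the values have already been fetched from the database and interprets Accumulate as a plain sum, which may not match the ReportBuilder semantics exactly.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type holding the four summaries for one group.
struct GroupSummary
{
    std::size_t count = 0;
    double minimum = 0.0;
    double maximum = 0.0;
    double accumulated = 0.0;
};

// Compute the summaries for the values of one group.
// For an empty group only count is meaningful; the other fields stay at 0.
inline GroupSummary summarize(const std::vector<double>& values)
{
    GroupSummary s;
    s.count = values.size();
    if (values.empty())
        return s;

    const auto range = std::minmax_element(values.begin(), values.end());
    s.minimum = *range.first;
    s.maximum = *range.second;
    s.accumulated = std::accumulate(values.begin(), values.end(), 0.0);
    return s;
}
```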

Conclusion

Working on this project has been a challenging but rewarding experience. One of the major challenges faced was that there wasn’t a ReportBuilder “expert” that could explain how certain parts of the code functioned, or knew why parts of the code were even there. This made progress slow and could be at times frustrating, especially in the beginning of the project when the code was unfamiliar and little progress had been made. In the end I learned the code in ReportBuilder to a depth that I would not have if I could just ask someone why this function was here, or how a certain class operates, and now, some may say that I’m that expert.

One downside to not having an expert was that all that learning took time, and it took time away from implementing the goal of the project, a C++ filter to replace the Pentaho Report Engine. Although the project is not complete, I’d like to consider the project a success, since the work that I have done demonstrates that the solution proposed in my commits is a replacement for Pentaho. Most of the functionality that Pentaho was performing has been implemented in C++. A lot of the difficult work has been done. There are lots of smaller features that still need to be implemented, but they can be tackled independently. I hope that what has been done and what remains to be done has been clearly laid out in this document so that I or any other developer can finish the project in the near future.

I’d like to thank my mentors, Michael and Hossein, for all their help and time. It was great working with you, and I look forward to continuing to work with you in the coming months as I work on completing the project.

Appendix A - ReportBuilder Internal Model

Internally, ReportBuilder uses the ReportDefinition class to model a Report. The ReportDefinition contains all of the information that is contained in a report and information about the underlying database. In short a ReportDefinition object contains all the necessary information to output a Report document.

The following chart is a simple explanation of how this works, given here as a guide for anyone working on ReportBuilder.

[Figure: Report Builder Internal Model]

Appendix B - Output Paths For ReportBuilder

There are two output paths for a Report

  1. Saving a Report
  2. Executing a Report

The following diagram shows the process of how each path works internally in ReportBuilder. It also shows how and where Pentaho is called as well as where Pentaho gets its input from and what the output from Pentaho is.

Some notes

  1. The input for Pentaho looks similar to the output of a saved Report. It contains a content.xml, settings.xml, meta.xml and a styles.xml, which is essentially a complete .odt document.
  2. The difference between the input for Pentaho and the output of an executed Report is that the input xml contains rpt tags that are used to give Pentaho instructions on how and what to process for the output file, which contains no rpt tags.
  3. The code that writes these rpt tags is based in xmlExport.cxx
  4. The xml from a Saved Report is used as input to build the internal model of a Report, see Appendix A for details about the internal model.
  5. Because the output from a Saved Report is used as input for the internal ReportBuilder model it must not be altered, and so xmlExport.cxx must not be altered.
  6. In order to replace Pentaho, we need another “export” filter, and this is the xmlExport2.cxx that this project has implemented here
    https://gerrit.libreoffice.org/c/core/+/188898/1
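Note 2 suggests a simple sanity check for an executed Report: its XML should no longer contain any rpt tags. The helper below is a hedged sketch of such a check. It assumes the tags appear as elements with an `rpt:` namespace prefix, and the function name is invented for this example. It uses plain substring matching rather than a real XML parser.

```cpp
#include <string>

// Returns true if the XML text contains an element whose name uses the given
// namespace prefix, e.g. "<rpt:" or "</rpt:". This is substring matching, not
// XML parsing, so comments or CDATA sections containing such text also match.
inline bool containsPrefixedElement(const std::string& xml, const std::string& prefix)
{
    return xml.find("<" + prefix + ":") != std::string::npos
        || xml.find("</" + prefix + ":") != std::string::npos;
}
```

A check like `containsPrefixedElement(output, "rpt")` returning false would be consistent with the output of an executed Report as described in note 2. A proper test would parse the document instead.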

[Figure: Output Paths for ReportBuilder]

Appendix C - ReportBuilder Interface

Reproduced from here, added for convenience when viewing the table in
List of Features and Their Implementation State

[Figure: ReportBuilder interface, with regions labeled A to G as described below]

A - Add fields, labels, images, charts and lines to the report.

B - Field, Report Navigator, Sorting and Grouping, and Execute Report.

C - Page Header, Detail, and Page Footer.

D - Fields dialog - lists the columns from the selected database.

E - Report Navigator

F - Sorting and Grouping, the fields are selected on the top and the grouping and sorting options for the selected field are below.

G - Properties - such as Positioning, page breaks, printing options, etc. Dependent on what type of Report Element is selected.

https://wiki.openoffice.org/wiki/SUN_Report_Builder
Official OpenOffice Documentation for ReportBuilder

https://wiki.openoffice.org/wiki/SUN_Report_Builder/Documentation
More in-depth OpenOffice ReportBuilder Documentation

https://nextcloud.documentfoundation.org/s/qjFkGwpEEkNrt6f
Old Base guide with some good documentation of features. 500+ pages of documentation.

https://wiki.documentfoundation.org/Documentation/SDKGuide/Functions_and_Data_Analysis#1._Calling_Calc_Functions_from_Code
Documentation involving the Calc code used to parse and execute functions in xmlExport2.cxx

https://documentation.libreoffice.org/assets/Uploads/Documentation/en/BG7.2/BG72-BaseGuide.pdf
Another more modern but still old Base guide with useful information.
