MuHe03/GSoC Final Report.md

## GSoC Final Report.md

      
Final Report

Student: Mu He
Mentor: Andrey Piskunov
Project: SIMD for SQL expressions and functions
Project Summary: The existing MariaDB ColumnStore engine lacks support for vectorized evaluation of SQL functions and expressions. This project aims to assess the performance gains achieved through the application of Single Instruction, Multiple Data (SIMD). Additionally, the project employs the Arrow framework to store column-wise data in a buffer, facilitating the further application of SIMD in evaluations.
Work Completed

Benchmarking of SIMD application simulation

Corresponding Pull Request

mariadb-corporation/mariadb-columnstore-engine#2943
What I did

During the initial phase, I focused on benchmarking vectorized evaluation. As a test case, I used the expression 2 * t1.col1 + t1.col2 over two integer columns, repeated for each integer type (tinyint, smallint, int, bigint), to compare performance at different levels of vectorization. As a baseline, I wrote a micro-benchmark for the existing scalar implementation using Google Benchmark.
Next, I implemented SIMD support along the entire evaluation path. Specifically, the following tasks were completed:
1. Introduced SIMD interfaces in the TreeNode and ParseTree classes.
2. Developed specialized SIMD-enabled structures for subclasses, including ArithmeticColumn, SimpleColumn, ConstantColumn, and ArithmeticOperator within TreeNode. 
3. Implemented arithmetic operations (addition, subtraction, multiplication) in the SIMD processor using SSE extensions.
4. Inserted rudimentary Row-Column-Transformation mechanisms before and after the SIMD-enabled evaluation.
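
To illustrate the kind of SSE arithmetic described in the steps above, here is a minimal sketch that evaluates 2 * col1 + col2 over two 4-byte integer columns. It is not the code from the pull request: the function name `evalTwoAPlusB` is hypothetical, the columns are assumed to already be contiguous arrays, and only SSE2 intrinsics are used, with a scalar fallback on targets without SSE2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 integer intrinsics
#endif

// Hypothetical helper (not a ColumnStore function): out[i] = 2 * a[i] + b[i].
// Results wrap around on overflow, as 32-bit SIMD lanes do.
std::vector<int32_t> evalTwoAPlusB(const std::vector<int32_t>& a,
                                   const std::vector<int32_t>& b)
{
  assert(a.size() == b.size());
  std::vector<int32_t> out(a.size());
  std::size_t i = 0;
#if defined(__SSE2__)
  // Process four 4-byte values per 128-bit register.
  for (; i + 4 <= a.size(); i += 4)
  {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data() + i));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data() + i));
    __m128i twice = _mm_add_epi32(va, va);   // 2 * a, computed as a + a
    __m128i sum = _mm_add_epi32(twice, vb);  // 2 * a + b
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data() + i), sum);
  }
#endif
  // Scalar tail (and full fallback when SSE2 is unavailable).
  for (; i < a.size(); ++i)
  {
    uint32_t r = 2u * static_cast<uint32_t>(a[i]) + static_cast<uint32_t>(b[i]);
    out[i] = static_cast<int32_t>(r);
  }
  return out;
}
```

Doubling is done with an addition here because SSE2 has no packed 32-bit multiply that keeps the low half of each product; a general multiplication would need a different instruction sequence.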

Additionally, I extended the micro-benchmark to include the vectorized version of the evaluation.

Key Observations from Micro-Benchmarking:


Time Consumption in Transformation
Including the Row-Column-Transformation within the evaluation showed that this phase is the primary time-consuming component, rather than the evaluation itself. This suggests that adopting a column-wise storage format is crucial for effectively leveraging SIMD in evaluations.
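
The transformation cost can be pictured with a small sketch. The `Row` and `Columns` types below are hypothetical stand-ins, not the ColumnStore `Row` or `RowGroup` classes; the point is only that every value has to be copied once before SIMD evaluation and once after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row-wise layout: one record per row.
struct Row
{
  int32_t col1;
  int32_t col2;
};

// Hypothetical column-wise layout: one contiguous array per column.
struct Columns
{
  std::vector<int32_t> col1;
  std::vector<int32_t> col2;
};

// Row-to-column transformation, needed before SIMD evaluation.
Columns rowsToColumns(const std::vector<Row>& rows)
{
  Columns cols;
  cols.col1.reserve(rows.size());
  cols.col2.reserve(rows.size());
  for (const Row& r : rows)
  {
    cols.col1.push_back(r.col1);
    cols.col2.push_back(r.col2);
  }
  return cols;
}

// Column-to-row transformation, needed after SIMD evaluation.
std::vector<Row> columnsToRows(const Columns& cols)
{
  assert(cols.col1.size() == cols.col2.size());
  std::vector<Row> rows(cols.col1.size());
  for (std::size_t i = 0; i < rows.size(); ++i)
  {
    rows[i].col1 = cols.col1[i];
    rows[i].col2 = cols.col2[i];
  }
  return rows;
}
```

Both functions touch every value once with strided access on the row side, which is work the scalar path never has to do. If the data were already stored column-wise, both copies would disappear.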


Vectorization is Beneficial
The benchmark results show a clear advantage of vectorized operations over scalar ones, especially for smaller value sizes (1-byte and 2-byte).


Diminishing Returns
The performance gains decrease as the value size increases. For 8-byte values, the vectorized version was even slightly slower than the scalar one. A likely explanation is that a 128-bit SSE register holds only two 8-byte values (versus sixteen 1-byte values), so the remaining parallelism no longer outweighs the SIMD overhead.


Necessity for Rapid Column Location
When considering just the evaluation portion, the current approach of using std::lower_bound to locate columns incurs significant overhead. A more efficient method for quickly identifying the location of columns could further improve performance.
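
As a rough illustration of the alternative, the sketch below compares a `std::lower_bound` search with a precomputed position table. How ColumnStore actually stores its column identifiers is not described in this report, so the sorted `std::vector<uint32_t>` of ids, the function names, and the assumption that ids are small enough to index a table directly are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Binary search on every lookup: O(log n) comparisons per call.
// Returns the position of columnId in sortedIds, or -1 if it is absent.
std::ptrdiff_t findWithLowerBound(const std::vector<uint32_t>& sortedIds,
                                  uint32_t columnId)
{
  auto it = std::lower_bound(sortedIds.begin(), sortedIds.end(), columnId);
  if (it == sortedIds.end() || *it != columnId)
    return -1;
  return it - sortedIds.begin();
}

// Built once up front: table[columnId] is the position, or -1 if absent.
// Assumes ids are small, since the table has (largest id + 1) entries.
std::vector<std::ptrdiff_t> buildPositionTable(const std::vector<uint32_t>& sortedIds)
{
  if (sortedIds.empty())
    return {};
  assert(std::is_sorted(sortedIds.begin(), sortedIds.end()));
  std::vector<std::ptrdiff_t> table(static_cast<std::size_t>(sortedIds.back()) + 1, -1);
  for (std::size_t i = 0; i < sortedIds.size(); ++i)
    table[sortedIds[i]] = static_cast<std::ptrdiff_t>(i);
  return table;
}

// Constant-time lookup against the precomputed table.
std::ptrdiff_t findWithTable(const std::vector<std::ptrdiff_t>& table,
                             uint32_t columnId)
{
  return columnId < table.size() ? table[columnId] : -1;
}
```

The table trades memory for a constant-time lookup. When the mapping is fixed for the lifetime of a query, resolving each column position once and caching it would remove the per-evaluation search entirely.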


Arrow-Buffer for Column-Wise Stored Data

Corresponding Pull Request

mariadb-corporation/mariadb-columnstore-engine#2944
What I did

In the second phase of the project, I pivoted to building a buffer for column-wise data storage that vectorized evaluation could use directly. After consulting with my mentor, we chose Apache Arrow over std::vector for its richer feature set.
Subsequently, I generated arrow::Buffer for each memory block, each associated with a unique Logical Block ID to indicate different columns. These buffers stored either scanned or filtered data. After completing each new scan or filter operation on a block, an arrow::Array was created within the buffer. In this setup, only the data value and row ID were buffered.
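
The layout described above can be pictured with a standard-library stand-in. This sketch deliberately does not use the Arrow API: plain `std::vector`s take the place of `arrow::Buffer` and `arrow::Array`, and the names `BlockColumnBuffer` and `ColumnBufferStore` are hypothetical. It only shows the shape of the data: one entry per Logical Block ID, holding values and their row IDs.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Column-wise data for one block: values and their row IDs, kept in parallel.
struct BlockColumnBuffer
{
  std::vector<int64_t> values;
  std::vector<uint32_t> rowIds;
};

// Hypothetical store keyed by Logical Block ID (LBID).
class ColumnBufferStore
{
 public:
  // Record the output of one scan or filter operation on a block.
  void append(uint64_t lbid, const std::vector<int64_t>& values,
              const std::vector<uint32_t>& rowIds)
  {
    assert(values.size() == rowIds.size());
    BlockColumnBuffer& buf = buffers_[lbid];
    buf.values.insert(buf.values.end(), values.begin(), values.end());
    buf.rowIds.insert(buf.rowIds.end(), rowIds.begin(), rowIds.end());
  }

  // Returns the buffer for a block, or nullptr if nothing was stored for it.
  const BlockColumnBuffer* find(uint64_t lbid) const
  {
    auto it = buffers_.find(lbid);
    return it == buffers_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<uint64_t, BlockColumnBuffer> buffers_;
};
```

Because the values for a block sit in one contiguous array, a SIMD kernel could read them directly without a row-to-column copy first.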
Future Features

To expand the capabilities of the evaluation application, the next steps will focus on incorporating column-wise data storage. Specifically, the Apache Arrow buffer infrastructure should be integrated into both the evaluation and transformation components, extending to the RowGroup class. This will also pave the way for enhanced vectorized support based on Arrow-buffered data.
Acknowledgement

I extend my heartfelt gratitude to my mentors, Andrey Piskunov and Roman Nozdrin, for their guidance, technical support, and expertise throughout the duration of this project. Their meticulous approach to mentoring has illuminated the path for me, offering step-by-step guidance that has been instrumental in the successful completion of the project. My skill set in programming has significantly expanded, and the hands-on experience I've acquired during the Google Summer of Code has been invaluable. The challenges and experiences afforded to me in this endeavor have had a profound impact, serving as pivotal milestones for my personal growth and future career development.