Velox is a C++ database acceleration library that can be integrated with Spark or Presto to enhance query performance and reduce infrastructure costs. It includes a custom C++ Parquet reader for better performance and integration. During testing, errors related to the native Parquet reader were discovered, highlighting the need for improved testing infrastructure that catches such issues and broadens test coverage before production.
The proposal is to leverage the existing unit tests in the Presto project to test the Velox Parquet reader as an interim solution. The reasons for choosing this approach are as follows:
- Comprehensive Coverage: The unit tests in Presto are comprehensive, covering all data types, including complex types like struct, array, and map. Each unit test covers forward ordering, backward ordering, and randomly inserted nulls. To facilitate complex type testing, a custom Hive Parquet writer (e.g., SingleLevelArraySchemaConverter) is used to surface issues. (This is why #7002 was blocked at first: it was hard to reproduce.)
- Extensibility: Each unit test verifies a writer/reader pair, such as the native Parquet writer with the native Parquet reader, or the Hive Parquet writer with the native Parquet reader. The unit tests will be easy to extend once a native Parquet writer is implemented in Velox.
- Better Debuggability: The unit test inputs can be examined to understand failures, and the native reader in Presto can be consulted to understand the correct logic.
- Simple Implementation: Implementing this solution is estimated to take approximately one week. By contrast, porting all of the testing infrastructure from Presto (Java) to Velox (C++) could be time-consuming.
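To make the three input variants mentioned in the first bullet concrete, here is a minimal Python sketch. It is not the Presto test code: the function names, the `null_fraction` parameter, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def insert_random_nulls(values, null_fraction=0.2, seed=0):
    """Return a copy of values with None inserted at random positions.

    null_fraction and seed are illustrative knobs, not Presto settings.
    """
    rng = random.Random(seed)  # seeded so a failing input can be regenerated
    out = list(values)
    n_nulls = max(1, int(len(values) * null_fraction))
    for _ in range(n_nulls):
        # randint is inclusive on both ends, so a null may also be appended
        out.insert(rng.randint(0, len(out)), None)
    return out


def input_variants(values):
    """Build the forward, backward, and null-injected variants of one input."""
    values = list(values)
    return {
        "forward": values,
        "backward": values[::-1],
        "with_nulls": insert_random_nulls(values),
    }
```

Running each variant through the same writer/reader pair is what lets a single test exercise ordering and null handling together.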
The implementation has two parts:
- In the Velox project, implement a Parquet reader executable that takes a Parquet file as input and writes a binary file containing the data in SerializedPage format. (Draft PR)
- In the Presto project, within the unit tests in AbstractTestParquetReader, spawn a new process that runs the Velox Parquet reader on a Parquet input file. Read the generated output file, decode it back into a Page, and compare the data against the expected results. (Draft PR)
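The control flow of the second step can be sketched as follows. The real harness is Java code inside the Presto tests; this Python version only illustrates the sequence of spawning the reader, checking the exit code, reading the output file, and comparing. The positional `input output` argument order and the `decode` callback are hypothetical, since the command-line interface of `velox_scan_parquet` is not described here.

```python
import subprocess
from pathlib import Path


def read_with_external_reader(reader_cmd, parquet_file, output_file):
    """Run an external reader process and return the bytes it wrote.

    reader_cmd is the command prefix as a list, for example the path to the
    reader binary. Passing input and output as two positional arguments is
    an assumed CLI, not the documented one.
    """
    result = subprocess.run(
        [*reader_cmd, str(parquet_file), str(output_file)],
        capture_output=True,
    )
    if result.returncode != 0:
        # A crash in the reader surfaces here instead of as a data mismatch.
        raise RuntimeError(
            f"exitCode should be 0 but found {result.returncode}: "
            f"{result.stderr.decode(errors='replace')}"
        )
    return Path(output_file).read_bytes()


def check_round_trip(reader_cmd, parquet_file, output_file, decode, expected):
    """Decode the reader's output and compare it with the expected values.

    decode stands in for the SerializedPage-to-Page decoding step.
    """
    raw = read_with_external_reader(reader_cmd, parquet_file, output_file)
    actual = decode(raw)
    assert actual == expected, f"expected {expected} but found {actual}"
```

Running the reader in a separate process means a crash in the native code is reported as a failed exit-code check rather than taking down the test runner.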
To run the tests:
- Check out https://github.com/qqibrow/velox/tree/new_test_base and build Velox. The expected output binary is:
{velox_base_dir}/_build/debug/velox/dwio/parquet/tests/reader/velox_scan_parquet
- Check out https://github.com/qqibrow/presto/tree/velox_parquet_test_base
- Run all tests:
./mvnw -Dtest=TestParquetReader test -B -Dair.check.skip-all -Dmaven.javadoc.skip=true -DLogTestDurationListener.enabled=true -Dvelox_parquet_reader_path={velox_base_dir}/_build/debug/velox/dwio/parquet/tests/reader/velox_scan_parquet -Dfailed_parquet_files_dir=/tmp/velox_test_data -pl :presto-hive
- Run one test:
./mvnw -Dtest=TestParquetReader#testStruct test -B -Dair.check.skip-all -Dmaven.javadoc.skip=true -DLogTestDurationListener.enabled=true -Dvelox_parquet_reader_path={velox_base_dir}/_build/debug/velox/dwio/parquet/tests/reader/velox_scan_parquet -Dfailed_parquet_files_dir=/tmp/velox_test_data -pl :presto-hive
The -Dfailed_parquet_files_dir=/tmp/velox_test_data option sets the directory where the Parquet files that fail a test are stored.
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestParquetReader.testArray:34->AbstractTestParquetReader.testArray:157 expected [[1]] but found [[null]]
[ERROR] TestParquetReader.testArrayOfArrayOfStructOfArray:83->AbstractTestParquetReader.testArrayOfArrayOfStructOfArray:267 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testArrayOfMapOfArray:167->AbstractTestParquetReader.testArrayOfMapOfArray:455 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testArrayOfMapOfStruct:139->AbstractTestParquetReader.testArrayOfMapOfStruct:385 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testArrayOfMaps:125->AbstractTestParquetReader.testArrayOfMaps:361 expected [[{0=0, 1=1}]] but found [[null]]
[ERROR] TestParquetReader.testArrayOfStructOfArray:97->AbstractTestParquetReader.testArrayOfStructOfArray:304 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testArrayOfStructs:62->AbstractTestParquetReader.testArrayOfStructs:210 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testArraySchemas:468->AbstractTestParquetReader.testArraySchemas:1252 expected [[19, 20, 21, 22]] but found [[null, 19, 20, 21]]
[ERROR] TestParquetReader.testComplexNestedStructs:230->AbstractTestParquetReader.testComplexNestedStructs:650 » IllegalState Map key is null at position: 22
[ERROR] TestParquetReader.testCustomSchemaArrayOfStructs:69->AbstractTestParquetReader.testCustomSchemaArrayOfStructs:236 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testDoubleSequence:503->AbstractTestParquetReader.testDoubleSequence:1456 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testLongDirect:335->AbstractTestParquetReader.testLongDirect:847->AbstractTestParquetReader.testRoundTripNumeric:1408 » UnsupportedOperation com.facebook.presto.common.block.ByteArrayBlock
[ERROR] TestParquetReader.testLongDirect2:342->AbstractTestParquetReader.testLongDirect2:859->AbstractTestParquetReader.testRoundTripNumeric:1408 » UnsupportedOperation com.facebook.presto.common.block.ByteArrayBlock
[ERROR] TestParquetReader.testLongPatchedBase:356->AbstractTestParquetReader.testLongPatchedBase:873->AbstractTestParquetReader.testRoundTripNumeric:1408 » UnsupportedOperation com.facebook.presto.common.block.ByteArrayBlock
[ERROR] TestParquetReader.testLongSequence:321->AbstractTestParquetReader.testLongSequence:833->AbstractTestParquetReader.testRoundTripNumeric:1408 » UnsupportedOperation com.facebook.presto.common.block.ByteArrayBlock
[ERROR] TestParquetReader.testLongSequenceWithHoles:328->AbstractTestParquetReader.testLongSequenceWithHoles:840->AbstractTestParquetReader.testRoundTripNumeric:1408 » UnsupportedOperation com.facebook.presto.common.block.ByteArrayBlock
[ERROR] TestParquetReader.testLongShortRepeat:349->AbstractTestParquetReader.testLongShortRepeat:866->AbstractTestParquetReader.testRoundTripNumeric:1408 » UnsupportedOperation com.facebook.presto.common.block.ByteArrayBlock
[ERROR] TestParquetReader.testLongStrideDictionary:482->AbstractTestParquetReader.testLongStrideDictionary:1402->AbstractTestParquetReader.testRoundTripNumeric:1408 » UnsupportedOperation com.facebook.presto.common.block.ByteArrayBlock
[ERROR] TestParquetReader.testMap:111->AbstractTestParquetReader.testMap:334 » IllegalState Map key is null at position: 0
[ERROR] TestParquetReader.testMapOfArrayKeys:188->AbstractTestParquetReader.testMapOfArrayKeys:497 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testMapOfArrayValues:181->AbstractTestParquetReader.testMapOfArrayValues:484 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testMapOfSingleLevelArray:195->AbstractTestParquetReader.testMapOfSingleLevelArray:513 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testMapOfStruct:202->AbstractTestParquetReader.testMapOfStruct:528 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testMapSchemas:475->AbstractTestParquetReader.testMapSchemas:1303 » IllegalState Map key is null at position: 0
[ERROR] TestParquetReader.testMapWithNullValues:209->AbstractTestParquetReader.testMapWithNullValues:541 » IllegalState Map key is null at position: 0
[ERROR] TestParquetReader.testNestedArrays:48->AbstractTestParquetReader.testNestedArrays:182 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testNestedMaps:118->AbstractTestParquetReader.testNestedMaps:352 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testNewAvroArray:461->AbstractTestParquetReader.testNewAvroArray:1233 expected [[1]] but found [[null]]
[ERROR] TestParquetReader.testOldAvroArray:454->AbstractTestParquetReader.testOldAvroArray:1218 expected [[10]] but found [[null]]
[ERROR] TestParquetReader.testSchemaWithRepeatedOptionalRequiredFields:398->AbstractTestParquetReader.testSchemaWithRepeatedOptionalRequiredFields:995 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testSingleLevelArrayOfMapOfArray:174->AbstractTestParquetReader.testSingleLevelArrayOfMapOfArray:470 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testSingleLevelArrayOfMapOfStruct:146->AbstractTestParquetReader.testSingleLevelArrayOfMapOfStruct:402 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testSingleLevelArrayOfStructOfSingleElement:153->AbstractTestParquetReader.testSingleLevelArrayOfStructOfSingleElement:417 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testSingleLevelArrayOfStructOfStructOfSingleElement:160->AbstractTestParquetReader.testSingleLevelArrayOfStructOfStructOfSingleElement:437 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testSingleLevelSchemaArrayOfArrayOfStructOfArray:90->AbstractTestParquetReader.testSingleLevelSchemaArrayOfArrayOfStructOfArray:286 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testSingleLevelSchemaArrayOfMaps:132->AbstractTestParquetReader.testSingleLevelSchemaArrayOfMaps:372 » IllegalState Map key is null at position: 0
[ERROR] TestParquetReader.testSingleLevelSchemaArrayOfStructOfArray:104->AbstractTestParquetReader.testSingleLevelSchemaArrayOfStructOfArray:321 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testSingleLevelSchemaArrayOfStructs:76->AbstractTestParquetReader.testSingleLevelSchemaArrayOfStructs:254 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testSingleLevelSchemaNestedArrays:55->AbstractTestParquetReader.testSingleLevelSchemaNestedArrays:199 expected [[[]]] but found [[null]]
[ERROR] TestParquetReader.testSmallIntSequence:314->AbstractTestParquetReader.testSmallIntSequence:826 » UnsupportedOperation com.facebook.presto.common.block.IntArrayBlock
[ERROR] TestParquetReader.testStruct:216->AbstractTestParquetReader.testStruct:551 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testStructOfArrayAndPrimitive:258->AbstractTestParquetReader.testStructOfArrayAndPrimitive:721 expected [[[13, 14, 15, 16, 17, 18], 13]] but found [[[null, null, null, 13, 14, 15], 13]]
[ERROR] TestParquetReader.testStructOfMaps:237->AbstractTestParquetReader.testStructOfMaps:666 » IllegalState Map key is null at position: 10
[ERROR] TestParquetReader.testStructOfNullableArrayBetweenNonNullFields:251->AbstractTestParquetReader.testStructOfNullableArrayBetweenNonNullFields:704 expected [[1, [null, value2, value3], 1]] but found [[1, [null, null, value2], 1]]
[ERROR] TestParquetReader.testStructOfNullableMapBetweenNonNullFields:244->AbstractTestParquetReader.testStructOfNullableMapBetweenNonNullFields:685 » IllegalState Map key is null at position: 20
[ERROR] TestParquetReader.testStructOfPrimitiveAndArray:272->AbstractTestParquetReader.testStructOfPrimitiveAndArray:748 expected [[11, [2, 3]]] but found [[11, [null, 2]]]
[ERROR] TestParquetReader.testStructOfPrimitiveAndSingleLevelArray:279->AbstractTestParquetReader.testStructOfPrimitiveAndSingleLevelArray:762 expected [[3, [0]]] but found [[3, [null]]]
[ERROR] TestParquetReader.testStructOfSingleLevelArrayAndPrimitive:265->AbstractTestParquetReader.testStructOfSingleLevelArrayAndPrimitive:734 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testStructOfTwoArrays:286->AbstractTestParquetReader.testStructOfTwoArrays:776 expected [[[2], [1, 3, 5, 7]]] but found [[[2], [null, 1, 3, 5]]]
[ERROR] TestParquetReader.testStructOfTwoNestedArrays:293->AbstractTestParquetReader.testStructOfTwoNestedArrays:789 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testStructOfTwoNestedSingleLevelSchemaArrays:300->AbstractTestParquetReader.testStructOfTwoNestedSingleLevelSchemaArrays:810 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testTimestampMicrosBackedByINT64:370->AbstractTestParquetReader.testTimestampMicrosBackedByINT64:909 exitCode should be 0 expected [0] but found [134]
[ERROR] TestParquetReader.testTimestampMillisBackedByINT64:377->AbstractTestParquetReader.testTimestampMillisBackedByINT64:922 exitCode should be 0 expected [0] but found [134]
[INFO]
[ERROR] Tests run: 82, Failures: 53, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 30.736 s
[INFO] Finished at: 2023-11-21T00:31:52Z
[INFO] ------------------------------------------------------------------------
We are triaging the failures and will open issues in the Velox community.
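As a triage aid, the failure lines in a log like the one above can be bucketed by symptom. The following Python sketch is an assumption about how one might do this, not part of the test harness; the category names are made up for illustration. Exit code 134 conventionally corresponds to 128 + 6, that is, a process terminated by SIGABRT, but the log itself does not state the cause, so treat that reading as a hypothesis.

```python
import re
from collections import Counter

# Matches only per-test failure lines, not the "[ERROR] Failures:" header
# or the "[ERROR] Tests run: ..." summary.
FAILURE_LINE = re.compile(r"^\[ERROR\]\s+TestParquetReader\.")


def classify(line):
    """Assign one failure line to a coarse symptom bucket."""
    # Check the exit-code case first: those lines also contain "expected [".
    if "exitCode should be 0" in line:
        m = re.search(r"but found \[(\d+)\]", line)
        return f"reader exited with code {m.group(1)}" if m else "reader exited abnormally"
    m = re.search(r"» (\w+)", line)
    if m:
        return f"exception: {m.group(1)}"
    if "expected [" in line and "but found [" in line:
        return "value mismatch"
    return "other"


def summarize(log_text):
    """Count failure lines per bucket for a whole Maven log."""
    failures = [l for l in log_text.splitlines() if FAILURE_LINE.match(l)]
    return Counter(classify(l) for l in failures)
```

Grouping this way separates reader crashes from decoding problems on the Presto side (such as the UnsupportedOperation errors) and from genuine value mismatches, which are likely to need different fixes.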
Latest Update:
The number of failing tests has dropped from 53 to 13. Current result:
Findings: