@qqibrow
qqibrow / test_velox_parquet_use_presto.md
Last active March 28, 2024 20:16
Test velox parquet reader using parquet unit tests in presto

Test Velox Parquet Reader Using Presto

Background

Velox is a C++ database acceleration library that can be integrated with Spark or Presto to enhance query performance and reduce infrastructure costs. It includes its own C++ Parquet reader for better performance and integration. Testing surfaced errors in this native Parquet reader, which showed the need for better testing infrastructure to catch issues and broaden test coverage before production.

Proposal

As an interim solution, we propose using the existing unit tests in the Presto project to test the Velox Parquet reader. The reasons for choosing this approach are as follows:

@qqibrow
qqibrow / gist:f297babadb0bb662ee398b9088870785
Created April 16, 2021 19:36
Reproduce Iterator Operator Checkpoint Issue in 1.11
package com.example.demo;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.api.java.utils.ParameterTool;
2020-11-04 18:39:39,854 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 6917 @ 1604515179567 for job 47d07d9ba88330da5940b96d82c0e5b1.
2020-11-04 18:42:07,846 WARN org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 48188ms for sessionid 0x200572cda940ebb
2020-11-04 18:42:07,846 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 48188ms for sessionid 0x200572cda940ebb, closing socket connection and attempting reconnect
2020-11-04 18:42:07,857 INFO org.apache.flink.yarn.YarnResourceManager - The heartbeat of JobManager with id 19b2149e615087e5aeb8fcdb3f6d8daf timed out.
2020-11-04 18:42:07,857 INFO org.apache.flink.yarn.YarnResourceManager - Disconnect job manager 89ebd520ca11b33c2d37d55295e04f33@akka.tcp://flink@flinkhost-data-slave-prod-0a021443.ec2.pin220.com:35779/user/j
@qqibrow
qqibrow / 180_explain_result
Created July 25, 2017 23:36
180 explain result
"Fragment 0 [SINGLE]
Output layout: [simplified_shape, expr_22, count]
Output partitioning: SINGLE []
- Output[simplified_shape, CashRate, Trips] => [simplified_shape:varchar, expr_22:double, count:bigint]
CashRate := expr_22
Trips := count
- RemoteSource[1] => [expr_22:double, count:bigint, simplified_shape:varchar]
Fragment 1 [HASH]
Output layout: [expr_22, count, simplified_shape]
@qqibrow
qqibrow / 158_explain_result
Created July 25, 2017 23:33
158 explain plan
"Fragment 0 [SINGLE]
Output layout: [simplified_shape, expr_22, count]
Output partitioning: SINGLE []
- Output[simplified_shape, CashRate, Trips] => [simplified_shape:varchar, expr_22:double, count:bigint]
CashRate := expr_22
Trips := count
- RemoteSource[1] => [simplified_shape:varchar, expr_22:double, count:bigint]
Fragment 1 [HASH]
Output layout: [simplified_shape, expr_22, count]
@qqibrow
qqibrow / task_3_28.log
Created July 25, 2017 00:04
task_3_28.log
{
"taskStatus" : {
"taskId" : "20170718_180404_24350_4h2d6.3.28",
"taskInstanceId" : "acb0c304-695d-4cae-9e40-221da8054d32",
"version" : 12349,
"state" : "RUNNING",
"self" : "http://10.65.8.7:8080/v1/task/20170718_180404_24350_4h2d6.3.28",
"failures" : [ ],
"queuedPartitionedDrivers" : 0,
"runningPartitionedDrivers" : 1,
@qqibrow
qqibrow / 0_reuse_code.js
Created April 20, 2014 04:27
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
// Sol1.
int stringToLong(string str) {
const char* p = str.c_str();
// Skip space in the front.
for(; *p == ' '; ++p);
// Get the sign of the input.
bool sign;
// First solution. This cannot pass the large test.
class Solution {
public:
    int uniquePathsWithObstacles(vector<vector<int> > &obstacleGrid) {
        if(!obstacleGrid.size() || !obstacleGrid[0].size() || obstacleGrid[0][0])
            return 0;
        else
            return getUniquePath(obstacleGrid.size()-1, obstacleGrid[0].size()-1, obstacleGrid);
    }
class Solution {
public:
    int uniquePaths(int m, int n) {
        if( 0 == m || 0 == n)
            return 0;
        int path[m][n];
        for (int i = 0; i < m; ++i)
            for ( int j = 0; j < n; ++j) {
                if( 0 == i || 0 == j)