Environment:
CDP 7.1.5
6 nodes (Dell PowerEdge R430, 20c/40t Xeon E5-2630 v4 @ 2.2 GHz, 128 GB RAM, 4-2TB disks)
1) generate big table (260M) with all random data
2) copy big table to Parquet
3) generate small table with top 1000 and bottom 1000 keys off big one
4) generate small table with top 1000 and bottom 1000 of non-key field off big one
5) compute stats for all tables
6) select from the big Kudu table based on half of the small table (filter on some int field mod 2), joining on key
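Steps 1, 3, 4, and 6 can be illustrated at toy scale in plain Python. This is only a sketch under assumptions: the row layout (`key`, `val`), the table sizes, and the helper names `make_big_table`, `top_and_bottom`, and `join_on_key` are all hypothetical, and the actual test ran against Kudu and Parquet tables, which is not reproduced here.

```python
import random


def make_big_table(n, seed=0):
    """Step 1 (toy scale): n rows with unique random keys and random values."""
    rng = random.Random(seed)
    keys = rng.sample(range(10 * n), n)  # sampling without replacement keeps keys unique
    return [{"key": k, "val": rng.randrange(1_000_000)} for k in keys]


def top_and_bottom(rows, field, n):
    """Steps 3 and 4: rows holding the n smallest and n largest values of `field`."""
    ordered = sorted(rows, key=lambda r: r[field])
    if len(ordered) <= 2 * n:
        return ordered
    return ordered[:n] + ordered[-n:]


def join_on_key(big, small, predicate):
    """Step 6: filter the small table, then join it to the big table on key."""
    wanted = {r["key"] for r in small if predicate(r)}
    return [r for r in big if r["key"] in wanted]


big = make_big_table(10_000)
small_by_key = top_and_bottom(big, "key", 100)   # hypothetical stand-in for step 3
small_by_val = top_and_bottom(big, "val", 100)   # hypothetical stand-in for step 4
# "Half of small": keep only rows whose int field is even (field mod 2 == 0).
result = join_on_key(big, small_by_key, lambda r: r["val"] % 2 == 0)
print(len(small_by_key), len(small_by_val), len(result))
```

Because the keys are unique, the join returns exactly one big-table row per small-table row that passes the mod 2 filter.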
static KuduClient GetTicketCacheClient() {
  Subject subject = SecurityUtil.getSubjectFromTicketCacheOrNull();
  if (subject == null) {
    System.out.println("Subject not available from ticket cache");
    System.exit(1);
  }
  KuduClient client = null;
  try {
    client = Subject.doAs(subject, new PrivilegedExceptionAction<KuduClient>() {
      public KuduClient run() throws Exception {
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduClient.KuduClientBuilder;
import org.apache.kudu.ColumnSchema.ColumnSchemaBuilder;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.KuduException;
import org.apache.kudu.client.ListTablesResponse;
+----------+--------------------+---------+------------+------------+----------------+
| Workload | File Format        | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+--------------------+---------+------------+------------+----------------+
| TPCH(30) | kudu / none / none | 13.85   | -28.89%    | 8.79       | -34.71%        |
+----------+--------------------+---------+------------+------------+----------------+
+----------+----------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+---------+
| Workload | Query    | File Format        | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval    |
+----------+----------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q9  | kudu / none / none | 41.61  | 40.81       | +1.95%     | * 16.94% * | * 14.71% *     | 5     |
block_cache_capacity_mb = 256 (default)
+----------+--------------------+---------+------------+------------+----------------+
| Workload | File Format        | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+--------------------+---------+------------+------------+----------------+
| TPCH(30) | kudu / none / none | 15.84   | -5.70%     | 9.93       | -13.69%        |
+----------+--------------------+---------+------------+------------+----------------+
+----------+----------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+---------+
| Workload | Query    | File Format        | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval    |
+----------+----------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q9  | kudu / none / none | 62.87  | 31.97       | R +96.64%  |
+----------+--------------------+---------+------------+------------+----------------+
| Workload | File Format        | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+--------------------+---------+------------+------------+----------------+
| TPCH(30) | kudu / none / none | 12.53   | -21.67%    | 8.44       | -23.04%        |
+----------+--------------------+---------+------------+------------+----------------+
+----------+----------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+---------+
| Workload | Query    | File Format        | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval    |
+----------+----------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q15 | kudu / none / none | 5.98   | 4.37        | +37.03%    | * 51.70% * | * 45.08% *     | 5     |
4.2.1 TPCH-Q9 SQL Statement:
select nation, o_year, sum(amount) as sum_profit
from (select n_name as nation, extract(year from o_orderdate) as o_year,
             l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
      from part, supplier, lineitem, partsupp, orders, nation
      where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey
        and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey
        and p_name like '%:1%')
as profit group by nation, o_year order by nation, o_year desc LIMIT 1;
# Modified version of Todd Lipcon's codec-test.py, made Python 3 compatible, with formatted output.
# https://github.infra.cloudera.com/raw/todd/experiments/master/kudu/codec-test.py
import pyfastpfor
import numpy as np
import pandas as pd
from timeit import Timer
import bitshuffle
import sys
from prettytable import PrettyTable
import random
import sys

# 10M values to be generated
count = 10 * 1024 * 1024

def gen_repeat_in_small_range():
    for i in range(0, int(count/256/256)):
        for j in range(0, 256):
            for k in range(0, 256):
The _for128 and _for256 variants use blocks of 128 or 256 input integers to calculate the diff
and min across each block, simulating the mechanism used in Kudu's encoding implementation.
$ python codec-test.py repeat_small_range.csv
+--------------------------+----------------------+------------------------+--------------+
| codec                    | comp_time(millisecs) | decomp_time(millisecs) | bits_per_int |
+--------------------------+----------------------+------------------------+--------------+
| bitshuffle               | 42.511               | 165.078                | 0.4868       |
| simdbinarypacking        | 33.127               | 34.271                 | 7.0821       |
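
The block-wise min/diff idea behind the _for128 and _for256 variants can be sketched in a few lines of pure Python. This is a minimal illustration of frame-of-reference encoding, not the code from codec-test.py and not Kudu's implementation: the function names are hypothetical, and the per-block overhead (a 32-bit minimum plus an 8-bit width field) is an assumption chosen only to make the arithmetic concrete.

```python
def for_encode_block(block):
    """Encode one block as (minimum, diffs from the minimum)."""
    lo = min(block)
    return lo, [v - lo for v in block]


def for_decode_block(lo, diffs):
    """Invert for_encode_block by adding the minimum back to every diff."""
    return [lo + d for d in diffs]


def for_bits_per_int(values, block_size=128):
    """Average bits per integer for a non-empty list under the assumed layout.

    Assumed layout per block: 32 bits for the minimum, 8 bits for the bit
    width, then every diff packed at the width of the block's largest diff.
    """
    total_bits = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        lo, diffs = for_encode_block(block)
        width = max(diffs).bit_length()  # bits needed for the largest diff
        total_bits += 32 + 8 + width * len(block)
    return total_bits / len(values)


# Values that repeat within a small range pack into very few bits per block.
sample = [1000 + (i % 16) for i in range(1024)]
print(for_bits_per_int(sample, 128))
print(for_bits_per_int(sample, 256))
```

With larger blocks the fixed per-block overhead is amortized over more values, but a single outlier widens the packing for more of its neighbors, which is the trade-off the 128 and 256 variants explore.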