Regression observed with the TPCH-Q9 query in Impala when pushing down Bloom filter predicates to Kudu
4.2.1 TPCH-Q9 SQL Statement:
select nation, o_year, sum(amount) as sum_profit
from (select n_name as nation,
             extract(year from o_orderdate) as o_year,
             l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
      from part, supplier, lineitem, partsupp, orders, nation
      where s_suppkey = l_suppkey
        and ps_suppkey = l_suppkey
        and ps_partkey = l_partkey
        and p_partkey = l_partkey
        and o_orderkey = l_orderkey
        and s_nationkey = n_nationkey
        and p_name like '%:1%') as profit
group by nation, o_year
order by nation, o_year desc
LIMIT 1;
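Note: ':1' is the TPC-H Q9 substitution parameter (a part color) and is left as a placeholder above. For illustration only, with the standard TPC-H validation value 'green' the predicate would read:

  and p_name like '%green%'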
4.2.2 Regression details from single_node_perf_run for TPCH-Q9
(R) Regression: TPCH(30) TPCH-Q9 [kudu / none / none] (32.14s -> 55.88s [+73.84%]) block_cache_capacity_mb = 64
+---------------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+-------+---------+-----------+
| Operator | % of Query | Avg | Base Avg | Delta(Avg) | StdDev(%) | Max | Base Max | Delta(Max) | #Hosts | #Inst | #Rows | Est #Rows |
+---------------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+-------+---------+-----------+
| 09:HASH JOIN | 6.89% | 5.44s | 3.77s | +44.24% | * 21.89% * | 7.46s | 4.33s | +72.27% | 1 | 1 | 9.79M | 17.75M |
| F05:EXCHANGE SENDER | 5.40% | 4.26s | 4.26s | -0.00% | 0.13% | 4.27s | 4.28s | -0.27% | 1 | 1 | -1 | -1 |
| F04:EXCHANGE SENDER | 6.32% | 4.99s | 4.75s | +5.01% | 1.97% | 5.13s | 4.88s | +5.07% | 1 | 1 | -1 | -1 |
| 08:HASH JOIN | 6.97% | 5.50s | 3.85s | +42.89% | * 12.37% * | 6.60s | 4.37s | +50.91% | 1 | 1 | 9.79M | 17.75M |
| F03:EXCHANGE SENDER | 2.44% | 1.92s | 1.91s | +0.70% | 1.06% | 1.95s | 1.92s | +1.39% | 1 | 1 | -1 | -1 |
| F00:EXCHANGE SENDER | 5.80% | 4.58s | 4.75s | -3.53% | 2.26% | 4.71s | 4.92s | -4.43% | 1 | 1 | -1 | -1 |
| 01:SCAN KUDU | 2.71% | 2.14s | 1.94s | +10.45% | * 11.49% * | 2.55s | 1.95s | +30.95% | 1 | 1 | 300.00K | 300.00K |
| 06:HASH JOIN | 2.88% | 2.28s | 13.01s | -82.51% | 7.89% | 2.50s | 13.27s | -81.16% | 1 | 1 | 9.79M | 18.00M |
| 00:SCAN KUDU | 4.75% | 3.75s | 2.59s | +44.74% | * 21.80% * | 4.48s | 2.94s | +52.37% | 1 | 1 | 326.31K | 600.00K |
| 02:SCAN KUDU | 49.35% | 38.94s | 6.12s | +535.88% | * 28.41% * | 49.62s | 10.20s | +386.28% | 1 | 1 | 28.72M | 180.00M |
+---------------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+-------+---------+-----------+
(R) Regression: TPCH(30) TPCH-Q9 [kudu / none / none] (31.97s -> 62.87s [+96.64%]) block_cache_capacity_mb = 256
+---------------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+-------+---------+-----------+
| Operator | % of Query | Avg | Base Avg | Delta(Avg) | StdDev(%) | Max | Base Max | Delta(Max) | #Hosts | #Inst | #Rows | Est #Rows |
+---------------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+-------+---------+-----------+
| 09:HASH JOIN | 6.43% | 5.51s | 3.65s | +51.06% | * 30.17% * | 7.57s | 5.61s | +34.84% | 1 | 1 | 9.79M | 17.75M |
| F05:EXCHANGE SENDER | 4.89% | 4.20s | 4.17s | +0.65% | 1.25% | 4.26s | 4.23s | +0.75% | 1 | 1 | -1 | -1 |
| F04:EXCHANGE SENDER | 5.92% | 5.08s | 5.21s | -2.57% | 3.91% | 5.31s | 5.26s | +0.92% | 1 | 1 | -1 | -1 |
| 08:HASH JOIN | 5.72% | 4.91s | 3.85s | +27.42% | * 16.36% * | 6.13s | 4.90s | +25.21% | 1 | 1 | 9.79M | 17.75M |
| F03:EXCHANGE SENDER | 2.23% | 1.91s | 1.87s | +2.54% | 1.75% | 1.96s | 1.88s | +4.06% | 1 | 1 | -1 | -1 |
| 03:SCAN KUDU | 2.13% | 1.83s | 1.39s | +31.64% | * 69.49% * | 3.81s | 3.49s | +9.37% | 1 | 1 | 24.00M | 24.00M |
| F00:EXCHANGE SENDER | 5.56% | 4.77s | 5.11s | -6.50% | 3.36% | 4.94s | 5.32s | -7.16% | 1 | 1 | -1 | -1 |
| 01:SCAN KUDU | 2.25% | 1.93s | 1.94s | -0.20% | 1.20% | 1.95s | 1.95s | +0.17% | 1 | 1 | 300.00K | 300.00K |
| 06:HASH JOIN | 2.78% | 2.38s | 13.83s | -82.76% | 7.69% | 2.58s | 15.04s | -82.85% | 1 | 1 | 9.79M | 18.00M |
| 00:SCAN KUDU | 5.37% | 4.61s | 4.41s | +4.36% | * 30.47% * | 6.22s | 5.95s | +4.54% | 1 | 1 | 326.31K | 600.00K |
| 02:SCAN KUDU | 51.94% | 44.56s | 4.39s | +915.61% | * 11.84% * | 53.18s | 5.99s | +787.42% | 1 | 1 | 28.72M | 180.00M |
+---------------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+-------+---------+-----------+
(R) Regression: TPCH(30) TPCH-Q9 [kudu / none / none] (46.12s -> 69.67s [+51.06%]) block_cache_capacity_mb = 1024
+---------------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+-------+---------+-----------+
| Operator | % of Query | Avg | Base Avg | Delta(Avg) | StdDev(%) | Max | Base Max | Delta(Max) | #Hosts | #Inst | #Rows | Est #Rows |
+---------------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+-------+---------+-----------+
| 09:HASH JOIN | 3.62% | 3.52s | 2.64s | +33.12% | * 22.03% * | 4.02s | 4.02s | -0.03% | 1 | 1 | 9.79M | 17.75M |
| F05:EXCHANGE SENDER | 4.41% | 4.29s | 4.21s | +1.81% | 0.88% | 4.35s | 4.28s | +1.61% | 1 | 1 | -1 | -1 |
| 04:SCAN KUDU | 2.50% | 2.43s | 1.30s | +86.93% | * 69.80% * | 4.26s | 4.27s | -0.02% | 1 | 1 | 45.00M | 45.00M |
| F04:EXCHANGE SENDER | 5.25% | 5.10s | 5.24s | -2.63% | 2.28% | 5.30s | 5.47s | -3.09% | 1 | 1 | -1 | -1 |
| 08:HASH JOIN | 4.12% | 4.01s | 3.95s | +1.54% | * 26.66% * | 5.17s | 4.25s | +21.49% | 1 | 1 | 9.79M | 17.75M |
| 03:SCAN KUDU | 4.17% | 4.05s | 3.69s | +9.77% | * 48.76% * | 7.20s | 5.54s | +29.96% | 1 | 1 | 24.00M | 24.00M |
| F00:EXCHANGE SENDER | 5.01% | 4.88s | 5.32s | -8.29% | 4.10% | 5.18s | 5.56s | -6.79% | 1 | 1 | -1 | -1 |
| 01:SCAN KUDU | 2.82% | 2.75s | 2.34s | +17.54% | * 25.03% * | 3.80s | 3.52s | +8.18% | 1 | 1 | 300.00K | 300.00K |
| 06:HASH JOIN | 2.49% | 2.43s | 14.09s | -82.79% | 7.43% | 2.62s | 14.90s | -82.43% | 1 | 1 | 9.79M | 18.00M |
| 00:SCAN KUDU | 7.40% | 7.20s | 5.87s | +22.63% | * 23.48% * | 9.99s | 6.96s | +43.44% | 1 | 1 | 326.31K | 600.00K |
| 02:SCAN KUDU | 53.51% | 52.05s | 17.36s | +199.82% | * 18.57% * | 65.64s | 20.31s | +223.21% | 1 | 1 | 28.72M | 180.00M |
+---------------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+-------+---------+-----------+
The major regressions are caused by the Kudu scan node 02:SCAN KUDU, which scans the lineitem table.
4.2.3 Profile from running TPCH-Q9 in impala-shell with a single impalad.
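The exec summary and plan snippet below can be reproduced in impala-shell roughly as follows (a sketch; it assumes the TPCH-30 Kudu database is named tpch30_kudu, as in the plan below, and the exact shell commands may vary slightly by Impala version):

use tpch30_kudu;
-- run the Q9 statement from section 4.2.1, then:
summary;   -- prints the per-operator exec summary of the last query
profile;   -- prints the full runtime profile of the last query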
Operator #Hosts #Inst Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
-----------------------------------------------------------------------------------------------------------------------------------------
F09:ROOT 1 1 890.000ns 890.000ns 0 0
22:MERGING-EXCHANGE 1 1 22.250us 22.250us 0 1 0 16.00 KB UNPARTITIONED
F08:EXCHANGE SENDER 1 1 32.475us 32.475us 3.45 KB 0
12:TOP-N 1 1 42.080us 42.080us 0 1 12.00 KB 44.00 B
21:AGGREGATE 1 1 610.711us 610.711us 0 61.70K 1.97 MB 10.00 MB FINALIZE
20:EXCHANGE 1 1 8.904us 8.904us 0 61.70K 0 1006.49 KB HASH(nation,o_year)
F06:EXCHANGE SENDER 1 1 29.112us 29.112us 3.45 KB 0
11:AGGREGATE 1 1 0.000ns 0.000ns 0 61.70K 5.08 MB 10.00 MB STREAMING
10:HASH JOIN 1 1 464.450us 464.450us 0 17.75M 2.01 MB 1.94 MB INNER JOIN, BROADCAST
|--19:EXCHANGE 1 1 6.985us 6.985us 25 25 16.00 KB 16.00 KB BROADCAST
| F07:EXCHANGE SENDER 1 1 52.949us 52.949us 6.73 KB 0
| 05:SCAN KUDU 1 1 42.659ms 42.659ms 25 25 65.00 KB 768.00 KB tpch30_kudu.nation
09:HASH JOIN 1 1 10.916ms 10.916ms 0 17.75M 3.47 GB 535.01 MB INNER JOIN, PARTITIONED
|--18:EXCHANGE 1 1 507.893ms 507.893ms 45.00M 45.00M 16.90 MB 10.11 MB HASH(o_orderkey)
| F05:EXCHANGE SENDER 1 1 4s254ms 4s254ms 5.12 KB 0
| 04:SCAN KUDU 1 1 9s220ms 9s220ms 45.00M 45.00M 3.49 MB 3.00 MB tpch30_kudu.orders
17:EXCHANGE 1 1 6.628us 6.628us 0 17.75M 0 10.45 MB HASH(l_orderkey)
F04:EXCHANGE SENDER 1 1 50.371us 50.371us 4.62 KB 0
08:HASH JOIN 1 1 6.089ms 6.089ms 0 17.75M 1.56 GB 201.42 MB INNER JOIN, PARTITIONED
|--16:EXCHANGE 1 1 278.063ms 278.063ms 24.00M 24.00M 12.21 MB 10.08 MB HASH(ps_partkey,ps_suppkey)
| F03:EXCHANGE SENDER 1 1 1s929ms 1s929ms 5.12 KB 0
| 03:SCAN KUDU 1 1 9s991ms 9s991ms 24.00M 24.00M 1.39 MB 4.50 MB tpch30_kudu.partsupp
15:EXCHANGE 1 1 8.727us 8.727us 0 17.75M 0 10.37 MB HASH(l_partkey,l_suppkey)
F00:EXCHANGE SENDER 1 1 36.290us 36.290us 4.50 KB 0
07:HASH JOIN 1 1 131.115ms 131.115ms 0 17.75M 20.08 MB 4.75 MB INNER JOIN, BROADCAST
|--14:EXCHANGE 1 1 4.842ms 4.842ms 300.00K 300.00K 136.00 KB 2.90 MB BROADCAST
| F02:EXCHANGE SENDER 1 1 22.173ms 22.173ms 11.63 KB 0
| 01:SCAN KUDU 1 1 4s694ms 4s694ms 300.00K 300.00K 798.00 KB 3.00 MB tpch30_kudu.supplier
06:HASH JOIN 1 1 0.000ns 0.000ns 0 18.00M 34.08 MB 35.72 MB INNER JOIN, BROADCAST
|--13:EXCHANGE 1 1 7.778us 7.778us 0 600.00K 0 10.18 MB BROADCAST
| F01:EXCHANGE SENDER 1 1 42.328us 42.328us 5.12 KB 0
| 00:SCAN KUDU 1 1 14s003ms 14s003ms 0 600.00K 78.00 KB 3.00 MB tpch30_kudu.part
02:SCAN KUDU 1 1 18s729ms 18s729ms 19.99M 180.00M 2.28 MB 9.00 MB tpch30_kudu.lineitem
00:SCAN KUDU [tpch30_kudu.part]
| predicates: p_name LIKE '%:1%'
| runtime filters: RF004[bloom] -> tpch30_kudu.part.p_partkey, RF006[min_max] -> tpch30_kudu.part.p_partkey
01:SCAN KUDU [tpch30_kudu.supplier]
| runtime filters: RF000[bloom] -> s_nationkey, RF001[min_max] -> s_nationkey, RF005[bloom] -> tpch30_kudu.supplier.s_suppkey, RF007[min_max] -> tpch30_kudu.supplier.s_suppkey
02:SCAN KUDU [tpch30_kudu.lineitem]
runtime filters: RF010[bloom] -> l_partkey, RF011[min_max] -> l_partkey, RF008[bloom] -> l_suppkey, RF009[min_max] -> l_suppkey, RF002[bloom] -> l_orderkey, RF003[min_max] -> l_orderkey, RF004[bloom] -> l_partkey, RF005[bloom] -> l_suppkey, RF006[min_max] -> l_partkey, RF007[min_max] -> l_suppkey
Observation:
Five Bloom filters are applied to the scan node 02:SCAN KUDU, which reads the largest table (lineitem, with 179,998,372 rows).
Evaluating these filters introduces too much per-row overhead in the kudu-tserver scan path.
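One way to confirm that Bloom filter evaluation (rather than the min/max filters or filter arrival time) is responsible would be to re-run the query with Bloom runtime filters disabled and compare the 02:SCAN KUDU time. A minimal sketch using Impala query options (option names as documented for Impala; availability depends on the Impala version):

set ENABLED_RUNTIME_FILTER_TYPES=MIN_MAX;  -- keep min/max filters, drop Bloom filters
-- re-run the Q9 statement from 4.2.1 and compare 02:SCAN KUDU in the exec summary
set ENABLED_RUNTIME_FILTER_TYPES=ALL;      -- re-enable both Bloom and min/max filters

If the scan time returns to the baseline with only min/max filters, the regression is attributable to Bloom filter evaluation on the kudu-tserver.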
Improvement plan:
We need to add heuristic logic to the planner for deciding when to assign Bloom filter predicates to a Kudu scan node.
Compared with Parquet scans, Kudu does not have codegen optimization for filter evaluation. We need to pursue such optimizations for Kudu in the following tasks.
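One possible shape for such a heuristic (a sketch only; the symbols below are illustrative assumptions, not existing planner estimates) is to push a Bloom filter down to a Kudu scan only when the rows it is expected to reject pay for the per-row evaluation cost:

  N_scan * c_bloom < (1 - s) * N_scan * c_parent

where N_scan is the scan's input cardinality, s is the estimated fraction of rows that survive the filter, c_bloom is the per-row Bloom filter probe cost on the tserver, and c_parent is the per-row cost saved in the scan's parent operators. Under such a rule, Bloom filters that reject few rows would not be pushed to 02:SCAN KUDU, and the higher c_bloom for Kudu (no codegen, as noted above) raises the bar further, which also limits the number of filters evaluated per scanned row.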