Queries are run against a Parquet TPCH SF 1 schema on Presto 0.286 on a single node cluster
Q1
presto:tpch_sf1_parquet> explain select l_quantity from lineitem l, orders o where l_orderkey = o_orderkey and o_totalprice < 1000 and l_quantity < 48;
- Output[PlanNodeId 8][l_quantity] => [l_quantity:decimal(15,2)]
Estimates: {source: CostBasedSourceInfo, rows: 1,432 (12.59kB), cpu: 290,640,426.81, memory: 1,924.82, network: 14,812.88}
- RemoteStreamingExchange[PlanNodeId 414][GATHER] => [l_quantity:decimal(15,2)]
Estimates: {source: CostBasedSourceInfo, rows: 1,432 (12.59kB), cpu: 290,640,426.81, memory: 1,924.82, network: 14,812.88}
- InnerJoin[PlanNodeId 372][("l_orderkey" = "o_orderkey")] => [l_quantity:decimal(15,2)]
Estimates: {source: CostBasedSourceInfo, rows: 1,432 (12.59kB), cpu: 290,640,426.81, memory: 1,924.82, network: 1,924.82}
Distribution: REPLICATED
- ScanFilter[PlanNodeId 0,411][table = TableHandle {connectorId='local_hms', connectorHandle='HiveTableHandle{schemaName=tpch_sf1_parquet, tableName=lineitem, analyzePartitionValues=Optional.empty}', layout='Optional[tpch_sf1_parquet.lineitem{domains={l_quantity=[ [(<min>, "48.00")] ]}}]'}, filterPredicate = (l_quantity) < (DECIMAL'48.00')] => [l_orderkey:integer, l_quantity:decimal(15,2)]
Estimates: {source: CostBasedSourceInfo, rows: 6,001,215 (51.51MB), cpu: 84,017,010.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 5,756,267 (49.41MB), cpu: 168,034,020.00, memory: 0.00, network: 0.00}
LAYOUT: tpch_sf1_parquet.lineitem{domains={l_quantity=[ [(<min>, "48.00")] ]}}
l_orderkey := l_orderkey:int:0:REGULAR (1:32)
l_quantity := l_quantity:decimal(15,2):4:REGULAR (1:32)
- LocalExchange[PlanNodeId 457][HASH] (o_orderkey) => [o_orderkey:integer]
Estimates: {source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,003,849.65, memory: 0.00, network: 1,924.82}
- RemoteStreamingExchange[PlanNodeId 413][REPLICATE] => [o_orderkey:integer]
Estimates: {source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,001,924.82, memory: 0.00, network: 1,924.82}
- ScanFilterProject[PlanNodeId 1,412,179][table = TableHandle {connectorId='local_hms', connectorHandle='HiveTableHandle{schemaName=tpch_sf1_parquet, tableName=orders, analyzePartitionValues=Optional.empty}', layout='Optional[tpch_sf1_parquet.orders{domains={o_totalprice=[ [(<min>, "1000.00")] ]}}]'}, filterPredicate = (o_totalprice) < (DECIMAL'1000.00'), projectLocality = LOCAL] => [o_orderkey:integer]
Estimates: {source: CostBasedSourceInfo, rows: 1,500,000 (12.87MB), cpu: 21,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,001,924.82, memory: 0.00, network: 0.00}
LAYOUT: tpch_sf1_parquet.orders{domains={o_totalprice=[ [(<min>, "1000.00")] ]}}
o_totalprice := o_totalprice:decimal(15,2):3:REGULAR (1:44)
o_orderkey := o_orderkey:int:0:REGULAR (1:44)
Q2
presto:tpch_sf1_parquet> explain select l_quantity from lineitem l, orders o where l_orderkey = o_orderkey and o_totalprice < 1000 and l_quantity + cast(floor(random()) as decimal (15,2)) < 48;
Query Plan >
- Output[PlanNodeId 8][l_quantity] => [l_quantity:decimal(15,2)]
Estimates: {source: CostBasedSourceInfo, rows: 1,344 (11.81kB), cpu: 210,051,887.31, memory: 1,924.82, network: 14,017.66}
- RemoteStreamingExchange[PlanNodeId 370][GATHER] => [l_quantity:decimal(15,2)]
Estimates: {source: CostBasedSourceInfo, rows: 1,344 (11.81kB), cpu: 210,051,887.31, memory: 1,924.82, network: 14,017.66}
- InnerJoin[PlanNodeId 55][("l_orderkey" = "o_orderkey") AND ((l_quantity) + (CAST(floor(random()) AS decimal(15,2)))) < (DECIMAL'48.00')] => [l_quantity:decimal(15,2)]
Estimates: {source: CostBasedSourceInfo, rows: 1,344 (11.81kB), cpu: 210,051,887.31, memory: 1,924.82, network: 1,924.82}
Distribution: REPLICATED
- TableScan[PlanNodeId 0][TableHandle {connectorId='local_hms', connectorHandle='HiveTableHandle{schemaName=tpch_sf1_parquet, tableName=lineitem, analyzePartitionValues=Optional.empty}', layout='Optional[tpch_sf1_parquet.lineitem{}]'}] => [l_orderkey:integer, l_quantity:decimal(15,2)]
Estimates: {source: CostBasedSourceInfo, rows: 6,001,215 (51.51MB), cpu: 84,017,010.00, memory: 0.00, network: 0.00}
LAYOUT: tpch_sf1_parquet.lineitem{}
l_orderkey := l_orderkey:int:0:REGULAR (1:32)
l_quantity := l_quantity:decimal(15,2):4:REGULAR (1:32)
- LocalExchange[PlanNodeId 407][HASH] (o_orderkey) => [o_orderkey:integer]
Estimates: {source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,003,849.65, memory: 0.00, network: 1,924.82}
- RemoteStreamingExchange[PlanNodeId 369][REPLICATE] => [o_orderkey:integer]
Estimates: {source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,001,924.82, memory: 0.00, network: 1,924.82}
- ScanFilterProject[PlanNodeId 1,368,178][table = TableHandle {connectorId='local_hms', connectorHandle='HiveTableHandle{schemaName=tpch_sf1_parquet, tableName=orders, analyzePartitionValues=Optional.empty}', layout='Optional[tpch_sf1_parquet.orders{domains={o_totalprice=[ [(<min>, "1000.00")] ]}}]'}, filterPredicate = (o_totalprice) < (DECIMAL'1000.00'), projectLocality = LOCAL] => [o_orderkey:integer]
Estimates: {source: CostBasedSourceInfo, rows: 1,500,000 (12.87MB), cpu: 21,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,001,924.82, memory: 0.00, network: 0.00}
LAYOUT: tpch_sf1_parquet.orders{domains={o_totalprice=[ [(<min>, "1000.00")] ]}}
o_totalprice := o_totalprice:decimal(15,2):3:REGULAR (1:44)
o_orderkey := o_orderkey:int:0:REGULAR (1:44)
Q1 and Q2 were run three times each and the last run was recorded. Highlights
- Lineitem scan on Q2 took CPU time of
597 ms
vs1.03s
in Q1 - Overall CPU usage was
871ms
in Q2 vs1.35s
in Q1
Q1 -
presto:tpch_sf1_parquet> explain analyze select l_quantity from lineitem l, orders o where l_orderkey = o_orderkey and o_totalprice < 1000 and l_quantity < 48;
Query Plan >
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
Fragment 1 [SOURCE] >
CPU: 1.35s, Scheduled: 1.41s, Input: 6,001,375 rows (20.68MB); per task: avg.: 6,001,375.00 std.dev.: 0.00, Output: 160 rows (2.03kB), 1 tasks >
Output layout: [l_quantity] >
Output partitioning: SINGLE [] >
Stage Execution Strategy: UNGROUPED_EXECUTION >
- InnerJoin[PlanNodeId 426][("l_orderkey" = "o_orderkey")] => [l_quantity:decimal(15,2)] >
Estimates: {source: CostBasedSourceInfo, rows: 1,432 (12.59kB), cpu: 290,640,426.81, memory: 1,924.82, network: 1,924.82} >
CPU: 323.00ms (22.02%), Scheduled: 343.00ms (21.68%), Output: 160 rows (2.03kB) >
Left (probe) Input avg.: 626,839.33 rows, Input std.dev.: 25.37% >
Right (build) Input avg.: 10.00 rows, Input std.dev.: 35.18% >
Collisions avg.: 2.83 (43.53% est.), Collisions std.dev.: 67.74% >
Distribution: REPLICATED >
- ScanFilter[PlanNodeId 0,471][table = TableHandle {connectorId='local_hms', connectorHandle='HiveTableHandle{schemaName=tpch_sf1_parquet, tableName=lineitem, analyzePartitionValues=Optional.empty}', layout='Optional[tpch_sf1_parquet.>
Estimates: {source: CostBasedSourceInfo, rows: 6,001,215 (51.51MB), cpu: 84,017,010.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 5,756,267 (49.41MB), cpu: 168,034,020.00, memory: 0.00, network: 0.00} >
CPU: 1.03s (70.21%), Scheduled: 1.10s (69.79%), Output: 5,641,554 rows (118.33MB) >
Input avg.: 666,801.67 rows, Input std.dev.: 25.36% >
LAYOUT: tpch_sf1_parquet.lineitem{domains={l_quantity=[ [(<min>, "48.00")] ]}} >
l_orderkey := l_orderkey:int:0:REGULAR (1:40) >
l_quantity := l_quantity:decimal(15,2):4:REGULAR (1:40) >
Input: 6,001,215 rows (20.68MB), Filtered: 5.99% >
- LocalExchange[PlanNodeId 522][HASH] (o_orderkey) => [o_orderkey:integer] >
Estimates: {source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,003,849.65, memory: 0.00, network: 1,924.82} >
CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 160 rows (800B) >
Input avg.: 10.00 rows, Input std.dev.: 266.08% >
- RemoteSource[2] => [o_orderkey:integer] >
CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 160 rows (800B) >
Input avg.: 10.00 rows, Input std.dev.: 266.08% >
>
Fragment 2 [SOURCE] >
CPU: 114.25ms, Scheduled: 127.63ms, Input: 1,500,000 rows (13.77MB); per task: avg.: 1,500,000.00 std.dev.: 0.00, Output: 160 rows (800B), 1 tasks >
Output layout: [o_orderkey] >
Output partitioning: BROADCAST [] >
Stage Execution Strategy: UNGROUPED_EXECUTION >
- ScanFilterProject[PlanNodeId 1,472,205][table = TableHandle {connectorId='local_hms', connectorHandle='HiveTableHandle{schemaName=tpch_sf1_parquet, tableName=orders, analyzePartitionValues=Optional.empty}', layout='Optional[tpch_sf1_par>
Estimates: {source: CostBasedSourceInfo, rows: 1,500,000 (7.15MB), cpu: 21,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 385 (1.88kB), cpu: 42,000,000.00, memory: 0.00, network: 0.00}/{source: CostBa>
CPU: 114.00ms (7.77%), Scheduled: 135.00ms (8.53%), Output: 160 rows (800B) >
Input avg.: 750,000.00 rows, Input std.dev.: 14.69% >
LAYOUT: tpch_sf1_parquet.orders{domains={o_totalprice=[ [(<min>, "1000.00")] ]}} >
o_totalprice := o_totalprice:decimal(15,2):3:REGULAR (1:52) >
o_orderkey := o_orderkey:int:0:REGULAR (1:52) >
Input: 1,500,000 rows (13.77MB), Filtered: 99.99% >
>
>
(1 row)
Query 20240220_071124_00084_dbatx, FINISHED, 1 node
Splits: 60 total, 60 done (100.00%)
[Latency: client-side: 479ms, server-side: 474ms] [7.5M rows, 34.4MB] [15.8M rows/s, 72.7MB/s]
Q2 -
presto:tpch_sf1_parquet> explain analyze select l_quantity from lineitem l, orders o where l_orderkey = o_orderkey and o_totalprice < 1000 and l_quantity + cast(floor(random()) as decimal (15,2)) < 48;
Query Plan >
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
Fragment 1 [SOURCE] >
CPU: 871.93ms, Scheduled: 910.65ms, Input: 6,001,375 rows (20.68MB); per task: avg.: 6,001,375.00 std.dev.: 0.00, Output: 160 rows (1.44kB), 1 tasks >
Output layout: [l_quantity] >
Output partitioning: SINGLE [] >
Stage Execution Strategy: UNGROUPED_EXECUTION >
- InnerJoin[PlanNodeId 61][("l_orderkey" = "o_orderkey") AND ((l_quantity) + (CAST(floor(random()) AS decimal(15,2)))) < (DECIMAL'48.00')] => [l_quantity:decimal(15,2)] >
Estimates: {source: CostBasedSourceInfo, rows: 1,344 (11.81kB), cpu: 210,051,887.31, memory: 1,924.82, network: 1,924.82} >
CPU: 272.00ms (27.70%), Scheduled: 289.00ms (27.01%), Output: 160 rows (1.44kB) >
Left (probe) Input avg.: 666,801.67 rows, Input std.dev.: 25.36% >
Right (build) Input avg.: 10.00 rows, Input std.dev.: 35.18% >
Collisions avg.: 2.83 (43.53% est.), Collisions std.dev.: 67.74% >
Distribution: REPLICATED >
- TableScan[PlanNodeId 0][TableHandle {connectorId='local_hms', connectorHandle='HiveTableHandle{schemaName=tpch_sf1_parquet, tableName=lineitem, analyzePartitionValues=Optional.empty}', layout='Optional[tpch_sf1_parquet.lineitem{}]'}>
Estimates: {source: CostBasedSourceInfo, rows: 6,001,215 (51.51MB), cpu: 84,017,010.00, memory: 0.00, network: 0.00} >
CPU: 597.00ms (60.79%), Scheduled: 654.00ms (61.12%), Output: 6,001,215 rows (80.12MB) >
Input avg.: 666,801.67 rows, Input std.dev.: 25.36% >
LAYOUT: tpch_sf1_parquet.lineitem{} >
l_orderkey := l_orderkey:int:0:REGULAR (1:40) >
l_quantity := l_quantity:decimal(15,2):4:REGULAR (1:40) >
Input: 6,001,215 rows (20.68MB), Filtered: 0.00% >
- LocalExchange[PlanNodeId 472][HASH] (o_orderkey) => [o_orderkey:integer] >
Estimates: {source: CostBasedSourceInfo, rows: 385 (3.38kB), cpu: 42,003,849.65, memory: 0.00, network: 1,924.82} >
CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 160 rows (800B) >
Input avg.: 10.00 rows, Input std.dev.: 266.08% >
- RemoteSource[2] => [o_orderkey:integer] >
CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 160 rows (800B) >
Input avg.: 10.00 rows, Input std.dev.: 266.08% >
>
Fragment 2 [SOURCE] >
CPU: 113.40ms, Scheduled: 120.26ms, Input: 1,500,000 rows (13.77MB); per task: avg.: 1,500,000.00 std.dev.: 0.00, Output: 160 rows (800B), 1 tasks >
Output layout: [o_orderkey] >
Output partitioning: BROADCAST [] >
Stage Execution Strategy: UNGROUPED_EXECUTION >
- ScanFilterProject[PlanNodeId 1,428,204][table = TableHandle {connectorId='local_hms', connectorHandle='HiveTableHandle{schemaName=tpch_sf1_parquet, tableName=orders, analyzePartitionValues=Optional.empty}', layout='Optional[tpch_sf1_par>
Estimates: {source: CostBasedSourceInfo, rows: 1,500,000 (7.15MB), cpu: 21,000,000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 385 (1.88kB), cpu: 42,000,000.00, memory: 0.00, network: 0.00}/{source: CostBa>
CPU: 113.00ms (11.51%), Scheduled: 127.00ms (11.87%), Output: 160 rows (800B) >
Input avg.: 750,000.00 rows, Input std.dev.: 14.69% >
LAYOUT: tpch_sf1_parquet.orders{domains={o_totalprice=[ [(<min>, "1000.00")] ]}} >
o_totalprice := o_totalprice:decimal(15,2):3:REGULAR (1:52) >
o_orderkey := o_orderkey:int:0:REGULAR (1:52) >
Input: 1,500,000 rows (13.77MB), Filtered: 99.99% >
>
>
(1 row)
Query 20240220_071204_00088_dbatx, FINISHED, 1 node
Splits: 60 total, 60 done (100.00%)
[Latency: client-side: 409ms, server-side: 405ms] [7.5M rows, 34.4MB] [18.5M rows/s, 85.1MB/s]