@gregrahn
Last active April 6, 2018 22:23
Three comparison points:
Presto + RCFile vs Impala + RCFile vs Impala + Parquet
Metrics to compare: query time, CPU utilization, and disk read throughput (KBRead)
Impala v1.1.1
Presto v0.52
================================================================================================================================
Presto + RCFile:
select ss_sold_date_sk, count(*) from store_sales_rcfile group by 1 order by 1 limit 2000;
(1823 rows)
Query 20131115_012634_00021_48spk, FINISHED, 17 nodes
Splits: 46,568 total, 46,568 done (100.00%)
12:03 [82.5B rows, 3.15TB] [114M rows/s, 4.46GB/s]
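The rates on the Presto summary line can be re-derived from its totals. A minimal check, assuming "12:03" means 12 minutes 3 seconds of wall-clock time and that the byte figures are shown in 1024-based units (TiB, GiB); with those assumptions both reported rates are reproduced.

```python
# Presto summary line: "12:03 [82.5B rows, 3.15TB] [114M rows/s, 4.46GB/s]"
# Assumptions: "12:03" is 12 min 3 s; "TB"/"GB" are 1024-based (TiB/GiB).

elapsed_s = 12 * 60 + 3          # 723 s of wall-clock time
rows = 82.5e9                    # 82.5B rows scanned
bytes_read = 3.15 * 1024**4      # 3.15 "TB" interpreted as TiB

rows_per_s = rows / elapsed_s
gib_per_s = bytes_read / elapsed_s / 1024**3

print(f"{rows_per_s / 1e6:.0f}M rows/s")  # 114M rows/s
print(f"{gib_per_s:.2f} GiB/s")           # 4.46 GiB/s
```

Because "12:03" is rounded to the second, small differences in the last digit are expected for other runs.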
# Thu Nov 14 17:27:31 2013 Connected: 18 of 18
# <----CPU[HYPER]-----><-----------Memory-----------><---------------Disks----------------><----------Network---------->
#Host cpu sys inter ctxsw Free Buff Cach Inac Slab Map KBRead Reads Size KBWrit Writes Size KBIn PktIn KBOut PktOut
e1119 32 2 38855 55882 38G 821M 18G 8G 969M 4G 0 0 0 34 3 11 34535 32738 1577 16204
e1120 94 5 50726 20937 1G 407M 19G 18G 624M 39G 241989 586 413 12516 35 363 63261 48073 38076 34206
e1121 94 6 47783 23880 2G 500M 45G 44G 763M 12G 296873 730 407 0 0 0 44762 35722 42821 35569
e1218 92 7 55000 49455 2G 660M 45G 43G 823M 13G 378370 910 416 18 1 18 41146 37062 76548 59469
e1220 72 9 69467 42586 1G 481M 46G 45G 676M 12G 465862 1137 410 0 0 0 141749 104381 58756 56018
e1221 93 5 54352 24366 1G 453M 18G 17G 646M 40G 352016 856 411 44 7 6 50795 42965 68562 54543
e1318 94 5 52611 23337 1G 209M 17G 15G 478M 43G 300208 762 394 0 0 0 52303 42776 53880 44532
e1319 91 6 54203 26212 6G 369M 23G 22G 629M 31G 307396 749 410 0 0 0 61142 48583 55828 46454
e1320 92 5 56439 24382 7G 388M 10G 9G 561M 43G 435526 1126 387 0 0 0 55120 45940 73760 58689
e1321 93 6 50951 24651 1G 511M 46G 44G 789M 13G 234416 573 409 22 3 7 62214 47589 37714 33726
e1418 92 6 48228 24402 2G 537M 46G 45G 764M 11G 274081 670 409 11062 29 388 56292 43317 36357 31961
e1419 93 4 52250 21671 1G 381M 11G 11G 545M 48G 336166 875 384 18 4 5 55246 44503 56282 46304
e1420 92 6 53926 25571 3G 477M 47G 46G 593M 11G 352996 871 405 14 2 9 46058 39889 68974 54693
e1421 95 6 53386 22705 1G 553M 47G 45G 753M 11G 324422 789 411 44 3 18 52102 42855 62181 50321
e1518 94 4 47005 21438 3G 473M 44G 42G 818M 13G 188424 470 401 0 0 0 40947 32530 32479 28292
e1519 89 6 53738 25034 3G 405M 20G 19G 588M 37G 377484 957 394 60 3 20 52640 44009 69282 55370
e1520 90 6 53737 27136 3G 375M 15G 14G 537M 42G 352216 881 400 44 6 7 52850 44307 60924 49042
e1521 95 4 45826 19685 1G 422M 48G 44G 673M 11G 277117 685 405 0 0 0 23583 22500 55505 42823
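To compare the three captures numerically, the colmux rows can be parsed and summarized per run. The sketch below is hypothetical tooling, not part of the original capture: it renames the two "Size" columns to RSize/WSize, keeps the suffixed memory columns as strings, and excludes host e1119 on the assumption that it is not doing scan work (it shows zero disk reads in all three runs, and Presto reports 17 nodes while colmux shows 18 connected). KBRead is treated as collectl's per-interval rate in KB/s; the sample uses only a few rows copied verbatim from the Presto + RCFile table.

```python
from statistics import mean

# Column names from the colmux header; "Size" appears twice in the original
# header (disk read size, then disk write size), so they are renamed here.
COLUMNS = ["Host", "cpu", "sys", "inter", "ctxsw",
           "Free", "Buff", "Cach", "Inac", "Slab", "Map",
           "KBRead", "Reads", "RSize", "KBWrit", "Writes", "WSize",
           "KBIn", "PktIn", "KBOut", "PktOut"]
MEMORY_COLUMNS = {"Free", "Buff", "Cach", "Inac", "Slab", "Map"}  # e.g. "38G", "821M"


def parse_colmux(text):
    """Parse colmux data rows into dicts, skipping blank and '#' lines."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}: {line!r}")
        row = {}
        for name, value in zip(COLUMNS, fields):
            # Host names and suffixed memory sizes stay strings; the rest are ints.
            row[name] = value if name == "Host" or name in MEMORY_COLUMNS else int(value)
        rows.append(row)
    return rows


def summarize(rows, exclude=("e1119",)):
    """Return (mean cpu %, summed KBRead) over hosts not listed in `exclude`."""
    workers = [r for r in rows if r["Host"] not in exclude]
    return mean(r["cpu"] for r in workers), sum(r["KBRead"] for r in workers)


# A few rows copied verbatim from the Presto + RCFile capture above.
SAMPLE = """
#Host cpu sys inter ctxsw Free Buff Cach Inac Slab Map KBRead Reads Size KBWrit Writes Size KBIn PktIn KBOut PktOut
e1119 32 2 38855 55882 38G 821M 18G 8G 969M 4G 0 0 0 34 3 11 34535 32738 1577 16204
e1120 94 5 50726 20937 1G 407M 19G 18G 624M 39G 241989 586 413 12516 35 363 63261 48073 38076 34206
e1121 94 6 47783 23880 2G 500M 45G 44G 763M 12G 296873 730 407 0 0 0 44762 35722 42821 35569
e1218 92 7 55000 49455 2G 660M 45G 43G 823M 13G 378370 910 416 18 1 18 41146 37062 76548 59469
"""

cpu, kb_read = summarize(parse_colmux(SAMPLE))
print(f"mean cpu {cpu:.1f}%, total KBRead {kb_read}")  # mean cpu 93.3%, total KBRead 917232
```

Pasting a full 18-host block into SAMPLE gives the per-run figures; the same functions work unchanged for the Impala captures.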
Impala + RCFile:
select ss_sold_date_sk, count(*) from store_sales_rcfile group by 1 order by 1 limit 2000;
Returned 1823 row(s) in 291.77s (4m51s)
# Thu Nov 14 17:44:59 2013 Connected: 18 of 18
# <----CPU[HYPER]-----><-----------Memory-----------><---------------Disks----------------><----------Network---------->
#Host cpu sys inter ctxsw Free Buff Cach Inac Slab Map KBRead Reads Size KBWrit Writes Size KBIn PktIn KBOut PktOut
e1119 4 0 2648 3091 38G 821M 18G 8G 966M 4G 0 0 0 22 2 15 43 247 30 262
e1120 19 5 19458 34122 1G 407M 20G 19G 464M 39G 829104 6359 130 10 2 7 7767 6397 9275 7690
e1121 22 6 20654 35761 1G 500M 47G 46G 692M 12G 872880 6666 131 36 3 12 14285 11147 11972 9926
e1218 21 5 18185 107647 2G 661M 45G 45G 790M 12G 640520 4977 129 26 3 9 11974 9168 8590 7309
e1220 20 5 20148 37302 1G 482M 46G 45G 714M 12G 945870 7279 130 30 3 10 11690 9296 12457 10057
e1221 23 5 19791 34838 1G 452M 19G 18G 488M 40G 893614 6809 131 208 4 52 11915 9208 10080 8404
e1318 18 5 20047 34748 1G 211M 17G 16G 341M 43G 929290 7105 131 54 5 11 16171 11982 7212 6536
e1319 19 4 17194 32976 1G 369M 28G 27G 488M 31G 831246 6277 132 36 8 5 6266 5615 12742 9771
e1320 18 5 20436 35766 1G 389M 16G 16G 416M 43G 967068 7436 130 22 3 7 10265 7745 5752 5133
e1321 23 5 20826 37467 1G 512M 46G 45G 740M 13G 935874 7192 130 52698 108 488 16838 12503 8856 7908
e1418 25 7 24374 43374 2G 537M 47G 46G 724M 11G 935296 7163 131 58902 120 493 13805 10931 13938 11351
e1419 17 4 18861 32608 1G 382M 11G 11G 414M 48G 871546 6643 131 8 1 8 8309 6530 7337 6157
e1420 22 5 21949 37652 1G 478M 48G 48G 701M 10G 905420 6959 130 30 6 5 14476 11883 20066 15941
e1421 22 5 20495 36187 1G 554M 47G 46G 780M 11G 925946 7088 131 5132 12 446 11587 9163 10850 8976
e1518 16 4 18587 34320 3G 473M 44G 44G 757M 13G 843726 6435 131 10 2 7 12068 9806 15266 11844
e1519 17 4 19112 33592 1G 404M 22G 22G 453M 37G 826408 6202 133 10 2 7 9314 7841 16526 12691
e1520 18 5 20092 33409 1G 376M 17G 17G 432M 42G 832194 6385 130 8 1 8 11271 8997 10985 9248
e1521 33 5 25304 37162 1G 423M 48G 46G 519M 11G 914548 6986 131 52 8 7 12264 10467 17335 13605
Impala + Parquet:
select ss_sold_date_sk, count(*) from store_sales_parquet group by 1 order by 1 limit 2000;
Returned 1823 row(s) in 55.06s
# Fri Nov 15 13:08:04 2013 Connected: 18 of 18
# <----CPU[HYPER]-----><-----------Memory-----------><---------------Disks----------------><----------Network---------->
#Host cpu sys inter ctxsw Free Buff Cach Inac Slab Map KBRead Reads Size KBWrit Writes Size KBIn PktIn KBOut PktOut
e1119 0 0 1283 2604 38G 821M 18G 7G 962M 4G 0 0 0 0 0 0 34 209 22 234
e1120 19 2 6561 136962 1G 186M 32G 19G 591M 27G 3072 6 512 72 5 16 52 105 13 80
e1121 13 1 4745 131726 1G 243M 32G 19G 457M 27G 4160 15 287 0 0 0 24 43 11 55
e1218 15 1 5314 132725 2G 207M 31G 20G 429M 27G 4160 15 287 10 1 10 28 42 10 53
e1220 14 1 5047 141274 1G 280M 32G 19G 469M 28G 4096 14 293 12 2 8 24 42 10 48
e1221 14 1 5014 131201 1G 200M 32G 20G 445M 27G 4104 14 283 2 0 4 22 41 11 49
e1318 15 1 5332 132414 1G 187M 32G 17G 436M 27G 4032 13 310 0 0 0 28 48 16 63
e1319 20 2 6632 127630 1G 280M 31G 16G 601M 28G 3136 7 482 54 2 27 25 44 11 54
e1320 14 1 4951 131399 1G 185M 31G 21G 439M 28G 4032 14 299 22 2 15 24 43 12 55
e1321 18 2 5809 141283 1G 153M 32G 20G 563M 28G 3072 6 512 6 1 6 21 38 11 50
e1418 19 2 6257 123300 1G 53M 23G 11G 539M 36G 3082 7 474 6 1 6 28 48 15 62
e1419 17 1 6177 133326 1G 153M 32G 20G 453M 27G 4132 23 180 224 4 56 26 43 10 51
e1420 14 1 5016 139672 1G 209M 30G 19G 433M 29G 4032 14 299 0 0 0 21 41 13 53
e1421 19 2 6044 141922 1G 181M 32G 19G 569M 27G 3072 6 512 6 1 6 22 41 11 47
e1518 14 1 5063 137915 1G 239M 31G 18G 439M 29G 4224 15 282 30 6 5 20 39 11 50
e1519 15 1 5184 137180 1G 224M 32G 20G 440M 28G 4096 14 293 10 1 10 21 39 10 46
e1520 14 1 4940 132519 1G 300M 33G 19G 452M 26G 4096 14 293 0 0 0 21 40 10 48
e1521 15 1 5157 126594 1G 285M 29G 16G 439M 30G 4102 15 283 10 1 10 26 42 10 49
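The wall-clock times above translate into relative speedups as follows. This only uses the three reported times, reading Presto's "12:03" as 12 minutes 3 seconds (an assumption about its display format).

```python
# Wall-clock times reported above; Presto's "12:03" read as 12 min 3 s.
times_s = {
    "Presto + RCFile": 12 * 60 + 3,   # 723 s
    "Impala + RCFile": 291.77,
    "Impala + Parquet": 55.06,
}

baseline = times_s["Presto + RCFile"]
for name, t in times_s.items():
    # Speedup relative to the Presto + RCFile run.
    print(f"{name:18s} {t:8.2f} s  {baseline / t:5.2f}x vs Presto + RCFile")

# File-format effect within Impala alone.
parquet_vs_rcfile = times_s["Impala + RCFile"] / times_s["Impala + Parquet"]
print(f"Impala: Parquet is {parquet_vs_rcfile:.2f}x faster than RCFile")  # 5.30x
```

The first loop prints 1.00x, 2.48x and 13.13x for the three runs.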
@gregrahn

The host metrics were captured with collectl and colmux:
http://collectl.sourceforge.net/
http://collectl-utils.sourceforge.net/colmux.html
