Greg Rahn gregrahn

@gregrahn
gregrahn / values.sql
Created June 20, 2013 21:20
Example of PostgreSQL VALUES() functionality in Cloudera Impala. Examples: https://issues.cloudera.org/browse/IMPALA-68 http://www.postgresql.org/docs/9.2/static/sql-values.html
[impala1:21000] >
select *
from (values ('2013-06-01' as col1),
('2013-06-02'),
('2013-06-02'),
('2013-06-03'),
('2013-06-04'),
('2013-06-05')
) x;
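A minimal sketch of the same idea, selecting from an inline VALUES list instead of a table. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for Impala, so the inline `as col1` alias from the query above is dropped and the single column is read by position. The helper name `select_from_values` is hypothetical.

```python
import sqlite3

# Same literal dates as the Impala query above (the duplicate is kept).
DATES = [
    "2013-06-01", "2013-06-02", "2013-06-02",
    "2013-06-03", "2013-06-04", "2013-06-05",
]


def select_from_values(dates):
    """Select from an inline VALUES list, one single-column row per date.

    Assumption: SQLite stands in for Impala here. Column naming for VALUES
    differs between engines, so the result is read by position.
    """
    placeholders = ", ".join("(?)" for _ in dates)
    sql = f"SELECT * FROM (VALUES {placeholders}) AS x"
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(sql, dates)]
    finally:
        conn.close()


rows = select_from_values(DATES)
```

Binding the dates as parameters keeps the literals out of the SQL string; only the number of placeholders depends on the input.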
gregrahn / bart-2012-salaries.tsv
Last active December 19, 2015 06:59
2012 Bart Salaries. Extracted from Public Employee Salaries Database. http://www.mercurynews.com/salaries/bay-area?Entity=Bay%20Area%20Rapid%20Transit
Entity	Name	Title	Base	Overtime	Other (vacation, sick, bonus, etc)	Medical/Dental/Visual	Employer Contribution to Pension	Employee Contribution to Pension Paid By Employer	Employer Contribution to Deferred Compensation (401k)	Misc	Total Cost of Employment
Bay Area Rapid Transit	Dugger, Dorothy	General Mgr	298700	0	34500	14951	39521	23324	1869	6796	419661
Bay Area Rapid Transit	Crunican, Grace	General Mgr	312461	0	3846	19141	37513	17500	1869	7591	399921
Bay Area Rapid Transit	Tietz, Forrest	Police Sergeant	136746	111902	33921	18200	60630	156	0	1107	362662
Bay Area Rapid Transit	Pangilinan, Edgardo	Asst Controller	107785	0	214322	10903	13017	7650	1869	5734	361279
Bay Area Rapid Transit	Lucarelli, Frank	Police Lieutenant	173811	46280	33422	23364	76427	233	0	5019	358556
Bay Area Rapid Transit	Collier, Roberta	Asst Treasurer	33971	0	289534	1797	4072	2378	1869	4897	338518
Bay Area Rapid Transit	Parker, Thomas	Exec Mgr Transit System Compl	136544	0	145633	19139	16863	9923	1869	5584	335554
Bay Area Rapid Transit Ra
gregrahn / tweet2csv.py
Created August 15, 2013 05:20
translate tweet json to a CSV file
#!/usr/bin/env python
# encoding: utf-8
import sys
import urllib
import codecs
import json
import unicodecsv
import dateutil.parser as parser
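The preview above shows only the script's imports. As a rough illustration of translating tweet JSON to CSV, here is a sketch using only the Python 3 standard library rather than `unicodecsv` and `dateutil`. It assumes one JSON tweet object per input line. The chosen columns and the helper names `tweet_to_row` and `tweets_to_csv` are hypothetical and are not taken from the original script.

```python
import csv
import io
import json
from datetime import datetime

# Assumption: these columns are illustrative; the original script's
# column choice is not visible in the preview.
FIELDS = ["id", "created_at", "screen_name", "text"]


def tweet_to_row(tweet):
    """Flatten one tweet object into a dict keyed by FIELDS."""
    # Twitter's classic timestamp format, e.g. "Thu Aug 15 05:20:00 +0000 2013".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "id": tweet["id_str"],
        "created_at": created.isoformat(),
        "screen_name": tweet["user"]["screen_name"],
        "text": tweet["text"],
    }


def tweets_to_csv(lines, out):
    """Write a header row, then one CSV row per non-empty JSON line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if line:
            writer.writerow(tweet_to_row(json.loads(line)))


sample = [
    '{"id_str": "1", "created_at": "Thu Aug 15 05:20:00 +0000 2013", '
    '"user": {"screen_name": "example_user"}, "text": "hello, world"}'
]
buf = io.StringIO()
tweets_to_csv(sample, buf)
```

The `csv` module quotes fields that contain commas or newlines, which matters for free-form tweet text.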
/* Instructions on compilation and execution
* =========================================
*
* Compile this program with pthreads:
*
* g++ -Wall -pthread -o graphdb-simulator graphdb-simulator.cpp
*
* Before you run this program, you need to create the following
* directories:
*
Three comparison points:
Presto + RCFile vs Impala + RCFile vs Impala + Parquet
Metrics noted: query time, CPU utilization, disk read throughput (KBRead)
Impala v1.1.1
Presto v0.52
================================================================================================================================
Presto + RCFile:
select ss_sold_date_sk, count(*) from store_sales_rcfile group by 1 order by 1 limit 2000;
--------------
alter table call_center add constraint cc_d1 foreign key (cc_closed_date_sk) references date_dim (d_date_sk)
--------------
Query OK, 6 rows affected (0.03 sec)
Records: 6 Duplicates: 0 Warnings: 0
--------------
alter table call_center add constraint cc_d2 foreign key (cc_open_date_sk) references date_dim (d_date_sk)
--------------
create unique index person_pk on person(person_id);
create index person_n1 on person(date_id);
create unique index calendar_pk on calendar(date_id);
gregrahn / tez-crash.sql
Created July 18, 2014 22:06
tez-crash.sql
hive> set hive.execution.engine=tez;
hive> select split('I crash Tez',' ');
Query ID = hue_20140718150505_1eafc45c-6e2b-49f1-9d72-0ec737a72377
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1405217363240_0016)
Map 1: -/-
gregrahn / inmap.sh
Last active August 29, 2015 14:05
download and merge all the tiles to make a high resolution LinkedIn InMap
# 1) get your inmap from http://inmaps.linkedinlabs.com/
# 2) zoom all the way in and note the URL for the individual tiles (I used Charles Proxy to do this,
# but right/control-click, "Inspect Element", "Network" tab, then reloading the page also works.)
# 3) download your tiles
# 4) merge the tiles into column-wise strips
# 5) merge the column-wise strips into the final high resolution image
# download the individual tiles
# my inmap happened to be 15 tiles wide and 12 tiles tall
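The two merge steps (tiles into column-wise strips, then strips into the final image) can be sketched on toy data. This is an illustration only: it assumes each tile is a small grid of placeholder pixels rather than a real image file, and the helpers `make_tile`, `merge_column`, and `merge_strips` are hypothetical, not the commands the script actually runs.

```python
# Assumption: a tile is modelled as a list of pixel rows; every "pixel"
# records the (column, row) of the tile it came from, so the final layout
# can be checked. A real script would operate on image files instead.
WIDTH, HEIGHT, SIZE = 15, 12, 2  # 15 tiles wide, 12 tiles tall, 2x2 toy tiles


def make_tile(col, row, size=SIZE):
    """Create a size x size placeholder tile tagged with its grid position."""
    return [[(col, row)] * size for _ in range(size)]


def merge_column(tiles):
    """Stack the tiles of one column top to bottom into a vertical strip."""
    strip = []
    for tile in tiles:
        strip.extend(tile)
    return strip


def merge_strips(strips):
    """Join column strips left to right, one pixel row at a time."""
    return [sum(rows, []) for rows in zip(*strips)]


strips = [
    merge_column([make_tile(c, r) for r in range(HEIGHT)])
    for c in range(WIDTH)
]
image = merge_strips(strips)
```

Merging by column first means each intermediate strip has the full image height, so the final step is a single left-to-right join.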

Keybase proof

I hereby claim:

  • I am gregrahn on github.
  • I am gregrahn (https://keybase.io/gregrahn) on keybase.
  • I have a public key whose fingerprint is 9C32 D323 4E55 8113 FE4B CFEB FA4D 0C79 A267 C6C4

To claim this, I am signing this object: