@codekitchen
Last active December 12, 2015 12:49
Example of querying Cassandra page views with Pig
export PIG_HOME=/usr/local/Cellar/pig/0.10.0
export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
bin/pig_cassandra -x local
rows = LOAD 'cassandra://pageviews/page_views' USING CassandraStorage();
rows = filter rows by (account_id.value is not null);
-- January 2013 (UTC); bounds are Unix epoch seconds * 1000 (milliseconds), covering [2013-01-01, 2013-02-01)
for_month = filter rows by (created_at.value >= 1356998400000L and created_at.value < 1359676800000L);
grouped_by_root_account = group for_month by account_id.value;
distinct_user_count_by_account_id = foreach grouped_by_root_account {
    user_ids = for_month.user_id;
    di = distinct user_ids;
    generate group as account_id, COUNT(di) as users_count;
}
dump distinct_user_count_by_account_id;
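
For checking the script's logic without a Cassandra cluster, the following is a minimal Python sketch of the same computation: derive the month's millisecond bounds, then count distinct users per account. It is an illustration only, not part of the gist. The helper names `month_bounds_ms` and `distinct_users_by_account` and the sample rows are hypothetical, and rows are assumed to be plain dicts with `account_id`, `user_id`, and `created_at` (epoch milliseconds) keys rather than the (name, value) column tuples that CassandraStorage produces.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_bounds_ms(year, month):
    """Return the [start, end) bounds of a calendar month as UTC epoch milliseconds."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # month // 12 is 1 only for December, which rolls over to January of the next year.
    end = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp()) * 1000, int(end.timestamp()) * 1000


def distinct_users_by_account(rows, start_ms, end_ms):
    """Count distinct non-null user_ids per account_id for rows in [start_ms, end_ms)."""
    users = defaultdict(set)
    for row in rows:
        account_id = row.get("account_id")
        created_at = row.get("created_at")
        # Mirrors the two Pig filters: drop null accounts and out-of-range timestamps.
        if account_id is None or created_at is None:
            continue
        if not (start_ms <= created_at < end_ms):
            continue
        bucket = users[account_id]  # creates the group even if user_id is null
        if row.get("user_id") is not None:
            bucket.add(row["user_id"])
    return {account: len(ids) for account, ids in users.items()}


# Hypothetical sample rows, for illustration only.
SAMPLE_ROWS = [
    {"account_id": 1, "user_id": 10, "created_at": 1356998400000},  # first ms of January
    {"account_id": 1, "user_id": 10, "created_at": 1357000000000},  # same user again
    {"account_id": 1, "user_id": 11, "created_at": 1359676799999},  # last ms of January
    {"account_id": 2, "user_id": 12, "created_at": 1359676800000},  # February, excluded
    {"account_id": None, "user_id": 13, "created_at": 1357000000000},  # null account
]

if __name__ == "__main__":
    start_ms, end_ms = month_bounds_ms(2013, 1)
    print(start_ms, end_ms)
    print(distinct_users_by_account(SAMPLE_ROWS, start_ms, end_ms))
```

Calling `month_bounds_ms(2013, 1)` reproduces the two literals used in the `for_month` filter, so the same helper can generate bounds for other months.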