
@brikis98
Created November 2, 2011 03:57
Apache Pig Script Example
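-- Average number of profile views per viewed member, for each industry
pv_by_member = GROUP profile_view BY (viewee_industry_id, viewee_member_id);
pv_count_by_member = FOREACH pv_by_member GENERATE FLATTEN(group) AS (viewee_industry_id, viewee_member_id), COUNT_STAR(profile_view) AS pv_count;
pv_by_industry = GROUP pv_count_by_member BY viewee_industry_id;
pv_avg_by_industry = FOREACH pv_by_industry GENERATE group AS viewee_industry_id, AVG(pv_count_by_member.pv_count) AS average_pv;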
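The snippet above computes an `average_pv` for each `viewee_industry_id`. Reading that value as the mean number of views received per viewed member in an industry, the same GROUP, COUNT_STAR and AVG steps can be modelled in plain Python. This is a sketch, not Pig: it assumes `profile_view` rows arrive as in-memory `(viewee_member_id, viewer_member_id, viewee_industry_id)` tuples, and the function name `average_pv_by_industry` is hypothetical.

```python
from collections import Counter, defaultdict


def average_pv_by_industry(profile_views):
    """Mean number of views per viewed member, for each industry.

    profile_views: iterable of (viewee_member_id, viewer_member_id, viewee_industry_id)
    tuples; this in-memory layout is an assumption made for the sketch.
    """
    # Like GROUP profile_view BY (viewee_industry_id, viewee_member_id) + COUNT_STAR
    pv_count = Counter((industry, viewee) for viewee, _viewer, industry in profile_views)

    # Like GROUP ... BY viewee_industry_id + AVG over the per-member counts
    counts_by_industry = defaultdict(list)
    for (industry, _viewee), count in pv_count.items():
        counts_by_industry[industry].append(count)
    return {industry: sum(counts) / len(counts) for industry, counts in counts_by_industry.items()}


views = [
    (1, 10, 100), (1, 11, 100), (1, 12, 100),  # member 1, industry 100: 3 views
    (2, 10, 100),                              # member 2, industry 100: 1 view
    (3, 11, 200), (3, 12, 200),                # member 3, industry 200: 2 views
    (4, 10, 200),                              # member 4, industry 200: 1 view
]
print(average_pv_by_industry(views))  # {100: 2.0, 200: 1.5}
```

Counting per member first matters: AVG needs a bag of single numeric values, so the per-member counts are what gets averaged, not the raw view rows.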
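-- Count the views each member received, keeping the member's industry for the final join
pv_group_by_viewee = GROUP profile_view BY (viewee_member_id, viewee_industry_id);
pv_with_count = FOREACH pv_group_by_viewee
    GENERATE FLATTEN(group) AS (viewee_member_id, viewee_industry_id), COUNT_STAR(profile_view) AS pv_count;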
connections_pv_source = JOIN pv_with_count BY viewee_member_id, member_connections BY source_member_id;
connections_pv_dest = JOIN pv_with_count BY viewee_member_id,
connections_pv_source BY dest_member_id;
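-- Keep connections whose destination member (pv_with_count::) has more views than the source member (connections_pv_source::pv_with_count::)
connections_with_higher_pv = FILTER connections_pv_dest BY pv_with_count::pv_count > connections_pv_source::pv_with_count::pv_count;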
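The statements from `pv_group_by_viewee` through the `FILTER` count each member's views, attach that count to both ends of every connection with two joins, and keep the pairs in which the destination member has more views than the source member. A minimal plain-Python model of that dataflow follows; the function name `find_connections_with_higher_pv` and the tuple layouts are hypothetical.

```python
from collections import Counter


def find_connections_with_higher_pv(profile_views, member_connections):
    """Return (source, dest, source_pv, dest_pv) for connections where dest has more views.

    profile_views: iterable of (viewee_member_id, viewer_member_id) pairs.
    member_connections: iterable of (source_member_id, dest_member_id) pairs.
    Both in-memory layouts are assumptions made for this sketch.
    """
    # Like GROUP profile_view BY viewee_member_id + COUNT_STAR
    pv_count = Counter(viewee for viewee, _viewer in profile_views)

    result = []
    for source, dest in member_connections:
        # Inner joins on both ends: a member with no views has no count row
        if source not in pv_count or dest not in pv_count:
            continue
        # FILTER: keep pairs where the destination has strictly more views
        if pv_count[dest] > pv_count[source]:
            result.append((source, dest, pv_count[source], pv_count[dest]))
    return result


views = [(1, 9), (2, 9), (2, 8), (3, 9), (3, 8), (3, 7)]  # member 1: 1 view, 2: 2, 3: 3
connections = [(1, 2), (2, 1), (1, 3), (3, 2), (1, 4)]   # member 4 has no views
print(find_connections_with_higher_pv(views, connections))
# [(1, 2, 1, 2), (1, 3, 1, 3)]
```

Because both joins are inner joins, a member with no views at all has no count row and drops out, which is why the `(1, 4)` connection disappears.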
-- Keep URL categories that contain more than 100,000 URLs
urls = LOAD 'dataset' AS (url, category, pagerank);
groups = GROUP urls BY category;
bigGroup = FILTER groups BY COUNT(urls) > 100000;
STORE bigGroup INTO 'bigGroupOutput';
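The URL example groups rows by `category` and keeps only categories whose bag holds more than 100,000 rows. A small Python model of that GROUP-then-FILTER step, with the threshold passed in as a parameter so the toy data stays small (the name `big_groups` and the tuple layout are hypothetical):

```python
from collections import defaultdict


def big_groups(urls, min_count):
    """Group (url, category, pagerank) rows by category; keep groups with more than min_count rows."""
    groups = defaultdict(list)
    # Like GROUP urls BY category: each category maps to its bag of rows
    for row in urls:
        _url, category, _pagerank = row
        groups[category].append(row)
    # Like FILTER groups BY COUNT(urls) > min_count
    return {category: rows for category, rows in groups.items() if len(rows) > min_count}


urls = [
    ("a.com", "news", 0.9),
    ("b.com", "news", 0.4),
    ("c.com", "sports", 0.7),
]
print(big_groups(urls, min_count=1))
# {'news': [('a.com', 'news', 0.9), ('b.com', 'news', 0.4)]}
```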
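-- Attach the source member's industry average to each remaining connection
few_pv_email_data = JOIN connections_with_higher_pv BY connections_pv_source::pv_with_count::viewee_industry_id, pv_avg_by_industry BY viewee_industry_id;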
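The final statement pairs each surviving connection with the average view count of its industry. The sketch below models that inner join with a dictionary lookup, assuming each connection row already carries the industry id to join on (here taken to be the source member's); the function name `join_with_industry_average` and the row layout are hypothetical.

```python
def join_with_industry_average(connection_rows, pv_avg_by_industry):
    """Inner-join connection rows to per-industry averages on the industry id.

    connection_rows: iterable of (source_member_id, dest_member_id, industry_id) tuples.
    pv_avg_by_industry: dict mapping industry_id -> average_pv.
    """
    joined = []
    for source, dest, industry in connection_rows:
        # Inner join: rows whose industry has no average are dropped
        if industry in pv_avg_by_industry:
            joined.append((source, dest, industry, pv_avg_by_industry[industry]))
    return joined


rows = [(1, 2, 100), (1, 3, 100), (5, 6, 300)]  # industry 300 has no average
averages = {100: 2.0, 200: 1.5}
print(join_with_industry_average(rows, averages))
# [(1, 2, 100, 2.0), (1, 3, 100, 2.0)]
```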
# Member_Connections Table
+---------------------------------+------------------+
| Field                           | Type             |
+---------------------------------+------------------+
| source_member_id                | int              |
| dest_member_id                  | int              |
+---------------------------------+------------------+
# Profile_View Table
+---------------------------------+------------------+
| Field                           | Type             |
+---------------------------------+------------------+
| viewee_member_id                | int              |
| viewer_member_id                | int              |
| viewee_industry_id              | int              |
| tracking_time                   | timestamp        |
+---------------------------------+------------------+
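The two tables above describe the inputs the script refers to as `member_connections` and `profile_view`. As a hedged illustration, the same schemas can be written as typed Python records. The parser assumes a tab-separated row with an ISO-8601 `tracking_time`, which is an assumption about the storage format (tab is the default field delimiter of Pig's `PigStorage` loader); the names `MemberConnection`, `ProfileView` and `parse_profile_view` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MemberConnection:
    source_member_id: int
    dest_member_id: int


@dataclass(frozen=True)
class ProfileView:
    viewee_member_id: int
    viewer_member_id: int
    viewee_industry_id: int
    tracking_time: datetime


def parse_profile_view(line: str) -> ProfileView:
    """Parse one tab-separated Profile_View row (assumed layout: the four columns in table order)."""
    viewee, viewer, industry, tracking_time = line.rstrip("\n").split("\t")
    return ProfileView(int(viewee), int(viewer), int(industry), datetime.fromisoformat(tracking_time))


row = parse_profile_view("1\t10\t100\t2011-11-02T03:57:00\n")
print(row.viewee_member_id, row.tracking_time.year)  # 1 2011
```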

brikis98 commented Nov 2, 2011

dajobe commented Nov 3, 2011

INOT and FITLER presumably are typos?


brikis98 commented Nov 3, 2011

@dajobe: Those were indeed typos. Fixed now. Thanks!
