@rjurney
Last active December 18, 2015 04:09
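Is a consistent order maintained in lines 9/10? That is, do the bag projections `pairs.rating1` and `pairs.rating2` stay aligned tuple by tuple when both are passed to `udfs.cosine`?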
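```pig
-- Assumes an earlier step (not shown) built pairs from elem1/elem2 tuples
-- carrying follower, repo and rating fields, and REGISTERed the udfs namespace.
pairs = FOREACH pairs GENERATE elem1.follower AS follower,
                               elem1.repo     AS repo1,
                               elem2.repo     AS repo2,
                               elem1.rating   AS rating1,
                               elem2.rating   AS rating2;

-- Collect all ratings for each (repo1, repo2) pair.
by_repos = GROUP pairs BY (repo1, repo2);
-- Keep only repo pairs with more than two rating tuples.
gt_2 = FILTER by_repos BY COUNT_STAR(pairs) > 2;
-- Score each repo pair with the cosine UDF over the two projected rating bags.
pearson = FOREACH gt_2 GENERATE FLATTEN(group) AS (repo1, repo2),
                                udfs.cosine(pairs.rating1, pairs.rating2) AS similarity;
```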
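Because `udfs.cosine(pairs.rating1, pairs.rating2)` receives the two ratings as separate bags, the result is only correct if both projections come out in the same tuple order. One way to sidestep the question is to pass both ratings inside a single bag, for example `udfs.cosine_pairs(pairs.(rating1, rating2))`, so each tuple keeps one follower's two ratings together. The sketch below is a hypothetical Python (Jython) UDF under that approach. It assumes that `udfs` is a Python UDF module and that Pig hands the bag to the function as a sequence of tuples. The name `cosine_pairs` and the `similarity:double` schema are illustrative assumptions, not part of the original script.

```python
import math

# Pig's Jython runtime normally supplies `outputSchema`; this no-op stand-in
# is an assumption that lets the sketch also run under plain CPython.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate


@outputSchema("similarity:double")
def cosine_pairs(pairs):
    """Cosine similarity over a bag of (rating1, rating2) tuples.

    Hypothetical UDF: each tuple carries both ratings from one follower,
    so the two rating vectors stay aligned regardless of bag order.
    """
    dot = 0.0
    norm1 = 0.0
    norm2 = 0.0
    for rating1, rating2 in pairs:
        if rating1 is None or rating2 is None:
            continue  # skip tuples with a null rating
        dot += rating1 * rating2
        norm1 += rating1 * rating1
        norm2 += rating2 * rating2
    if norm1 == 0.0 or norm2 == 0.0:
        return None  # cosine is undefined for an all-zero vector
    return dot / (math.sqrt(norm1) * math.sqrt(norm2))
```

Since the dot product and both norms are sums over tuples, shuffling the bag does not change the score. That property is the one that matters when bag order is not something to rely on.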