Is a consistent order maintained in lines 9/10?
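```pig
-- elem1 and elem2 are produced earlier in the script (not shown in this snippet)
pairs = FOREACH pairs GENERATE elem1.follower AS follower,
                               elem1.repo     AS repo1,
                               elem2.repo     AS repo2,
                               elem1.rating   AS rating1,
                               elem2.rating   AS rating2;
-- collect all rating pairs for each (repo1, repo2) combination
by_repos = GROUP pairs BY (repo1, repo2);
-- keep repo pairs with more than 2 co-ratings (COUNT_STAR counts every tuple in the group)
gt_5 = FILTER by_repos BY COUNT_STAR(pairs) > 2;
-- one row per repo pair, scored by udfs.cosine over the two rating columns
pearson = FOREACH gt_5 GENERATE FLATTEN(group) AS (repo1, repo2),
                                udfs.cosine(pairs.rating1, pairs.rating2) AS similarity;
```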
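Pig treats bags as unordered collections, so instead of relying on `pairs.rating1` and `pairs.rating2` coming out in matching order, one option is to pass the UDF a single bag of paired tuples, e.g. `udfs.cosine(pairs.(rating1, rating2))`, so each follower's two ratings stay together. The source of `udfs.cosine` is not shown here; the following is a hypothetical Python sketch written for that paired-bag signature. In Pig it would be registered as a Jython UDF and decorated with `@outputSchema`; the decorator is left out so the sketch runs as plain Python.

```python
import math


# Hypothetical UDF body: expects ONE bag of (rating1, rating2) tuples rather
# than two separately projected bags. In Pig it would be called as
#   udfs.cosine(pairs.(rating1, rating2))
# and decorated with @outputSchema("similarity:double") (omitted here).
def cosine(pairs):
    """Cosine similarity of the two rating columns in a bag of pairs.

    Each tuple keeps both ratings from the same row together, so the two
    vectors stay aligned regardless of the order in which the bag is read.
    Returns None when either vector is all zeros (similarity undefined).
    """
    dot = 0.0
    norm1 = 0.0
    norm2 = 0.0
    for r1, r2 in pairs:
        dot += r1 * r2
        norm1 += r1 * r1
        norm2 += r2 * r2
    if norm1 == 0 or norm2 == 0:
        return None
    return dot / (math.sqrt(norm1) * math.sqrt(norm2))


if __name__ == "__main__":
    bag = [(5, 4), (3, 3), (4, 5)]
    # Same pairs in a different order: the result is unchanged.
    shuffled = [(4, 5), (5, 4), (3, 3)]
    print(cosine(bag), cosine(shuffled))
```

Reordering whole tuples leaves the result unchanged; only shuffling one rating column independently of the other would alter it, which is the misalignment the question is worried about.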