Last active
April 17, 2016 22:12
-
-
Save mgarod/efb4bbbb687c377650b6a664bd30c616 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
input: | |
(c1, [p1]) | |
(c2, [p1, p2]) | |
(c3, [p1]) | |
----------------------------------------------------------------- | |
QUESTION 1: | |
Given a customer-product graph where customers and products are nodes, | |
and an edge is formed if a customer buys a product, design an MapReduce | |
algorithm to compute the frequency distribution of the number of customers | |
by the number of products they bought. The output should be sorted by the | |
number of customers in an ascending order. | |
expected output: | |
(1 product purchased, 2 customers) | |
(2 products purchased, 1 customer) | |
Mapper(customer, product_list): | |
product_count = 0 | |
for every p in product_list: | |
product_count += 1 | |
emit(product_count, 1) # k,v -> (num products bought, 1 customer puchased this num products) | |
Reducer(product_count, list of 1's represening num people who bought product_count): | |
num_people = 0 | |
for every one in list of 1's: | |
num_people += 1 | |
emit(product_count, num_people) # k,v -> (num products bought, num people who bought num products) | |
----------------------------------------------------------------- | |
QUESTION 2: | |
Given the same input data as the previous question, design a MapReduce | |
algorithm to count the number of times exactly two customers have | |
bought the same product. | |
PAIRS SOLUTION | |
PHASE 1: | |
Mapper1(customer, product_list): | |
For every p in product_list: | |
emit(p, customer) | |
Reducer1(product, customer_list): | |
For every c in customer_list: | |
For every other d in customer_list: # Assume that duplicates are not emitted, i.e. handled by "for every other" | |
emit((c,d), 1) | |
PHASE 2: | |
Reducer2(pair (c,d), occurrence_list): | |
num_occurrences = 0 | |
for i in occurrence_list: | |
num_occurrences += 1 | |
emit(pair (c,d), num_occurrences) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment