Skip to content

Instantly share code, notes, and snippets.

@mgarod
Last active April 17, 2016 22:12
Show Gist options
  • Save mgarod/efb4bbbb687c377650b6a664bd30c616 to your computer and use it in GitHub Desktop.
Save mgarod/efb4bbbb687c377650b6a664bd30c616 to your computer and use it in GitHub Desktop.
input:
(c1, [p1])
(c2, [p1, p2])
(c3, [p1])
-----------------------------------------------------------------
QUESTION 1:
Given a customer-product graph where customers and products are nodes,
and an edge is formed if a customer buys a product, design an MapReduce
algorithm to compute the frequency distribution of the number of customers
by the number of products they bought. The output should be sorted by the
number of customers in an ascending order.
expected output:
(1 product purchased, 2 customers)
(2 products purchased, 1 customer)
Mapper(customer, product_list):
product_count = 0
for every p in product_list:
product_count += 1
emit(product_count, 1) # k,v -> (num products bought, 1 customer puchased this num products)
Reducer(product_count, list of 1's represening num people who bought product_count):
num_people = 0
for every one in list of 1's:
num_people += 1
emit(product_count, num_people) # k,v -> (num products bought, num people who bought num products)
-----------------------------------------------------------------
QUESTION 2:
Given the same input data as the previous question, design a MapReduce
algorithm to count the number of times exactly two customers have
bought the same product.
PAIRS SOLUTION
PHASE 1:
Mapper1(customer, product_list):
For every p in product_list:
emit(p, customer)
Reducer1(product, customer_list):
For every c in customer_list:
For every other d in customer_list: # Assume that duplicates are not emitted, i.e. handled by "for every other"
emit((c,d), 1)
PHASE 2:
Reducer2(pair (c,d), occurrence_list):
num_occurrences = 0
for i in occurrence_list:
num_occurrences += 1
emit(pair (c,d), num_occurrences)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment