dariusf/cs2102-t05-1-4de.md

## cs2102-t05-1-4de.md

      
    Raw
  

              cs2102-t05-1-4de.md
            
          
    We're looking for pairs of customers who like exactly the same pizza, i.e. their respective sets of liked (and disliked) pizzas are equal.
To get this, we can find customers who differ in their preferences and take the complement of that. The general structure of the answer is thus:
all_pairs_of_customers = ...
customers_who_differ = ...
query = all_pairs_of_customers - customers_who_differ

We can get customers who differ from likesdislikes, but it's incomplete: assuming we know that <ca, cb> is a pair of customers who differ by virtue of being in likesdislikes, <cb, ca> must also be a pair of customers who differ, however they may not necessarily be in likesdislikes. Thus we have to ensure that we have all the customers who differ by adding the converse:
customers_who_differ = likesdislikes ∪ π(cb,ca)(likesdislikes)

Now to find all pairs of customers. We can get them from likes (for part d, as we have to exclude customers who don't like any pizza, i.e. are not in likes; for part e we would do exactly the same thing but with customers instead).
Say likes is {<1,2>,<3,4>}. When we try to get all pairs, we see some undesirable rows:
π(ca,cb)(ρ(ca,pa)(likes) × ρ(cb,pb)(likes)) = {<1,1>,<1,3>,<3,3>,<3,1>}

We only want the pair <1,3>: <1,1> and <3,3> trivially like the same pizzas, and <3,1> is redundant beacuse we already extended likesdislikes with its converse above. Thus we select with the condition ca < cb (as the question says; it's a stronger condition than != as it also removes <3,1>) to get the final set of all pairs:
all_pairs_of_customers = π(ca,cb)(σ(ca<cb)(ρ(ca,pa)(likes) × ρ(cb,pb)(likes)))


Let's also consider alternative answers we saw:
 pairsnomatch = "P([cname, cdname], S(cname < cdname /\\ dpizza!=pizza, likes * R(r(cdname, dpizza), likes)))"
 alllikepairs = "P([cname, cdname], S(cname < cdname, likes * R(r(cdname, dpizza), likes)))"
 query = f"R(s(ca, cb), {alllikepairs} - {pairsnomatch})"
This answer has the right idea, but it doesn't use likesdislikes and instead tries to get the pairs of customers with differing preferences from likes directly.
Unfortunately this is not sufficient; consider the case where likes is {<1,2>,<3,2>,<1,4>,<3,4>}, i.e. there are two customers who like more than one pizza in common. pairsnomatch will be {<1,3>}, which is not right as customers 1 and 3 do like exactly the same pizza.
otherlikes = "R(otherlikes(othername, otherpizza), likes)"
hasMatches = f"S(cname<othername /\\ pizza==otherpizza, likes*{otherlikes})"
diffMatches = f"S(cname<othername /\\ pizza!=otherpizza, likes*{otherlikes})"
diffMatches2 = f"R(secondmatch(cnametwo, pizzatwo, othernametwo, otherpizzatwo), {diffMatches})"
safe = f"S(cname==cnametwo /\\ othername==othernametwo /\\ pizza==otherpizzatwo /\\ pizzatwo==otherpizza, {diffMatches}*{diffMatches2})"
toFilterOut = f"P([cname, pizza, othername, otherpizza], {safe})"
query = f"P([cname, othername], {hasMatches}) - P([cname, othername], {diffMatches} - {toFilterOut}))"
This answer fixes the problem by detecting the false negatives in safe and removing them from the set of customers who differ.
Intuitively, if we have a pair of customers in diffMatches who like different pizza, but who also like pizza that the other likes, then their preferences do not differ, and we can consider that pair a false negative. A example of this with the input above is diffMatches being {<1,2,3,4>,<1,4,3,2>}, causing <1,3> to be in safe and not be considered as differing.