We're looking for pairs of customers who like exactly the same pizza, i.e. their respective sets of liked (and disliked) pizzas are equal.
To get this, we can find customers who differ in their preferences and take the complement of that. The general structure of the answer is thus:
all_pairs_of_customers = ...
customers_who_differ = ...
query = all_pairs_of_customers - customers_who_differ
We can get customers who differ from likesdislikes
, but it's incomplete: assuming we know that <ca, cb>
is a pair of customers who differ by virtue of being in likesdislikes
, <cb, ca>
must also be a pair of customers who differ, however they may not necessarily be in likesdislikes
. Thus we have to ensure that we have all the customers who differ by adding the converse:
customers_who_differ = likesdislikes ∪ π(cb,ca)(likesdislikes)
Now to find all pairs of customers. We can get them from likes
(for part d, as we have to exclude customers who don't like any pizza, i.e. are not in likes
; for part e we would do exactly the same thing but with customers
instead).
Say likes
is {<1,2>,<3,4>}
. When we try to get all pairs, we see some undesirable rows:
π(ca,cb)(ρ(ca,pa)(likes) × ρ(cb,pb)(likes)) = {<1,1>,<1,3>,<3,3>,<3,1>}
We only want the pair <1,3>
: <1,1>
and <3,3>
trivially like the same pizzas, and <3,1>
is redundant beacuse we already extended likesdislikes
with its converse above. Thus we select with the condition ca < cb (as the question says; it's a stronger condition than != as it also removes <3,1>
) to get the final set of all pairs:
all_pairs_of_customers = π(ca,cb)(σ(ca<cb)(ρ(ca,pa)(likes) × ρ(cb,pb)(likes)))
Let's also consider alternative answers we saw:
pairsnomatch = "P([cname, cdname], S(cname < cdname /\\ dpizza!=pizza, likes * R(r(cdname, dpizza), likes)))"
alllikepairs = "P([cname, cdname], S(cname < cdname, likes * R(r(cdname, dpizza), likes)))"
query = f"R(s(ca, cb), {alllikepairs} - {pairsnomatch})"
This answer has the right idea, but it doesn't use likesdislikes
and instead tries to get the pairs of customers with differing preferences from likes
directly.
Unfortunately this is not sufficient; consider the case where likes
is {<1,2>,<3,2>,<1,4>,<3,4>}
, i.e. there are two customers who like more than one pizza in common. pairsnomatch
will be {<1,3>}
, which is not right as customers 1 and 3 do like exactly the same pizza.
otherlikes = "R(otherlikes(othername, otherpizza), likes)"
hasMatches = f"S(cname<othername /\\ pizza==otherpizza, likes*{otherlikes})"
diffMatches = f"S(cname<othername /\\ pizza!=otherpizza, likes*{otherlikes})"
diffMatches2 = f"R(secondmatch(cnametwo, pizzatwo, othernametwo, otherpizzatwo), {diffMatches})"
safe = f"S(cname==cnametwo /\\ othername==othernametwo /\\ pizza==otherpizzatwo /\\ pizzatwo==otherpizza, {diffMatches}*{diffMatches2})"
toFilterOut = f"P([cname, pizza, othername, otherpizza], {safe})"
query = f"P([cname, othername], {hasMatches}) - P([cname, othername], {diffMatches} - {toFilterOut}))"
This answer fixes the problem by detecting the false negatives in safe
and removing them from the set of customers who differ.
Intuitively, if we have a pair of customers in diffMatches
who like different pizza, but who also like pizza that the other likes, then their preferences do not differ, and we can consider that pair a false negative. A example of this with the input above is diffMatches
being {<1,2,3,4>,<1,4,3,2>}
, causing <1,3>
to be in safe and not be considered as differing.
safe
is only empty with the original dataset that was given, but if you have two people liking at least 2 or more of the same pizzassafe
clears away all the false negatives fromdiffMatches
. I was trying to exploit the symmetry of thelikeslikes
table, where if person A likes pizza 1 and person B likes pizza 2, then person A must also like pizza 2 and person B must also like pizza 1. If such a 'symmetrical' result can be found, then it is not a concern in thediffMatches
table.