Skip to content

Instantly share code, notes, and snippets.

@dariusf
Last active August 26, 2020 13:46
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dariusf/36ad9701b5714bebbf8775731a085534 to your computer and use it in GitHub Desktop.
Save dariusf/36ad9701b5714bebbf8775731a085534 to your computer and use it in GitHub Desktop.

We're looking for pairs of customers who like exactly the same pizza, i.e. their respective sets of liked (and disliked) pizzas are equal.

To get this, we can find customers who differ in their preferences and take the complement of that. The general structure of the answer is thus:

all_pairs_of_customers = ...
customers_who_differ = ...
query = all_pairs_of_customers - customers_who_differ

We can get customers who differ from likesdislikes, but it's incomplete: assuming we know that <ca, cb> is a pair of customers who differ by virtue of being in likesdislikes, <cb, ca> must also be a pair of customers who differ, however they may not necessarily be in likesdislikes. Thus we have to ensure that we have all the customers who differ by adding the converse:

customers_who_differ = likesdislikes ∪ π(cb,ca)(likesdislikes)

Now to find all pairs of customers. We can get them from likes (for part d, as we have to exclude customers who don't like any pizza, i.e. are not in likes; for part e we would do exactly the same thing but with customers instead).

Say likes is {<1,2>,<3,4>}. When we try to get all pairs, we see some undesirable rows:

π(ca,cb)(ρ(ca,pa)(likes) × ρ(cb,pb)(likes)) = {<1,1>,<1,3>,<3,3>,<3,1>}

We only want the pair <1,3>: <1,1> and <3,3> trivially like the same pizzas, and <3,1> is redundant beacuse we already extended likesdislikes with its converse above. Thus we select with the condition ca < cb (as the question says; it's a stronger condition than != as it also removes <3,1>) to get the final set of all pairs:

all_pairs_of_customers = π(ca,cb)(σ(ca<cb)(ρ(ca,pa)(likes) × ρ(cb,pb)(likes)))

Let's also consider alternative answers we saw:

pairsnomatch = "P([cname, cdname], S(cname < cdname /\\ dpizza!=pizza, likes * R(r(cdname, dpizza), likes)))"alllikepairs = "P([cname, cdname], S(cname < cdname, likes * R(r(cdname, dpizza), likes)))"query = f"R(s(ca, cb), {alllikepairs} - {pairsnomatch})"

This answer has the right idea, but it doesn't use likesdislikes and instead tries to get the pairs of customers with differing preferences from likes directly.

Unfortunately this is not sufficient; consider the case where likes is {<1,2>,<3,2>,<1,4>,<3,4>}, i.e. there are two customers who like more than one pizza in common. pairsnomatch will be {<1,3>}, which is not right as customers 1 and 3 do like exactly the same pizza.

otherlikes = "R(otherlikes(othername, otherpizza), likes)"
hasMatches = f"S(cname<othername /\\ pizza==otherpizza, likes*{otherlikes})"
diffMatches = f"S(cname<othername /\\ pizza!=otherpizza, likes*{otherlikes})"
diffMatches2 = f"R(secondmatch(cnametwo, pizzatwo, othernametwo, otherpizzatwo), {diffMatches})"
safe = f"S(cname==cnametwo /\\ othername==othernametwo /\\ pizza==otherpizzatwo /\\ pizzatwo==otherpizza, {diffMatches}*{diffMatches2})"
toFilterOut = f"P([cname, pizza, othername, otherpizza], {safe})"
query = f"P([cname, othername], {hasMatches}) - P([cname, othername], {diffMatches} - {toFilterOut}))"

This answer fixes the problem by detecting the false negatives in safe and removing them from the set of customers who differ.

Intuitively, if we have a pair of customers in diffMatches who like different pizza, but who also like pizza that the other likes, then their preferences do not differ, and we can consider that pair a false negative. A example of this with the input above is diffMatches being {<1,2,3,4>,<1,4,3,2>}, causing <1,3> to be in safe and not be considered as differing.

@ariadn3
Copy link

ariadn3 commented Aug 26, 2020

safe is only empty with the original dataset that was given, but if you have two people liking at least 2 or more of the same pizzas safe clears away all the false negatives from diffMatches. I was trying to exploit the symmetry of the likeslikes table, where if person A likes pizza 1 and person B likes pizza 2, then person A must also like pizza 2 and person B must also like pizza 1. If such a 'symmetrical' result can be found, then it is not a concern in the diffMatches table.

@dariusf
Copy link
Author

dariusf commented Aug 26, 2020

Thanks for clarifying! That also means the answer before yours is incomplete due to the false negatives. I've written a new explanation for your answer which I hope represents it more accurately.

@gordengorden
Copy link

Hi darius, I seem to still be unable to derive the answer using the solution as you have explained above
from basic_relarg import * all_pairs_of_customers = "P([ca,cb],S(ca<cb,(R(d(ca,pa),likes) * R(l(cb,pb),likes))))" likesdislikes = "P([ca,cb],S(ca!=cb/\pizza==pi,R(l(ca,pizza),likes) * R(d(cb,pi),(P([cname],customers) * pizzas) - likes)))" customers_who_differ = f"{likesdislikes} \/ P([cb,ca],{likesdislikes})" query = f"{all_pairs_of_customers} - {customers_who_differ}" print_table(complete(query,db))

using the above I only managed to get
+-----+-----+ | ca | cb | +-----+-----+ | 102 | 103 | | 102 | 104 | | 103 | 104 | +-----+-----+

which is the same as using query = "S(pi==pizza /\ ca < cb,R(l(ca,pi),likes) * R(r(cb,pizza),likes))"
which only checks for people who enjoy the same pizza but not exactly

@dariusf
Copy link
Author

dariusf commented Aug 26, 2020

The idea is right, but there are two issues with the code:

  1. Union is written as |, not \/ (which is disjunction). Having disjunction there really should cause an error. Maybe you could report that as a bug in the forum and/or contribute a fix? 🙂
  2. When substituting queries, the left and right sides should be parenthesized:
query = f"({all_pairs_of_customers}) - ({customers_who_differ})"

Unfortunately, queries are currently represented as Python strings. This means we have to be careful to parenthesize manually when substituting or the meaning of compound expressions could change. In this case the final query gets parsed as (a - b) | c instead of a - (b | c), which isn't what you mean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment