tmcw/sets.md

## sets.md

      
    Raw
  

              sets.md
            
          
    Here's a sort of unsolved logic/programming problem.
I made this site, Old Fashioned, for cocktail recipes. It has structured data for ingredients and recipes for cocktails. And one of its neat features is that, given certain ingredients, it shows you what's possible or what's nearly possible.
I've been thinking about the opposite angle: from which set of 5 ingredients can you make the highest number of different cocktails? How does one correctly solve this problem.
If formal descriptions are your jam: let's say:

You have 100 different ingredients
You have 20 cocktails, each of which use 2-6 ingredients
Which 5 ingredients maximize the cocktail-making possibilities? What about 10 ingredients?

I think one approach is iterative:

Assume you have all ingredients, and can make all cocktails.
For each ingredient, count how many cocktails it is included in.
Sort the list of ingredients by # of cocktails included in, remove all ingredients with the lowest number of added 'tails'.
Recalculate the list of usage patterns, and repeat.

However, this misses what I think might be an important corner: families. Like, there may be a bunch of cocktails that require whiskey, sugar, water, and… and a bunch that include gin, tonic, and…, and this greedy approach will only find one 'family': but ideally I'd be able to split off and efficiently find if there are multiple winners or near-winners.
I'm not obsessed with performance in this case, though a naïve implementation does seem pretty catastrophic, as soon as I start thinking about, for example, finding "all combinations of 5 ingredients", the factorials start looking scary. So the fully naive "compute it all" solution probably isn't the right one.
🤷‍♂️ I'm not sure how to solve this problem! Going to have fun with it, and see if there's a solution, though. Let me know if you have any thoughts!