Skip to content

Instantly share code, notes, and snippets.

@tmcw
Created June 27, 2020 02:40
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tmcw/c6bdcfe505057ed6a0f356cfd02d4d52 to your computer and use it in GitHub Desktop.
Save tmcw/c6bdcfe505057ed6a0f356cfd02d4d52 to your computer and use it in GitHub Desktop.

Here's a sort of unsolved logic/programming problem.

I made this site, Old Fashioned, for cocktail recipes. It has structured data for ingredients and recipes for cocktails. And one of its neat features is that, given certain ingredients, it shows you what's possible or what's nearly possible.

I've been thinking about the opposite angle: from which set of 5 ingredients can you make the highest number of different cocktails? How does one correctly solve this problem.

If formal descriptions are your jam: let's say:

  • You have 100 different ingredients
  • You have 20 cocktails, each of which use 2-6 ingredients
  • Which 5 ingredients maximize the cocktail-making possibilities? What about 10 ingredients?

I think one approach is iterative:

  1. Assume you have all ingredients, and can make all cocktails.
  2. For each ingredient, count how many cocktails it is included in.
  3. Sort the list of ingredients by # of cocktails included in, remove all ingredients with the lowest number of added 'tails'.
  4. Recalculate the list of usage patterns, and repeat.

However, this misses what I think might be an important corner: families. Like, there may be a bunch of cocktails that require whiskey, sugar, water, and… and a bunch that include gin, tonic, and…, and this greedy approach will only find one 'family': but ideally I'd be able to split off and efficiently find if there are multiple winners or near-winners.

I'm not obsessed with performance in this case, though a naïve implementation does seem pretty catastrophic, as soon as I start thinking about, for example, finding "all combinations of 5 ingredients", the factorials start looking scary. So the fully naive "compute it all" solution probably isn't the right one.

🤷‍♂️ I'm not sure how to solve this problem! Going to have fun with it, and see if there's a solution, though. Let me know if you have any thoughts!

@bcardiff
Copy link

I needed to change a bit the csv, the expanded program is in https://gist.github.com/bcardiff/462ddcae5b8e886e6d69975d9830d77e . Now I need to tweak the timeout... maybe tomorrow! This was fun.

@fgregg
Copy link

fgregg commented Jun 28, 2020

Here's a branch and bound solution. pretty fast!

import random
from typing import AbstractSet, Set, FrozenSet, Optional

Cocktail = FrozenSet[str]
Cocktails = AbstractSet[Cocktail]
Ingredients = Set[str]

class BranchBound(object):
    def __init__(self, max_calls: int, max_size: int) -> None:
        self.calls: int = max_calls
        self.max_size: int = max_size
        
        self.highest_score: int = 0
        self.highest: Cocktails = set()

    def search(self,
               candidates: Cocktails,
               partial: Optional[Cocktails] = None) -> Cocktails:
        if partial is None:
            partial = set()

        if self.calls <= 0:
            print('early stop')
            return self.highest

        self.calls -= 1

        score = len(partial)
        
        if score > self.highest_score:
            self.highest = partial
            self.highest_score = score
            print(self.highest_score)

        # what cocktails could be added without going over our
        # ingredient budget?
        partial_ingredients: Ingredients
        partial_ingredients = set().union(*partial)
        candidates = {cocktail for cocktail in candidates
                      if len(cocktail | partial_ingredients) <= self.max_size}

        possible_increment = len(candidates)

        # if adding all the associated ingredients of the candidates
        # takes us over the ingredient budget, then not all the
        # candidates can feasibly be added to our partial
        # solution. So, if there will be excess ingredients, we'll
        # reduce the upper bound of how many cocktails we might be
        # able to cover (possible_increment).
        candidate_ingredients: Ingredients
        candidate_ingredients = set().union(*candidates)
        excess_ingredients = (len(candidate_ingredients | partial_ingredients)
                              - self.max_size)

        if excess_ingredients > 0:
            # best case is that excess ingredients are concentrated in
            # some cocktails. if we are in this best case, then if we
            # remove the cocktails that add the most new ingredients
            # we'll be back under the ingredient budget
            #
            # note that we are just updating the bound. it could actually
            # be that we want to add one of these cocktails that add
            # a lot of ingredients
            ingredient_increases = sorted((len(cocktail - partial_ingredients)
                                           for cocktail in candidates),
                                          reverse=True)
            for ingredient_increase in ingredient_increases:
                possible_increment -= 1
                excess_ingredients -= ingredient_increase
                if excess_ingredients <= 0:
                    break

        threshold = self.highest_score - score
        
        if candidates and possible_increment > threshold:

            # i've tried a few heuristics like "next smallest
            # cocktail," and a random choice seems to be best so far
            best = min(candidates, key=lambda x: random.random())

            new_partial_ingredients = partial_ingredients | best
            covered_candidates = {cocktail for cocktail in candidates
                                  if cocktail <= new_partial_ingredients}

            self.search(candidates - covered_candidates,
                        partial | covered_candidates)

            # if a cocktail is not part of the optimum set than then
            # the optimum set cannot have the cocktail as a subset
            remaining = {cocktail for cocktail in candidates - set(best)
                         if not (best <= (cocktail | partial_ingredients))}

            self.search(remaining, partial)

        return self.highest


if __name__ == '__main__':
    import csv

    cocktails = {}

    with open('ingredients.csv') as f:
        reader = csv.reader(f)
        for row in reader:
            name, *ingredients = row
            cocktails[frozenset(ingredients)] = name
    
    bb = BranchBound(8000000, 16)
    best = bb.search(cocktails.keys())
    
    print(sorted(set().union(*best)))
    print(sorted(cocktails[k] for k in best))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment