jtb/inclusionexclusionprinciple.ipynb

## inclusionexclusionprinciple.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "InclusionExclusionPrinciple.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "authorship_tag": "ABX9TyNeeoiFIJvg5ND1fcTK452/",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/jtb/fab49100fdd3976a4d92cf3426b00c12/inclusionexclusionprinciple.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "L6X4j6MsCC_v",
        "colab_type": "text"
      },
      "source": [
        "# Inclusion-Exclusion Principle\n",
        "\n",
        "The goal here is to count the size of the union of sets, namely\n",
        "> N = | A[1] ∪ A[2] ∪ ... ∪ A[n] |\n",
        "\n",
        "where each A[i] is a set of elements.\n",
        "Instead of listing out each set A[i] explicitly, it can be helpful to talk abou the property P[i] that each element in A[i] satifies.\n",
        "\n",
        "For example, we may be interested in counting how many number from 1 to 100 are divisible by at least one of 2,3, and 5.\n",
        "\n",
        "The properties are  \n",
        "> P[1]: the element is divisible by 2  \n",
        "> P[2]: the element is divisible by 3  \n",
        "> P[3]: the element is divisible by 5  \n",
        "\n",
        "If we were to list these sets, it would look like\n",
        "> A[1]: {2, 4, ... , 100}  \n",
        "> A[2]: {3, 6, ... , 99}  \n",
        "> A[3]: {5, 10, ..., 100}\n",
        "\n",
        "Counting the union of sets can be difficult because one has to be careful not to double count elements. For example, 100 shows up in A[1] and A[3], so you can't just sum the sizes of A[1] through A[3].\n",
        "\n",
        "Instead, we can count all partial intersections of the properties\n",
        "> P[1]  \n",
        "> P[2]  \n",
        "> P[3]  \n",
        "> P[1] ∩ P[2]   \n",
        "> P[1] ∩ P[3]  \n",
        "> P[2] ∩ P[3]  \n",
        "> P[1] ∩ P[2] ∩ P[3]  \n",
        "\n",
        "Once we have that, we can use the Principle of Inclusion-Exclusion to take care of the bookkeeping and give us the correct tally. This may seem like more work, but it's sometimes easier to count when all properties must be satifised rather than some of them.\n",
        "\n",
        "In our divisibility example, it's easier to count how many number from 1 to 100 are divisible by 2 and 3, than it is to count how many numbers are divisible by 2 or 3 (or both).\n",
        "\n",
        "In the below code, one does not have to worry about how the Principle of Inclusion-Exclusion works. The only thing that needs to be done is to write a function that can take a list of properties and count the number of elements that satisfy all the properites. Let's go!\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "g71nCNMeBrnk",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Here's the code that does the bookkeeping. It's not necessary to know how it works,\n",
        "# but make sure to hit the play button before going on to next step.\n",
        "from sympy import binomial\n",
        "\n",
        "def pieUtil(A, subset, index, func):\n",
        "  count = 0\n",
        "  for i in range(index, len(A)):\n",
        "    subset.append(A[i])\n",
        "    subset_count = func(subset)\n",
        "    if (subset_count > 0):\n",
        "      # no need to try supersets if this gives no new counts\n",
        "      count += (-1)**(len(subset) + 1) * subset_count\n",
        "      count += pieUtil(A, subset, i+1, func)\n",
        "    subset.pop()\n",
        "  return count\n",
        "\n",
        "def pie(A, func, complement = False):\n",
        "  count = pieUtil(A, [], 0, func)\n",
        "  if complement:\n",
        "    count = func([]) - count\n",
        "  return count\n",
        "\n",
        "# Sometimes it is enough to know the number of sets being intersected, rather\n",
        "# than the actual sets themselves. This leads to a much more efficient algoirithm\n",
        "# and should be used whenever possible. Instead of defining a function that\n",
        "# takes a list of sets, we can define a function that just takes in k, the\n",
        "# number of sets.\n",
        "def pieK(n, intersector, complement = False):\n",
        "  count = 0\n",
        "  for k in range(1, n+1):\n",
        "    count += (-1)**(k+1)*binomial(n,k)*intersector(k)\n",
        "  if complement:\n",
        "    count = intersector(0) - count\n",
        "  return count\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "f71Ws_FOd8DF",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "outputId": "dc41141a-031b-4600-b424-836118b5a7cb"
      },
      "source": [
        "# Let's start with a twist on the example given in the introduction:\n",
        "# How many numbers from 1 to 100 (inclusive) are not divisible by 2, 3, or 5?\n",
        "# Notice that this is actually (not P[1] AND not P[2] AND not P[3]).\n",
        "# But we can negate the problem to make it (P[1] OR P[2] OR P[3]).\n",
        "# We just need to remember to take the complement of the result.\n",
        "from functools import reduce\n",
        "from math import gcd\n",
        "\n",
        "# This gives the Least Common Multiple of a list of numbers.\n",
        "# example: find_lcm([2, 3]) = 6. We can use this to find how many numbers\n",
        "# between 1 and 100 are divisible by 2 AND 3 (by taking floor of 100 / 6).\n",
        "def find_lcm(list):\n",
        "  return reduce(lambda a,b: a*b // gcd(a,b), list)\n",
        "\n",
        "def divCount(N):\n",
        "  def func(properties):\n",
        "    # If the list of properties is empty, return the size of the universe.\n",
        "    if not properties:\n",
        "      return N\n",
        "    return N // find_lcm(properties)\n",
        "  return func\n",
        "\n",
        "print(pie([2,3,5], divCount(100), True))"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "26\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yyvc_4M33CGD",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 173
        },
        "outputId": "821d17ca-61aa-4ae9-8cdf-e543981f3831"
      },
      "source": [
        "# Counting derangements\n",
        "# https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle#Counting_derangements\n",
        "from math import factorial\n",
        "\n",
        "# For a deck of n cards, a derangement is a shuffling of the cards such that\n",
        "# the 1st card is not in position 1, the 2nd card is not in position 2, etc.\n",
        "# Let P[i] be the property that the i'th card is in the i'th position\n",
        "# A derangement can be defined as\n",
        "#   (not P[1]) AND (not P[2]) ... AND (not P[n))\n",
        "# We solve the complement of the problem, namely\n",
        "#   P[1] OR P[2] ... OR P[n], and then subtract that result from the \n",
        "# total size of the universe to get number of derangements \n",
        "\n",
        "def fixedPointCount(n):\n",
        "  def func(k):\n",
        "    # If there are (at least) k fixed point (cards in correct location) then\n",
        "    # there are (n-k)! ways to place the remaining cards (note: these remaining\n",
        "    # cards may or may not be in their correct locations)\n",
        "    return factorial(n-k)\n",
        "  return func\n",
        "\n",
        "for i in range(1, 10):\n",
        "  print(\"Derangements for size\", i, \":\", pieK(i, fixedPointCount(i), True))"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Derangements for size 1 : 0\n",
            "Derangements for size 2 : 1\n",
            "Derangements for size 3 : 2\n",
            "Derangements for size 4 : 9\n",
            "Derangements for size 5 : 44\n",
            "Derangements for size 6 : 265\n",
            "Derangements for size 7 : 1854\n",
            "Derangements for size 8 : 14833\n",
            "Derangements for size 9 : 133496\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "L5mKKFei33K0",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 87
        },
        "outputId": "4add741c-275e-4546-ddb8-f43bd67758b7"
      },
      "source": [
        "# For fun, here's a SAT solver.\n",
        "from functools import reduce\n",
        "\n",
        "def union_true(list):\n",
        "  un = set()\n",
        "  for i in list:\n",
        "    un = un.union(i[0])\n",
        "  return un\n",
        "\n",
        "def union_false(list):\n",
        "  un = set()\n",
        "  for i in list:\n",
        "    un = un.union(i[1])\n",
        "  return un\n",
        "\n",
        "def satSolver(N):\n",
        "  def func(properties):\n",
        "    t = union_true(properties)\n",
        "    f = union_false(properties)\n",
        "    if t.intersection(f):\n",
        "      return 0\n",
        "    return 2**(N - len(t.union(f)))\n",
        "  return func\n",
        "\n",
        "print(pie([[{'a', 'b'}, {'c'}],[{'b', 'c'},{}],[{},{'b'}],[{'c'},{'a'}]], satSolver(3), True))\n",
        "print(pie([[{'a'},{}], [{},{'a'}]], satSolver(1), True))\n",
        "\n",
        "# http://www.tfinley.net/software/pyglpk/ex_sat.html\n",
        "print(pie([[{},{1,3,4}],[{2,3},{4}],[{1,4},{2}],[{1,3,4},{}],[{2},{1,3}]], satSolver(4), True))\n",
        "\n",
        "# https://cs.stackexchange.com/questions/20117/whats-an-example-of-an-unsatisfiable-3-cnf-formula/20118\n",
        "x = 'x'\n",
        "y = 'y'\n",
        "z = 'z'\n",
        "num_variables = 3\n",
        "constraints = [\n",
        "  [{x, y, z}, {}],\n",
        "  [{x, y}, {z}],\n",
        "  [{x, z}, {y}],\n",
        "  [{x}, {y, z}],\n",
        "  [{y, z}, {x}],\n",
        "  [{y}, {x, z}],\n",
        "  [{z}, {x, y}],\n",
        "  [{}, {x, y, z}]\n",
        "]\n",
        "print(pie(constraints, satSolver(num_variables), True))"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "1\n",
            "0\n",
            "8\n",
            "0\n"
          ],
          "name": "stdout"
        }
      ]
    }
  ]
}
	{
	"nbformat": 4,
	"nbformat_minor": 0,
	"metadata": {
	"colab": {
	"name": "InclusionExclusionPrinciple.ipynb",
	"provenance": [],
	"collapsed_sections": [],
	"authorship_tag": "ABX9TyNeeoiFIJvg5ND1fcTK452/",
	"include_colab_link": true
	},
	"kernelspec": {
	"name": "python3",
	"display_name": "Python 3"
	}
	},
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "view-in-github",
	"colab_type": "text"
	},
	"source": [
	"<a href=\"https://colab.research.google.com/gist/jtb/fab49100fdd3976a4d92cf3426b00c12/inclusionexclusionprinciple.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "L6X4j6MsCC_v",
	"colab_type": "text"
	},
	"source": [
	"# Inclusion-Exclusion Principle\n",
	"\n",
	"The goal here is to count the size of the union of sets, namely\n",
	"> N = \| A[1] ∪ A[2] ∪ ... ∪ A[n] \|\n",
	"\n",
	"where each A[i] is a set of elements.\n",
	"Instead of listing out each set A[i] explicitly, it can be helpful to talk abou the property P[i] that each element in A[i] satifies.\n",
	"\n",
	"For example, we may be interested in counting how many number from 1 to 100 are divisible by at least one of 2,3, and 5.\n",
	"\n",
	"The properties are \n",
	"> P[1]: the element is divisible by 2 \n",
	"> P[2]: the element is divisible by 3 \n",
	"> P[3]: the element is divisible by 5 \n",
	"\n",
	"If we were to list these sets, it would look like\n",
	"> A[1]: {2, 4, ... , 100} \n",
	"> A[2]: {3, 6, ... , 99} \n",
	"> A[3]: {5, 10, ..., 100}\n",
	"\n",
	"Counting the union of sets can be difficult because one has to be careful not to double count elements. For example, 100 shows up in A[1] and A[3], so you can't just sum the sizes of A[1] through A[3].\n",
	"\n",
	"Instead, we can count all partial intersections of the properties\n",
	"> P[1] \n",
	"> P[2] \n",
	"> P[3] \n",
	"> P[1] ∩ P[2] \n",
	"> P[1] ∩ P[3] \n",
	"> P[2] ∩ P[3] \n",
	"> P[1] ∩ P[2] ∩ P[3] \n",
	"\n",
	"Once we have that, we can use the Principle of Inclusion-Exclusion to take care of the bookkeeping and give us the correct tally. This may seem like more work, but it's sometimes easier to count when all properties must be satifised rather than some of them.\n",
	"\n",
	"In our divisibility example, it's easier to count how many number from 1 to 100 are divisible by 2 and 3, than it is to count how many numbers are divisible by 2 or 3 (or both).\n",
	"\n",
	"In the below code, one does not have to worry about how the Principle of Inclusion-Exclusion works. The only thing that needs to be done is to write a function that can take a list of properties and count the number of elements that satisfy all the properites. Let's go!\n"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "g71nCNMeBrnk",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# Here's the code that does the bookkeeping. It's not necessary to know how it works,\n",
	"# but make sure to hit the play button before going on to next step.\n",
	"from sympy import binomial\n",
	"\n",
	"def pieUtil(A, subset, index, func):\n",
	" count = 0\n",
	" for i in range(index, len(A)):\n",
	" subset.append(A[i])\n",
	" subset_count = func(subset)\n",
	" if (subset_count > 0):\n",
	" # no need to try supersets if this gives no new counts\n",
	" count += (-1)*(len(subset) + 1) subset_count\n",
	" count += pieUtil(A, subset, i+1, func)\n",
	" subset.pop()\n",
	" return count\n",
	"\n",
	"def pie(A, func, complement = False):\n",
	" count = pieUtil(A, [], 0, func)\n",
	" if complement:\n",
	" count = func([]) - count\n",
	" return count\n",
	"\n",
	"# Sometimes it is enough to know the number of sets being intersected, rather\n",
	"# than the actual sets themselves. This leads to a much more efficient algoirithm\n",
	"# and should be used whenever possible. Instead of defining a function that\n",
	"# takes a list of sets, we can define a function that just takes in k, the\n",
	"# number of sets.\n",
	"def pieK(n, intersector, complement = False):\n",
	" count = 0\n",
	" for k in range(1, n+1):\n",
	" count += (-1)*(k+1)binomial(n,k)*intersector(k)\n",
	" if complement:\n",
	" count = intersector(0) - count\n",
	" return count\n"
	],
	"execution_count": null,
	"outputs": []
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "f71Ws_FOd8DF",
	"colab_type": "code",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 35
	},
	"outputId": "dc41141a-031b-4600-b424-836118b5a7cb"
	},
	"source": [
	"# Let's start with a twist on the example given in the introduction:\n",
	"# How many numbers from 1 to 100 (inclusive) are not divisible by 2, 3, or 5?\n",
	"# Notice that this is actually (not P[1] AND not P[2] AND not P[3]).\n",
	"# But we can negate the problem to make it (P[1] OR P[2] OR P[3]).\n",
	"# We just need to remember to take the complement of the result.\n",
	"from functools import reduce\n",
	"from math import gcd\n",
	"\n",
	"# This gives the Least Common Multiple of a list of numbers.\n",
	"# example: find_lcm([2, 3]) = 6. We can use this to find how many numbers\n",
	"# between 1 and 100 are divisible by 2 AND 3 (by taking floor of 100 / 6).\n",
	"def find_lcm(list):\n",
	" return reduce(lambda a,b: a*b // gcd(a,b), list)\n",
	"\n",
	"def divCount(N):\n",
	" def func(properties):\n",
	" # If the list of properties is empty, return the size of the universe.\n",
	" if not properties:\n",
	" return N\n",
	" return N // find_lcm(properties)\n",
	" return func\n",
	"\n",
	"print(pie([2,3,5], divCount(100), True))"
	],
	"execution_count": null,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"26\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "yyvc_4M33CGD",
	"colab_type": "code",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 173
	},
	"outputId": "821d17ca-61aa-4ae9-8cdf-e543981f3831"
	},
	"source": [
	"# Counting derangements\n",
	"# https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle#Counting_derangements\n",
	"from math import factorial\n",
	"\n",
	"# For a deck of n cards, a derangement is a shuffling of the cards such that\n",
	"# the 1st card is not in position 1, the 2nd card is not in position 2, etc.\n",
	"# Let P[i] be the property that the i'th card is in the i'th position\n",
	"# A derangement can be defined as\n",
	"# (not P[1]) AND (not P[2]) ... AND (not P[n))\n",
	"# We solve the complement of the problem, namely\n",
	"# P[1] OR P[2] ... OR P[n], and then subtract that result from the \n",
	"# total size of the universe to get number of derangements \n",
	"\n",
	"def fixedPointCount(n):\n",
	" def func(k):\n",
	" # If there are (at least) k fixed point (cards in correct location) then\n",
	" # there are (n-k)! ways to place the remaining cards (note: these remaining\n",
	" # cards may or may not be in their correct locations)\n",
	" return factorial(n-k)\n",
	" return func\n",
	"\n",
	"for i in range(1, 10):\n",
	" print(\"Derangements for size\", i, \":\", pieK(i, fixedPointCount(i), True))"
	],
	"execution_count": null,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"Derangements for size 1 : 0\n",
	"Derangements for size 2 : 1\n",
	"Derangements for size 3 : 2\n",
	"Derangements for size 4 : 9\n",
	"Derangements for size 5 : 44\n",
	"Derangements for size 6 : 265\n",
	"Derangements for size 7 : 1854\n",
	"Derangements for size 8 : 14833\n",
	"Derangements for size 9 : 133496\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "L5mKKFei33K0",
	"colab_type": "code",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 87
	},
	"outputId": "4add741c-275e-4546-ddb8-f43bd67758b7"
	},
	"source": [
	"# For fun, here's a SAT solver.\n",
	"from functools import reduce\n",
	"\n",
	"def union_true(list):\n",
	" un = set()\n",
	" for i in list:\n",
	" un = un.union(i[0])\n",
	" return un\n",
	"\n",
	"def union_false(list):\n",
	" un = set()\n",
	" for i in list:\n",
	" un = un.union(i[1])\n",
	" return un\n",
	"\n",
	"def satSolver(N):\n",
	" def func(properties):\n",
	" t = union_true(properties)\n",
	" f = union_false(properties)\n",
	" if t.intersection(f):\n",
	" return 0\n",
	" return 2**(N - len(t.union(f)))\n",
	" return func\n",
	"\n",
	"print(pie([[{'a', 'b'}, {'c'}],[{'b', 'c'},{}],[{},{'b'}],[{'c'},{'a'}]], satSolver(3), True))\n",
	"print(pie([[{'a'},{}], [{},{'a'}]], satSolver(1), True))\n",
	"\n",
	"# http://www.tfinley.net/software/pyglpk/ex_sat.html\n",
	"print(pie([[{},{1,3,4}],[{2,3},{4}],[{1,4},{2}],[{1,3,4},{}],[{2},{1,3}]], satSolver(4), True))\n",
	"\n",
	"# https://cs.stackexchange.com/questions/20117/whats-an-example-of-an-unsatisfiable-3-cnf-formula/20118\n",
	"x = 'x'\n",
	"y = 'y'\n",
	"z = 'z'\n",
	"num_variables = 3\n",
	"constraints = [\n",
	" [{x, y, z}, {}],\n",
	" [{x, y}, {z}],\n",
	" [{x, z}, {y}],\n",
	" [{x}, {y, z}],\n",
	" [{y, z}, {x}],\n",
	" [{y}, {x, z}],\n",
	" [{z}, {x, y}],\n",
	" [{}, {x, y, z}]\n",
	"]\n",
	"print(pie(constraints, satSolver(num_variables), True))"
	],
	"execution_count": null,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"1\n",
	"0\n",
	"8\n",
	"0\n"
	],
	"name": "stdout"
	}
	]
	}
	]
	}