@zgulde
Created July 20, 2022 02:31

Random Utilities

{
"cells": [
{
"cell_type": "markdown",
"id": "5f8cc744",
"metadata": {},
"source": [
"# Market Basket Analysis"
]
},
{
"cell_type": "markdown",
"id": "127d7f2c",
"metadata": {},
"source": [
"## Support\n",
"\n",
"$$\n",
"\\text{Support}(a) = P(a)\n",
"$$\n",
"\n",
"For example, the support for salsa is the same as its basket penetration, that\n",
"is, the number of baskets (transactions, trips, visits, orders) that contain\n",
"salsa divided by the total number of baskets.\n",
"\n",
"$$\n",
"\\text{Support}(a, b) = P(a \\cap b)\n",
"$$\n",
"\n",
"For example, the support for salsa and chips is the number of baskets that\n",
"contain salsa **and** chips divided by the total number of baskets.\n",
"\n",
"The support for a combination of items can never be higher than the support for\n",
"either individual item; it is equal only when every basket containing that item\n",
"also contains the rest of the combination.\n",
"\n",
"*NB. In some ML contexts, support refers to a count of observations; here it is a\n",
"proportion of baskets.*"
]
},
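{
"cell_type": "markdown",
"id": "9a1f3c01",
"metadata": {},
"source": [
"A minimal sketch of computing support in plain Python, assuming each basket is represented as a set of item names. The toy `baskets` data and the `support` helper are hypothetical, made up for illustration. The support of salsa and chips together comes out lower than the support of either item alone."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1f3c02",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical toy data: each basket (transaction) is a set of item names.\n",
"baskets = [\n",
"    {'salsa', 'chips'},\n",
"    {'salsa', 'chips', 'queso'},\n",
"    {'chips', 'guac'},\n",
"    {'toothpaste'},\n",
"    {'salsa'},\n",
"]\n",
"\n",
"\n",
"def support(baskets, *items):\n",
"    # Fraction of baskets that contain every one of `items` (subset test).\n",
"    wanted = set(items)\n",
"    return sum(wanted <= basket for basket in baskets) / len(baskets)\n",
"\n",
"\n",
"print(support(baskets, 'salsa'))           # 3 of 5 baskets -> 0.6\n",
"print(support(baskets, 'salsa', 'chips'))  # 2 of 5 baskets -> 0.4"
]
},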
{
"cell_type": "markdown",
"id": "12a54df0",
"metadata": {},
"source": [
"## Confidence\n",
"\n",
"Confidence tells us how likely one item is to be bought, given that another item\n",
"was bought.\n",
"\n",
"$$\n",
"\\text{Confidence}(a \\to b) = \\frac{\\text{Support}(a, b)}{\\text{Support}(a)}\n",
"$$\n",
"\n",
"For example, $\\text{Confidence}(\\text{salsa} \\to \\text{chips})$ is the proportion of baskets that have both salsa and chips out of all the baskets that have salsa. Phrased another way, if someone buys salsa, how likely are they to buy chips?\n",
"\n",
"For the probabilistically and statistically minded:\n",
"\n",
"$$\n",
"\\text{Confidence}(a \\to b) = P(b | a)\n",
"$$\n",
"\n",
"**Note**: $\\text{Confidence}(a \\to b)$ is **not** the same as $\\text{Confidence}(b \\to a)$. If we know that someone is buying salsa, we might estimate there's a high likelihood they are buying chips (they need something to eat the salsa with). However, if we know that someone is buying chips, the likelihood they are buying salsa is probably somewhat higher than normal, but not as high as in the other direction (someone buying chips could pair them with salsa, guac, or queso)."
]
},
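{
"cell_type": "markdown",
"id": "9a1f3c03",
"metadata": {},
"source": [
"A minimal sketch of confidence built on a `support` helper, again assuming baskets are sets of item names. The toy data and function names are hypothetical, chosen so the two directions differ: 3 of the 4 salsa baskets contain chips, but only 3 of the 6 chips baskets contain salsa."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1f3c04",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical toy data: each basket is a set of item names.\n",
"baskets = [\n",
"    {'salsa', 'chips'},\n",
"    {'salsa', 'chips'},\n",
"    {'salsa', 'chips'},\n",
"    {'salsa'},\n",
"    {'chips', 'guac'},\n",
"    {'chips', 'queso'},\n",
"    {'chips'},\n",
"    {'toothpaste'},\n",
"]\n",
"\n",
"\n",
"def support(baskets, *items):\n",
"    # Fraction of baskets that contain every one of `items`.\n",
"    wanted = set(items)\n",
"    return sum(wanted <= basket for basket in baskets) / len(baskets)\n",
"\n",
"\n",
"def confidence(baskets, a, b):\n",
"    # Confidence(a -> b) = Support(a, b) / Support(a), i.e. P(b | a).\n",
"    return support(baskets, a, b) / support(baskets, a)\n",
"\n",
"\n",
"print(confidence(baskets, 'salsa', 'chips'))  # -> 0.75\n",
"print(confidence(baskets, 'chips', 'salsa'))  # -> 0.5"
]
},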
{
"cell_type": "markdown",
"id": "5da50aaa",
"metadata": {},
"source": [
"## Lift\n",
"\n",
"$$\n",
"\\text{Lift}(a \\to b) = \\frac{\\text{Confidence}(a \\to b)}{\\text{Support}(b)}\n",
"$$\n",
"\n",
"That is,\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\text{Lift}(a \\to b) &= \\frac{P(b|a)}{P(b)}\n",
"\\\\ &= \\frac{\\frac{P(a \\cap b)}{P(a)}}{P(b)}\n",
"\\\\ &= \\frac{P(a \\cap b)}{P(a)} \\times \\frac{1}{P(b)}\n",
"\\\\ &= \\frac{P(a \\cap b)}{P(a) \\times P(b)}\n",
"\\\\ &= \\frac{P(b)P(a|b)}{P(a) \\times P(b)}\n",
"\\\\ &= \\frac{P(a|b)}{P(a)}\n",
"\\\\ &= \\text{Lift}(b \\to a)\n",
"\\end{align}\n",
"$$\n",
"\n",
"Since $\\frac{P(a \\cap b)}{P(a) \\times P(b)}$ treats $a$ and $b$ the same way, lift, unlike confidence, is the same in both directions: $\\text{Lift}(a \\to b) = \\text{Lift}(b \\to a)$."
]
},
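{
"cell_type": "markdown",
"id": "26447866",
"metadata": {},
"source": [
"Lift is almost an index: a ratio of probabilities that compares how likely $b$ is to be bought in baskets that contain $a$ with how likely $b$ is to be bought overall. A lift of 1 means no difference, above 1 means $b$ is more likely alongside $a$, and below 1 means it is less likely. (An index in the marketing sense is usually this same ratio multiplied by 100.)"
]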
},
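{
"cell_type": "markdown",
"id": "9a1f3c05",
"metadata": {},
"source": [
"A quick numeric check of the lift derivation, using Python's `fractions.Fraction` so the comparisons are exact rather than subject to floating-point rounding. The toy baskets and helper names are hypothetical. Confidence comes out different in each direction, while lift agrees with $P(a \\cap b) / (P(a) P(b))$ both ways."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1f3c06",
"metadata": {},
"outputs": [],
"source": [
"from fractions import Fraction\n",
"\n",
"# Hypothetical toy data: 10 baskets, salsa in 4, chips in 5, both in 3.\n",
"baskets = [\n",
"    {'salsa', 'chips'},\n",
"    {'salsa', 'chips'},\n",
"    {'salsa', 'chips'},\n",
"    {'salsa'},\n",
"    {'chips'},\n",
"    {'chips', 'queso'},\n",
"    {'toothpaste'},\n",
"    {'queso'},\n",
"    {'guac'},\n",
"    {'toothpaste', 'guac'},\n",
"]\n",
"\n",
"\n",
"def support(baskets, *items):\n",
"    # Exact fraction of baskets that contain every one of `items`.\n",
"    wanted = set(items)\n",
"    return Fraction(sum(wanted <= basket for basket in baskets), len(baskets))\n",
"\n",
"\n",
"def confidence(baskets, a, b):\n",
"    # Confidence(a -> b) = Support(a, b) / Support(a)\n",
"    return support(baskets, a, b) / support(baskets, a)\n",
"\n",
"\n",
"def lift(baskets, a, b):\n",
"    # Lift(a -> b) = Confidence(a -> b) / Support(b)\n",
"    return confidence(baskets, a, b) / support(baskets, b)\n",
"\n",
"\n",
"a, b = 'salsa', 'chips'\n",
"joint_form = support(baskets, a, b) / (support(baskets, a) * support(baskets, b))\n",
"\n",
"print(confidence(baskets, a, b), confidence(baskets, b, a))  # -> 3/4 3/5\n",
"print(lift(baskets, a, b), lift(baskets, b, a), joint_form)  # -> 3/2 3/2 3/2"
]
},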
{
"cell_type": "markdown",
"id": "7e17c8d5",
"metadata": {},
"source": [
"### Example 1: Lift = 1\n",
"\n",
"Suppose that in a data set of 100 baskets, 10 contain toothpaste, 20 contain salsa, and 2 contain both toothpaste and salsa.\n",
"\n",
"In this case, let's calculate $\\text{Lift}(\\text{salsa} \\to \\text{toothpaste})$.\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\text{Lift}(\\text{salsa} \\to \\text{toothpaste}) &= \\frac{\\text{Confidence}(\\text{salsa} \\to \\text{toothpaste})}{\\text{Support}(\\text{toothpaste})}\n",
"\\\\ &= \\frac{P(\\text{toothpaste} | \\text{salsa})}{P(\\text{toothpaste})}\n",
"\\\\ &= \\frac{\\frac{2}{20}}{\\frac{10}{100}}\n",
"\\\\ &= \\frac{0.1}{0.1}\n",
"\\\\ &= 1\n",
"\\end{align}\n",
"$$\n",
"\n",
"Here, buying toothpaste and buying salsa look like independent events: knowing that someone bought one tells us nothing about whether they bought the other."
]
},
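{
"cell_type": "markdown",
"id": "9a1f3c07",
"metadata": {},
"source": [
"To check Example 1 against actual baskets, the sketch below builds a hypothetical data set matching its counts (100 baskets, 20 with salsa, 10 with toothpaste, 2 with both) and computes confidence and lift from it. `make_baskets` is a made-up helper for illustration, not a library function."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1f3c08",
"metadata": {},
"outputs": [],
"source": [
"from fractions import Fraction\n",
"\n",
"\n",
"def make_baskets(n_total, n_a, n_b, n_ab, a, b):\n",
"    # Hypothetical helper: n_total baskets where n_a contain a, n_b contain b,\n",
"    # and n_ab contain both; the remaining baskets contain neither item.\n",
"    baskets = [{a, b} for _ in range(n_ab)]\n",
"    baskets += [{a} for _ in range(n_a - n_ab)]\n",
"    baskets += [{b} for _ in range(n_b - n_ab)]\n",
"    baskets += [set() for _ in range(n_total - n_a - n_b + n_ab)]\n",
"    return baskets\n",
"\n",
"\n",
"def support(baskets, *items):\n",
"    # Exact fraction of baskets that contain every one of `items`.\n",
"    wanted = set(items)\n",
"    return Fraction(sum(wanted <= basket for basket in baskets), len(baskets))\n",
"\n",
"\n",
"baskets = make_baskets(100, n_a=20, n_b=10, n_ab=2, a='salsa', b='toothpaste')\n",
"confidence = support(baskets, 'salsa', 'toothpaste') / support(baskets, 'salsa')\n",
"lift = confidence / support(baskets, 'toothpaste')\n",
"print(confidence, lift)  # -> 1/10 1"
]
},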
{
"cell_type": "markdown",
"id": "725fe782",
"metadata": {},
"source": [
"### Example 2: Lift > 1\n",
"\n",
"In 100 baskets, 20 have salsa, 30 have chips, and 10 have both chips and salsa.\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\text{Lift}(\\text{salsa} \\to \\text{chips}) &= \\frac{\\text{Confidence}(\\text{salsa} \\to \\text{chips})}{\\text{Support}(\\text{chips})}\n",
"\\\\ &= \\frac{P(\\text{chips} | \\text{salsa})}{P(\\text{chips})}\n",
"\\\\ &= \\frac{\\frac{10}{20}}{\\frac{30}{100}}\n",
"\\\\ &= \\frac{0.5}{0.3}\n",
"\\\\ &= 1.\\overline{6}\n",
"\\end{align}\n",
"$$\n",
"\n",
"Baskets that contain salsa are about 1.67 times as likely to contain chips as baskets overall."
]
},
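{
"cell_type": "markdown",
"id": "9a1f3c09",
"metadata": {},
"source": [
"The Example 2 arithmetic, done with Python's `fractions.Fraction` so the repeating decimal shows up as an exact ratio. `lift_from_counts` is a hypothetical helper name, not a library function."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1f3c0a",
"metadata": {},
"outputs": [],
"source": [
"from fractions import Fraction\n",
"\n",
"\n",
"def lift_from_counts(n_total, n_a, n_b, n_ab):\n",
"    # Lift(a -> b) = P(b | a) / P(b), computed from raw basket counts.\n",
"    confidence = Fraction(n_ab, n_a)    # P(b | a)\n",
"    support_b = Fraction(n_b, n_total)  # P(b)\n",
"    return confidence / support_b\n",
"\n",
"\n",
"# a = salsa, b = chips\n",
"lift = lift_from_counts(100, n_a=20, n_b=30, n_ab=10)\n",
"print(lift, float(lift))  # -> 5/3 1.6666666666666667"
]
},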
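{
"cell_type": "markdown",
"id": "a66c82c5",
"metadata": {},
"source": [
"### Example 3: Lift < 1\n",
"\n",
"In 100 baskets, 20 contain salsa A, 20 contain salsa B, and 2 contain both salsa A and salsa B. (How often do you buy two salsas? Normally I just buy one.)\n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\text{Lift}(\\text{salsa A} \\to \\text{salsa B}) &= \\frac{\\text{Confidence}(\\text{salsa A} \\to \\text{salsa B})}{\\text{Support}(\\text{salsa B})}\n",
"\\\\ &= \\frac{\\frac{2}{20}}{\\frac{20}{100}}\n",
"\\\\ &= \\frac{0.1}{0.2}\n",
"\\\\ &= 0.5\n",
"\\end{align}\n",
"$$\n",
"\n",
"A lift below 1 means that buying salsa A makes buying salsa B *less* likely than it is overall; the two salsas act more like substitutes than complements."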
]
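},
{
"cell_type": "markdown",
"id": "9a1f3c0b",
"metadata": {},
"source": [
"An exact-arithmetic check of Example 3. It computes lift two ways: as confidence over support, and through the symmetric $P(a \\cap b) / (P(a) P(b))$ form from the lift derivation. Both routes give one half. The variable names are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1f3c0c",
"metadata": {},
"outputs": [],
"source": [
"from fractions import Fraction\n",
"\n",
"# Counts from Example 3; a = salsa A, b = salsa B.\n",
"n_total, n_a, n_b, n_ab = 100, 20, 20, 2\n",
"\n",
"p_a = Fraction(n_a, n_total)    # Support(a)\n",
"p_b = Fraction(n_b, n_total)    # Support(b)\n",
"p_ab = Fraction(n_ab, n_total)  # Support(a, b)\n",
"\n",
"lift_via_confidence = (p_ab / p_a) / p_b  # Confidence(a -> b) / Support(b)\n",
"lift_via_joint = p_ab / (p_a * p_b)       # P(a and b) / (P(a) * P(b))\n",
"print(lift_via_confidence, lift_via_joint)  # -> 1/2 1/2"
]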
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}