zgulde/00_readme.md

## 00_readme.md

    Random Utilities


## cat_scatter.ipynb

## market_basket_analysis.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5f8cc744",
   "metadata": {},
   "source": [
    "# Market Basket Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "127d7f2c",
   "metadata": {},
   "source": [
    "## Support\n",
    "\n",
    "$$\n",
    "\\text{Support}(a) = P(a)\n",
    "$$\n",
    "\n",
    "For example, the support for salsa is the same as its basket penetration, that\n",
    "is, the number of baskets (transactions, trips, visits, orders) that contain\n",
    "salsa divided by the total number of baskets.\n",
    "\n",
    "$$\n",
    "\\text{Support}(a, b) = P(a \\cap b)\n",
    "$$\n",
    "\n",
    "For example, the support for salsa and chips is the number of baskets that\n",
    "contain salsa **and** chips divided by the total number of baskets.\n",
    "\n",
    "The support for a combination of items can never be higher than the support for\n",
    "either individual item, because every basket that contains both items also\n",
    "contains each one.\n",
    "\n",
    "*NB. In some ML contexts, support refers to the number of observations; here it\n",
    "is a proportion of baskets.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12a54df0",
   "metadata": {},
   "source": [
    "## Confidence\n",
    "\n",
    "Confidence tells us how likely one item is to be bought given that another was\n",
    "bought.\n",
    "\n",
    "$$\n",
    "\\text{Confidence}(a \\to b) = \\frac{\\text{Support}(a, b)}{\\text{Support}(a)}\n",
    "$$\n",
    "\n",
    "For example, $\\text{Confidence}(\\text{salsa} \\to \\text{chips})$ is the proportion of baskets that have both salsa and chips out of all the baskets that have salsa. Phrased another way, if someone buys salsa, how likely are they to buy chips?\n",
    "\n",
    "For the probability- and statistics-minded:\n",
    "\n",
    "$$\n",
    "\\text{Confidence}(a \\to b) = P(b | a)\n",
    "$$\n",
    "\n",
    "**Note**: $\\text{Confidence}(a \\to b)$ is **not** the same as $\\text{Confidence}(b \\to a)$. If we know that someone is buying salsa, we might estimate a high likelihood that they are also buying chips (they need something to eat the salsa with). However, if we know that someone is buying chips, the likelihood that they are buying salsa is probably a little higher than normal, but not as high as the other way around (if they buy chips, they could get salsa, or guac, or queso)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5da50aaa",
   "metadata": {},
   "source": [
    "## Lift\n",
    "\n",
    "$$\n",
    "\\text{Lift}(a \\to b) = \\frac{\\text{Confidence}(a \\to b)}{\\text{Support}(b)}\n",
    "$$\n",
    "\n",
    "That is,\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "\\text{Lift}(a \\to b) &= \\frac{P(b|a)}{P(b)}\n",
    "\\\\ &= \\frac{\\frac{P(a \\cap b)}{P(a)}}{P(b)}\n",
    "\\\\ &= \\frac{P(a \\cap b)}{P(a)} \\times \\frac{1}{P(b)}\n",
    "\\\\ &= \\frac{P(a \\cap b)}{P(a) \\times P(b)}\n",
    "\\\\ &= \\frac{P(a)P(b|a)}{P(a) \\times P(b)}\n",
    "\\\\ &= \\frac{P(b|a)}{P(b)}\n",
    "\\end{align}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26447866",
   "metadata": {},
   "source": [
    "Lift works like an index: it is a ratio of probabilities that compares how likely b is among baskets containing a with how likely b is overall."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e17c8d5",
   "metadata": {},
   "source": [
    "### Example 1: Lift = 1\n",
    "\n",
    "Let's say that in a data set of 100 baskets, 10 of them contain toothpaste, and 20 of them contain salsa. 2 baskets contain both toothpaste and salsa.\n",
    "\n",
    "In this case, let's calculate $\\text{Lift}(\\text{salsa} \\to \\text{toothpaste})$.\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "\\text{Lift}(\\text{salsa} \\to \\text{toothpaste}) &= \\frac{\\text{Confidence}(\\text{salsa} \\to \\text{toothpaste})}{\\text{Support}(\\text{toothpaste})}\n",
    "\\\\ &= \\frac{P(\\text{toothpaste} | \\text{salsa})}{P(\\text{toothpaste})}\n",
    "\\\\ &= \\frac{\\frac{2}{20}}{\\frac{10}{100}}\n",
    "\\\\ &= \\frac{0.1}{0.1}\n",
    "\\\\ &= 1\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "Here we could conclude that buying toothpaste and salsa are independent events; buying one has no influence on someone buying the other."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "725fe782",
   "metadata": {},
   "source": [
    "### Example 2: Lift > 1\n",
    "\n",
    "In 100 baskets, 20 have salsa, 30 have chips, and 10 have both chips and salsa.\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "\\text{Lift}(\\text{salsa} \\to \\text{chips}) &= \\frac{\\text{Confidence}(\\text{salsa} \\to \\text{chips})}{\\text{Support}(\\text{chips})}\n",
    "\\\\ &= \\frac{P(\\text{chips} | \\text{salsa})}{P(\\text{chips})}\n",
    "\\\\ &= \\frac{\\frac{10}{20}}{\\frac{30}{100}}\n",
    "\\\\ &= \\frac{0.5}{0.3}\n",
    "\\\\ &= 1.\\overline{6}\n",
    "\\end{align}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a66c82c5",
   "metadata": {},
   "source": [
    "### Example 3: Lift < 1\n",
    "\n",
    "In 100 baskets, 20 contain salsa A, 20 contain salsa B, and 2 contain both salsa A and salsa B. (How often do you buy two salsas? Normally I just buy one.)\n",
    "\n",
    "Lift = (2 / 20) / (20 / 100) = 0.1 / 0.2 = 0.5"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
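
The definitions of support, confidence, and lift in the notebook above can be checked numerically. The following is a minimal sketch, not part of the original notebook: the helper names (`support`, `confidence`, `lift`, `make_baskets`) are hypothetical, and the toy baskets are built only from the counts stated in the three lift examples. `Fraction` is used so the ratios stay exact.

```python
from fractions import Fraction


def support(baskets, *items):
    """Share of baskets that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for basket in baskets if wanted <= basket)
    return Fraction(hits, len(baskets))


def confidence(baskets, a, b):
    """Confidence(a -> b) = Support(a, b) / Support(a), i.e. P(b | a)."""
    return support(baskets, a, b) / support(baskets, a)


def lift(baskets, a, b):
    """Lift(a -> b) = Confidence(a -> b) / Support(b)."""
    return confidence(baskets, a, b) / support(baskets, b)


def make_baskets(n_total, a, n_a, b, n_b, n_both):
    """Build toy baskets (hypothetical helper) that match the given counts."""
    baskets = [{a, b} for _ in range(n_both)]
    baskets += [{a} for _ in range(n_a - n_both)]
    baskets += [{b} for _ in range(n_b - n_both)]
    # Pad with empty baskets so the total comes out to n_total.
    baskets += [set() for _ in range(n_total - len(baskets))]
    return baskets


# Counts taken from the three lift examples in the notebook.
example_1 = make_baskets(100, "salsa", 20, "toothpaste", 10, 2)
example_2 = make_baskets(100, "salsa", 20, "chips", 30, 10)
example_3 = make_baskets(100, "salsa A", 20, "salsa B", 20, 2)

print(lift(example_1, "salsa", "toothpaste"))  # 1
print(lift(example_2, "salsa", "chips"))       # 5/3
print(lift(example_3, "salsa A", "salsa B"))   # 1/2
```

Lift is symmetric in its two arguments, since it reduces to P(a ∩ b) / (P(a) × P(b)), so `lift(example_2, "chips", "salsa")` returns the same value as `lift(example_2, "salsa", "chips")`. Confidence is not symmetric.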

## pptx.ipynb
