@shurru
Created February 2, 2021 19:53
Simple classifier using just 1 feature
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Goal: Create a 2 class classifier using a single feature \n",
"\n",
"- We are provided with two normally distributed classes which have some overlap \n",
"- How do we create a simple classifier which can distinguish (with somewhat decent performance) a given value into either of the two classes"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np \n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Create the two normally distributed classes with some degree of overlap - can play around with the mean and std of both using np.random.normal\n",
"2. Visualize the two using binning\n"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x7ff5e2aaf0d0>]"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"np.random.seed(1)\n",
"\n",
"A= np.random.normal(100, 20, 100)\n",
"B= np.random.normal (160, 30, 100)\n",
"\n",
"mean_A= np.mean(A)\n",
"std_A= np.std (A)\n",
"\n",
"mean_B= np.mean(B)\n",
"std_B= np.std (B)\n",
"\n",
"count, bins, ignored = plt.hist(A, 10, density=True)\n",
"plt.plot(bins, 1/(std_A * np.sqrt(2 * np.pi)) *\n",
" np.exp( - (bins - mean_A)**2 / (2 * std_A**2) ),\n",
" linewidth=2, color='r')\n",
"\n",
"count, bins, ignored = plt.hist(B, 10, density=True)\n",
"plt.plot(bins, 1/(std_B * np.sqrt(2 * np.pi)) *\n",
" np.exp( - (bins - mean_B)**2 / (2 * std_B**2) ),\n",
" linewidth=2, color='r')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Our classifier calculates the Z-score of the value provided when compared with both classes' mean and std \n",
"- A z-score is often used as a tool to perform outlier detection \n",
"- In our classifier method defined below, the classifier takes in both classes' distributions as well as the unknown value we're trying to classify \n",
"- It returns it's best estimate of whether the Z-score falls within the purview of class A or class B "
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [],
"source": [
"def ourclassifier (A, B, value): \n",
" mean_A= np.mean(A)\n",
" std_A= np.std (A)\n",
" \n",
" mean_B= np.mean(B)\n",
" std_B= np.std (B)\n",
" \n",
" z_A =np.abs(( value- mean_A )/ std_A )\n",
" z_B = np.abs((value- mean_B)/ std_B)\n",
" \n",
"# print (mean_A,std_A, z_A) \n",
"# print (mean_B,std_B, z_B)\n",
" \n",
" if z_A < z_B: \n",
" if z_A <3 :\n",
" return (\"A\")\n",
" else: \n",
" return (\"outlier\")\n",
" \n",
" elif z_B < z_A: \n",
" if z_B <3: \n",
" return (\"B\")\n",
" else: \n",
" return (\"outlier\")\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"274 outlier\n"
]
}
],
"source": [
"## Generate a Random value and it'll let you know its class and whether it's an outlier. \n",
"value = np.random.randint (40, 300)\n",
"\n",
"\n",
"print (value, ourclassifier (A, B, value)) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"our"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
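
The z-score rule in `ourclassifier` can be sanity-checked by running it over the same samples the class statistics were estimated from. The sketch below is a minimal, self-contained version under stated assumptions: the helper name `z_score_classify` and its `cutoff` parameter are hypothetical (they are not part of the notebook), ties go to class A, and the number it prints is a training accuracy, so it probably overstates performance on unseen values.

```python
import numpy as np


def z_score_classify(sample_a, sample_b, value, cutoff=3.0):
    """Label value as 'A', 'B', or 'outlier' (hypothetical helper, not from the notebook)."""
    # Absolute z-score of the value under each class's estimated mean and std
    z_a = abs(value - np.mean(sample_a)) / np.std(sample_a)
    z_b = abs(value - np.mean(sample_b)) / np.std(sample_b)
    # Closer class wins; ties go to A (an assumption made here)
    label, z = ("A", z_a) if z_a <= z_b else ("B", z_b)
    return label if z < cutoff else "outlier"


# Same seed and distribution parameters as the notebook
np.random.seed(1)
A = np.random.normal(100, 20, 100)
B = np.random.normal(160, 30, 100)

# Classify every training point and count how many get their own class back
pred_a = [z_score_classify(A, B, v) for v in A]
pred_b = [z_score_classify(A, B, v) for v in B]
accuracy = (pred_a.count("A") + pred_b.count("B")) / (len(A) + len(B))
print(f"training accuracy: {accuracy:.2f}")
```

Points counted as wrong here are either assigned to the other class or flagged as `outlier`. Holding out part of each sample and scoring on that part would give a less optimistic estimate.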