@harjeet88
Last active August 29, 2015 14:23
Scikit Learn Lesson 1
{
"metadata": {
"name": "",
"signature": "sha256:a586971f89ed3f3285b0be52e95eb10dd2e6382d35038e7df4d8ea857f385ca8"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Learning Data Science in Python"
]
},
{
"cell_type": "heading",
"level": 6,
"metadata": {},
"source": [
"Harjeet Kumar Rajpal | harjeet.kumar24@gmail.com | Linkedin : https://in.linkedin.com/pub/harjeet-kumar/14/a4a/620 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"scikit-learn is a simple yet powerful tool for implementing data science solutions. Let us see how to create a simple model using sklearn and how to use it to make predictions. First, we will write some import statements."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn import datasets"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This statement imports the sample datasets that ship with sklearn for learning purposes. We will use the iris dataset for this example. It contains measurements of iris flowers: the length and width of their sepals and petals. After training a model on this data, we will give it input features and predict which type of iris flower they belong to."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn import metrics"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"metrics will be used to measure how accurate our trained model is."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.tree import DecisionTreeClassifier"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to use a decision tree for classification."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# load the iris dataset\n",
"dataset = datasets.load_iris()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This statement shows how to load the iris dataset."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# create a CART (decision tree) model; it is fit to the data in the next cell\n",
"model = DecisionTreeClassifier()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we have initialized the decision tree model. Every model in sklearn implements the estimator interface. An estimator does two things. It uses the fit method to train itself on the training data and learn the patterns in it, and, once trained, it uses the predict method to make predictions."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"model.fit(dataset.data, dataset.target)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"DecisionTreeClassifier(compute_importances=None, criterion='gini',\n",
" max_depth=None, max_features=None, max_leaf_nodes=None,\n",
" min_density=None, min_samples_leaf=1, min_samples_split=2,\n",
" random_state=None, splitter='best')"
]
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is how we train a model on training data. We give it input examples, telling it that each row of dataset.data is an input and that the matching entry of dataset.target is the flower type it belongs to."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(model)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"DecisionTreeClassifier(compute_importances=None, criterion='gini',\n",
" max_depth=None, max_features=None, max_leaf_nodes=None,\n",
" min_density=None, min_samples_leaf=1, min_samples_split=2,\n",
" random_state=None, splitter='best')\n"
]
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This just shows the different parameters of the model."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# make predictions\n",
"expected = dataset.target\n",
"predicted = model.predict(dataset.data)\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After training the model, we can predict the type of flower for any input data we give it. Here we predict on the same data the model was trained on."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# summarize the fit of the model\n",
"print(metrics.classification_report(expected, predicted))\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 50\n",
" 1 1.00 1.00 1.00 50\n",
" 2 1.00 1.00 1.00 50\n",
"\n",
"avg / total 1.00 1.00 1.00 150\n",
"\n"
]
}
],
"prompt_number": 23
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
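"This statement compares the predicted output with the expected output and shows how accurate the model was. Because the model is evaluated on the same data it was trained on, the perfect scores are optimistic and do not show how it would perform on unseen data."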
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(metrics.confusion_matrix(expected, predicted))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[[50 0 0]\n",
" [ 0 50 0]\n",
" [ 0 0 50]]\n"
]
}
],
"prompt_number": 24
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
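"This statement shows the confusion matrix. Each row is an actual class and each column is a predicted class, so the values on the diagonal count the correct predictions."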
]
}
],
"metadata": {}
}
]
}
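To make the last two reports less of a black box, here is a minimal plain-Python sketch of how a confusion matrix, and the per-class precision and recall derived from it, can be counted by hand. The helper names `build_confusion_matrix` and `precision_recall` and the small label lists are hypothetical, made up for illustration; they are not part of sklearn and not the iris results from the notebook.

```python
from collections import Counter


def build_confusion_matrix(expected, predicted, labels):
    """Count (actual, predicted) pairs. Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(expected, predicted))
    return [[counts[(actual, guess)] for guess in labels] for actual in labels]


def precision_recall(matrix):
    """Return one (precision, recall) pair per class from a square confusion matrix."""
    results = []
    for i in range(len(matrix)):
        true_positive = matrix[i][i]
        predicted_as_i = sum(row[i] for row in matrix)  # column sum: everything predicted as class i
        actually_i = sum(matrix[i])                      # row sum: everything that really is class i
        precision = true_positive / predicted_as_i if predicted_as_i else 0.0
        recall = true_positive / actually_i if actually_i else 0.0
        results.append((precision, recall))
    return results


# Hypothetical labels, invented for this sketch (not the iris output above).
expected = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

matrix = build_confusion_matrix(expected, predicted, labels=[0, 1, 2])
for row in matrix:
    print(row)

for label, (precision, recall) in enumerate(precision_recall(matrix)):
    print(label, round(precision, 2), round(recall, 2))
```

When every prediction is correct, as in the notebook's output, all counts land on the diagonal and every precision and recall value is 1.0. Any off-diagonal count shows which class was mistaken for which.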