@dorvak
Created September 24, 2013 15:00
{
"metadata": {
"name": "Assignment 1"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What are the column names of the dataset?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"df = pd.read_csv(\"hw1_data.csv\")\n",
"df.columns"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 17,
"text": [
"Index([Ozone, Solar.R, Wind, Temp, Month, Day], dtype=object)"
]
}
],
"prompt_number": 17
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extract the first 2 rows of the data frame and print them to the console. What does the output look like?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[:2]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Ozone</th>\n",
" <th>Solar.R</th>\n",
" <th>Wind</th>\n",
" <th>Temp</th>\n",
" <th>Month</th>\n",
" <th>Day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 41</td>\n",
" <td> 190</td>\n",
" <td> 7.4</td>\n",
" <td> 67</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 36</td>\n",
" <td> 118</td>\n",
" <td> 8.0</td>\n",
" <td> 72</td>\n",
" <td> 5</td>\n",
" <td> 2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 7,
"text": [
" Ozone Solar.R Wind Temp Month Day\n",
"0 41 190 7.4 67 5 1\n",
"1 36 118 8.0 72 5 2"
]
}
],
"prompt_number": 7
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many observations (i.e. rows) are in this data frame?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(df)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 9,
"text": [
"153"
]
}
],
"prompt_number": 9
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extract the last 2 rows of the data frame and print them to the console. What does the output look like?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[-2:]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Ozone</th>\n",
" <th>Solar.R</th>\n",
" <th>Wind</th>\n",
" <th>Temp</th>\n",
" <th>Month</th>\n",
" <th>Day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>151</th>\n",
" <td> 18</td>\n",
" <td> 131</td>\n",
" <td> 8.0</td>\n",
" <td> 76</td>\n",
" <td> 9</td>\n",
" <td> 29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>152</th>\n",
" <td> 20</td>\n",
" <td> 223</td>\n",
" <td> 11.5</td>\n",
" <td> 68</td>\n",
" <td> 9</td>\n",
" <td> 30</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"output_type": "pyout",
"prompt_number": 14,
"text": [
" Ozone Solar.R Wind Temp Month Day\n",
"151 18 131 8.0 76 9 29\n",
"152 20 223 11.5 68 9 30"
]
}
],
"prompt_number": 14
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the value of Ozone in the 47th row?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# pandas rows are 0-indexed while R rows are 1-indexed, so the 47th row is position 46\n",
"df.Ozone.iloc[46]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 19,
"text": [
"21.0"
]
}
],
"prompt_number": 19
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many missing values are in the Ozone column of this data frame?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# One option among several: isnull() marks missing values as True, and sum() counts them\n",
"df.Ozone.isnull().sum()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 38,
"text": [
"37"
]
}
],
"prompt_number": 38
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the mean of the Ozone column in this dataset? Exclude missing values (coded as NA) from this calculation."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# mean() skips missing (NaN) values by default\n",
"df.Ozone.mean()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 39,
"text": [
"42.129310344827587"
]
}
],
"prompt_number": 39
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 8"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extract the subset of rows of the data frame where Ozone values are above 31 and Temp values are above 90. What is the mean of Solar.R in this subset?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Select the matching rows and the Solar.R column in a single .loc call\n",
"df.loc[(df.Ozone > 31) & (df.Temp > 90), \"Solar.R\"].mean()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 56,
"text": [
"212.80000000000001"
]
}
],
"prompt_number": 56
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 9"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the mean of \"Temp\" when \"Month\" is equal to 6?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.Temp[df.Month==6].mean()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 57,
"text": [
"79.099999999999994"
]
}
],
"prompt_number": 57
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Question 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What was the maximum ozone value in the month of May (i.e. Month = 5)?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.Ozone[df.Month==5].max()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 58,
"text": [
"115.0"
]
}
],
"prompt_number": 58
}
],
"metadata": {}
}
]
}
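
The notebook depends on `hw1_data.csv`, which is not included here. As a rough, self-contained sketch of how the missing-value handling in Questions 5 to 8 behaves, the block below runs the same kinds of queries on a small inline sample. The six rows, the `SAMPLE_CSV` name, and the variable names are all made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical six-row sample in the same column layout as hw1_data.csv;
# the values are invented for illustration, not copied from the real data.
SAMPLE_CSV = """Ozone,Solar.R,Wind,Temp,Month,Day
41,190,7.4,67,5,1
NA,118,8.0,72,5,2
12,NA,12.6,74,5,3
97,267,6.3,92,6,4
NA,250,9.2,93,6,5
35,274,10.3,91,6,6
"""

# read_csv converts the literal string "NA" to NaN by default.
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

n_rows = len(sample)                             # number of observations
n_missing = int(sample["Ozone"].isnull().sum())  # count of missing Ozone values
ozone_mean = sample["Ozone"].mean()              # mean() skips NaN by default

# Comparisons against NaN evaluate to False, so rows with a missing Ozone
# value drop out of the mask without any extra handling.
hot_and_polluted = sample[(sample["Ozone"] > 31) & (sample["Temp"] > 90)]
solar_mean = hot_and_polluted["Solar.R"].mean()

# Positional access: the third row is position 2 because pandas counts from 0.
third_ozone = sample["Ozone"].iloc[2]
```

Using `.iloc` for the positional lookup keeps the "nth row" reading explicit even if the index is later changed from the default 0, 1, 2, ... labels.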