Created
December 12, 2017 12:16
My_solutions.ipynb
{ | |
"cells": [ | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "# Getting and Knowing your Data" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 1. Import the necessary libraries" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "import pandas as pd", | |
"execution_count": 19, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 4622 entries, 0 to 4621\nData columns (total 5 columns):\norder_id 4622 non-null int64\nquantity 4622 non-null int64\nitem_name 4622 non-null object\nchoice_description 3376 non-null object\nitem_price 4622 non-null object\ndtypes: int64(2), object(3)\nmemory usage: 180.6+ KB\n" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv)." | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 3. Assign it to a variable called chipo." | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'\n# The file is tab-separated, so pass a literal tab as the delimiter\nchipo = pd.read_csv(url, sep='\\t')", | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 4. See the first 10 entries" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "chipo.head(10)", | |
"execution_count": 20, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style>\n .dataframe thead tr:only-child th {\n text-align: right;\n }\n\n .dataframe thead th {\n text-align: left;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>order_id</th>\n <th>quantity</th>\n <th>item_name</th>\n <th>choice_description</th>\n <th>item_price</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>1</td>\n <td>Chips and Fresh Tomato Salsa</td>\n <td>NaN</td>\n <td>$2.39</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>1</td>\n <td>Izze</td>\n <td>[Clementine]</td>\n <td>$3.39</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1</td>\n <td>1</td>\n <td>Nantucket Nectar</td>\n <td>[Apple]</td>\n <td>$3.39</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1</td>\n <td>1</td>\n <td>Chips and Tomatillo-Green Chili Salsa</td>\n <td>NaN</td>\n <td>$2.39</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2</td>\n <td>2</td>\n <td>Chicken Bowl</td>\n <td>[Tomatillo-Red Chili Salsa (Hot), [Black Beans...</td>\n <td>$16.98</td>\n </tr>\n <tr>\n <th>5</th>\n <td>3</td>\n <td>1</td>\n <td>Chicken Bowl</td>\n <td>[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...</td>\n <td>$10.98</td>\n </tr>\n <tr>\n <th>6</th>\n <td>3</td>\n <td>1</td>\n <td>Side of Chips</td>\n <td>NaN</td>\n <td>$1.69</td>\n </tr>\n <tr>\n <th>7</th>\n <td>4</td>\n <td>1</td>\n <td>Steak Burrito</td>\n <td>[Tomatillo Red Chili Salsa, [Fajita Vegetables...</td>\n <td>$11.75</td>\n </tr>\n <tr>\n <th>8</th>\n <td>4</td>\n <td>1</td>\n <td>Steak Soft Tacos</td>\n <td>[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...</td>\n <td>$9.25</td>\n </tr>\n <tr>\n <th>9</th>\n <td>5</td>\n <td>1</td>\n <td>Steak Burrito</td>\n <td>[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...</td>\n <td>$9.25</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " order_id quantity item_name \\\n0 1 1 Chips and Fresh Tomato Salsa \n1 1 1 Izze \n2 1 1 Nantucket Nectar \n3 1 1 Chips and Tomatillo-Green Chili Salsa \n4 2 2 Chicken Bowl \n5 3 1 Chicken Bowl \n6 3 1 Side of Chips \n7 4 1 Steak Burrito \n8 4 1 Steak Soft Tacos \n9 5 1 Steak Burrito \n\n choice_description item_price \n0 NaN $2.39 \n1 [Clementine] $3.39 \n2 [Apple] $3.39 \n3 NaN $2.39 \n4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 \n5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 \n6 NaN $1.69 \n7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75 \n8 [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... $9.25 \n9 [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... $9.25 " | |
}, | |
"execution_count": 20, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 5. What is the number of observations in the dataset?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "len(chipo)", | |
"execution_count": 21, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "4622" | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 6. What is the number of columns in the dataset?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "# chipo.size/len(chipo)\n# chipo.shape[1]\nr, c = chipo.shape\nc", | |
"execution_count": 24, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "5" | |
}, | |
"execution_count": 24, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 7. Print the names of all the columns." | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "chipo.columns", | |
"execution_count": 27, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "Index(['order_id', 'quantity', 'item_name', 'choice_description',\n 'item_price'],\n dtype='object')" | |
}, | |
"execution_count": 27, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 8. How is the dataset indexed?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "chipo.index", | |
"execution_count": 28, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "RangeIndex(start=0, stop=4622, step=1)" | |
}, | |
"execution_count": 28, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 9. Which was the most ordered item?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "# Total quantity per item, largest first\nordered_quantities = chipo.groupby('item_name')['quantity'].sum().sort_values(ascending=False)\n# Show the top item together with its total quantity\nordered_quantities.head(1)", | |
"execution_count": 71, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "item_name\nChicken Bowl 761\nName: quantity, dtype: int64" | |
}, | |
"execution_count": 71, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 10. How many items were ordered?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "chipo['item_name'].nunique()", | |
"execution_count": 47, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "50" | |
}, | |
"execution_count": 47, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 11. What was the most ordered item in the choice_description column?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "ordered_choices = chipo.groupby('choice_description')['quantity'].sum().sort_values(ascending=False)\nordered_choices.index[0]", | |
"execution_count": 45, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "'[Diet Coke]'" | |
}, | |
"execution_count": 45, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 12. How many items were ordered in total?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "chipo['quantity'].sum()", | |
"execution_count": 66, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "4972" | |
}, | |
"execution_count": 66, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 13. Turn the item price into a float" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "# Drop the leading '$' and convert the remaining text to float\nitem_price_float = chipo['item_price'].map(lambda s: float(s[1:]))\nitem_price_float.head()", | |
"execution_count": 72, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "0 2.39\n1 3.39\n2 3.39\n3 2.39\n4 16.98\nName: item_price, dtype: float64" | |
}, | |
"execution_count": 72, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 14. How much was the revenue for the period in the dataset?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
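"source": "# Multiply quantity by price for each row, then sum over all rows\ntotal_price_per_row = chipo['quantity'] * item_price_float\ntotal_revenue = total_price_per_row.sum()\n# Format to two decimals so floating-point noise never shows up\nf'${total_revenue:.2f}'", | |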
"execution_count": 70, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "'$39237.02'" | |
}, | |
"execution_count": 70, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 15. How many orders were made in the period?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "# An order can span several rows, so count distinct order ids\n# Alternative: len(chipo.groupby('order_id'))\ntotal_orders = chipo['order_id'].nunique()\ntotal_orders", | |
"execution_count": 63, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "1834" | |
}, | |
"execution_count": 63, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 16. What is the average amount per order?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "total_revenue / total_orders", | |
"execution_count": 64, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "21.394231188658669" | |
}, | |
"execution_count": 64, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Step 17. How many different items are sold?" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "chipo['item_name'].nunique()", | |
"execution_count": 65, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "50" | |
}, | |
"execution_count": 65, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3", | |
"language": "python" | |
}, | |
"language_info": { | |
"name": "python", | |
"version": "3.6.3", | |
"mimetype": "text/x-python", | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"pygments_lexer": "ipython3", | |
"nbconvert_exporter": "python", | |
"file_extension": ".py" | |
}, | |
"gist": { | |
"id": "", | |
"data": { | |
"description": "My_solutions.ipynb", | |
"public": true | |
} | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 1 | |
} |