@tabrez
Created December 12, 2017 12:16
My_solutions.ipynb
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Getting and Knowing your Data"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 1. Import the necessary libraries"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd",
"execution_count": 19,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 4622 entries, 0 to 4621\nData columns (total 5 columns):\norder_id 4622 non-null int64\nquantity 4622 non-null int64\nitem_name 4622 non-null object\nchoice_description 3376 non-null object\nitem_price 4622 non-null object\ndtypes: int64(2), object(3)\nmemory usage: 180.6+ KB\n"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). "
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 3. Assign it to a variable called chipo."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'\n# The file is tab-separated, so pass a literal tab as the delimiter.\nchipo = pd.read_csv(url, sep='\\t')",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 4. See the first 10 entries"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "chipo.head(10)",
"execution_count": 20,
"outputs": [
{
"data": {
"text/html": "<div>\n<style>\n .dataframe thead tr:only-child th {\n text-align: right;\n }\n\n .dataframe thead th {\n text-align: left;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>order_id</th>\n <th>quantity</th>\n <th>item_name</th>\n <th>choice_description</th>\n <th>item_price</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>1</td>\n <td>Chips and Fresh Tomato Salsa</td>\n <td>NaN</td>\n <td>$2.39</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>1</td>\n <td>Izze</td>\n <td>[Clementine]</td>\n <td>$3.39</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1</td>\n <td>1</td>\n <td>Nantucket Nectar</td>\n <td>[Apple]</td>\n <td>$3.39</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1</td>\n <td>1</td>\n <td>Chips and Tomatillo-Green Chili Salsa</td>\n <td>NaN</td>\n <td>$2.39</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2</td>\n <td>2</td>\n <td>Chicken Bowl</td>\n <td>[Tomatillo-Red Chili Salsa (Hot), [Black Beans...</td>\n <td>$16.98</td>\n </tr>\n <tr>\n <th>5</th>\n <td>3</td>\n <td>1</td>\n <td>Chicken Bowl</td>\n <td>[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...</td>\n <td>$10.98</td>\n </tr>\n <tr>\n <th>6</th>\n <td>3</td>\n <td>1</td>\n <td>Side of Chips</td>\n <td>NaN</td>\n <td>$1.69</td>\n </tr>\n <tr>\n <th>7</th>\n <td>4</td>\n <td>1</td>\n <td>Steak Burrito</td>\n <td>[Tomatillo Red Chili Salsa, [Fajita Vegetables...</td>\n <td>$11.75</td>\n </tr>\n <tr>\n <th>8</th>\n <td>4</td>\n <td>1</td>\n <td>Steak Soft Tacos</td>\n <td>[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...</td>\n <td>$9.25</td>\n </tr>\n <tr>\n <th>9</th>\n <td>5</td>\n <td>1</td>\n <td>Steak Burrito</td>\n <td>[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...</td>\n <td>$9.25</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " order_id quantity item_name \\\n0 1 1 Chips and Fresh Tomato Salsa \n1 1 1 Izze \n2 1 1 Nantucket Nectar \n3 1 1 Chips and Tomatillo-Green Chili Salsa \n4 2 2 Chicken Bowl \n5 3 1 Chicken Bowl \n6 3 1 Side of Chips \n7 4 1 Steak Burrito \n8 4 1 Steak Soft Tacos \n9 5 1 Steak Burrito \n\n choice_description item_price \n0 NaN $2.39 \n1 [Clementine] $3.39 \n2 [Apple] $3.39 \n3 NaN $2.39 \n4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 \n5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 \n6 NaN $1.69 \n7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75 \n8 [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... $9.25 \n9 [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... $9.25 "
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 5. What is the number of observations in the dataset?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "len(chipo)",
"execution_count": 21,
"outputs": [
{
"data": {
"text/plain": "4622"
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 6. What is the number of columns in the dataset?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# chipo.size/len(chipo)\n# chipo.shape[1]\nr, c = chipo.shape\nc",
"execution_count": 24,
"outputs": [
{
"data": {
"text/plain": "5"
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 7. Print the names of all the columns."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "chipo.columns",
"execution_count": 27,
"outputs": [
{
"data": {
"text/plain": "Index(['order_id', 'quantity', 'item_name', 'choice_description',\n 'item_price'],\n dtype='object')"
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 8. How is the dataset indexed?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "chipo.index",
"execution_count": 28,
"outputs": [
{
"data": {
"text/plain": "RangeIndex(start=0, stop=4622, step=1)"
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 9. Which was the most ordered item?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Total quantity per item, largest first.\nordered_quantities = chipo.groupby('item_name')['quantity'].sum().sort_values(ascending=False)\nordered_quantities.head(1)\n# ordered_quantities.index[0]  # item name only",
"execution_count": 71,
"outputs": [
{
"data": {
"text/plain": "item_name\nChicken Bowl 761\nName: quantity, dtype: int64"
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 10. How many items were ordered?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "chipo['item_name'].nunique()",
"execution_count": 47,
"outputs": [
{
"data": {
"text/plain": "50"
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 11. What was the most ordered item in the choice_description column?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "ordered_choices = chipo.groupby('choice_description')['quantity'].sum().sort_values(ascending=False)\nordered_choices.index[0]",
"execution_count": 45,
"outputs": [
{
"data": {
"text/plain": "'[Diet Coke]'"
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 12. How many items were ordered in total?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "chipo['quantity'].sum()",
"execution_count": 66,
"outputs": [
{
"data": {
"text/plain": "4972"
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 13. Turn the item price into a float"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "item_price_float = chipo['item_price'].map(lambda s: float(s[1:]))\nitem_price_float.head()",
"execution_count": 72,
"outputs": [
{
"data": {
"text/plain": "0 2.39\n1 3.39\n2 3.39\n3 2.39\n4 16.98\nName: item_price, dtype: float64"
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 14. How much was the revenue for the period in the dataset?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "total_price_per_row = chipo['quantity'] * item_price_float\ntotal_revenue = total_price_per_row.sum()\n# Format to two decimals so float rounding noise never shows up in the result.\nf'${total_revenue:.2f}'",
"execution_count": 70,
"outputs": [
{
"data": {
"text/plain": "'$39237.02'"
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 15. How many orders were made in the period?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# len(chipo.groupby('order_id'))\ntotal_orders = chipo['order_id'].nunique()\ntotal_orders",
"execution_count": 63,
"outputs": [
{
"data": {
"text/plain": "1834"
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 16. What is the average amount per order?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "total_revenue / total_orders",
"execution_count": 64,
"outputs": [
{
"data": {
"text/plain": "21.394231188658669"
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Step 17. How many different items are sold?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "chipo['item_name'].nunique()",
"execution_count": 65,
"outputs": [
{
"data": {
"text/plain": "50"
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
]
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.6.3",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"gist": {
"id": "",
"data": {
"description": "My_solutions.ipynb",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}