Pandas Homework
Part 1
Load the data (
into a DataFrame. Try looking at the "head" of the file in the command line
to see how the file is delimited and how to load it.
Note: You do not need to turn in any command line code you may use.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
car_data = pd.read_csv('', delimiter='|')
#loads the auto.mpg.txt file into a variable called car_data and separates columns using a pipe as the delimiter
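As a rough illustration of how a pipe-delimited file loads, here is a minimal sketch that reads a small in-memory sample instead of the real file. The three rows, the column subset, and the names `sample_text` and `sample` are made up for the example and are not taken from the data set.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: three made-up rows, pipe-delimited.
sample_text = (
    "mpg|cylinders|weight|car_name\n"
    "18.0|8|3504|car a\n"
    "24.0|4|2372|car b\n"
    "22.0|6|2833|car c\n"
)

# sep='|' tells read_csv to split columns on the pipe character
# (delimiter='|' is an alias for the same option).
sample = pd.read_csv(io.StringIO(sample_text), sep="|")

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # numeric columns are parsed as int/float, text as object
```

Checking `dtypes` right after loading is a quick way to confirm the delimiter was correct: with the wrong delimiter everything usually ends up in a single text column.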
Part 2
Get familiar with the data. Answer the following questions:
- What is the shape of the data? How many rows and columns are there?
- What variables are available?
- What are the ranges for the values in each numeric column?
- What is the average value for each column? Does that differ significantly
from the median?
# examine the car data
car_data.shape #finds columns and rows (392, 9)
car_data.columns #columns is an attribute, not a method. Not quite sure what we mean by variables, so I'll list the columns available - mpg, cylinders, displacement, horsepower, weight, acceleration, model_year, and origin
car_data.mpg.max() - car_data.mpg.min() #37.6
car_data.cylinders.max() - car_data.cylinders.min() #5
car_data.displacement.max() - car_data.displacement.min() #387
car_data.horsepower.max() - car_data.horsepower.min() #184
car_data.weight.max() - car_data.weight.min() #3527
car_data.acceleration.max() - car_data.acceleration.min() #16.8
car_data.model_year.max() - car_data.model_year.min() #12
car_data.mean(numeric_only=True) #calculates each numeric column's average (numeric_only skips any text columns)
car_data.median(numeric_only=True) #they are pretty close except for displacement
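The per-column range calculations above could also be collected into one table. This is only a sketch on a made-up toy frame (the `toy` data and the `summary` name are hypothetical, not the real data set), showing min, max, range, mean, and median side by side.

```python
import pandas as pd

# Hypothetical toy data, not the real data set.
toy = pd.DataFrame({
    "mpg": [18.0, 24.0, 22.0, 31.0],
    "weight": [3504, 2372, 2833, 1985],
    "car_name": ["car a", "car b", "car c", "car d"],
})

# Keep only numeric columns so text columns do not break the arithmetic.
numeric = toy.select_dtypes(include="number")

# One row per column, one column per statistic.
summary = pd.DataFrame({
    "min": numeric.min(),
    "max": numeric.max(),
    "range": numeric.max() - numeric.min(),
    "mean": numeric.mean(),
    "median": numeric.median(),
})
print(summary)
```

Putting mean and median in adjacent columns makes it easier to spot skewed variables, which is what the displacement comparison above is getting at.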
Part 3
Use the data to answer the following questions:
- Which 5 cars get the best gas mileage?
- Which 5 cars with more than 4 cylinders get the best gas mileage?
- Which 5 cars get the worst gas mileage?
- Which 5 cars with 4 or fewer cylinders get the worst gas mileage?
car_data.sort_values(by='mpg', ascending=False).head() #returns the 5 cars with the highest mpg (sort_index(by=...) was removed from pandas; sort_values replaces it)
car_data[car_data.cylinders > 4].sort_values(by='mpg', ascending=False).head() #returns the 5 cars with the highest mpg among those with more than 4 cylinders
car_data.sort_values(by='mpg').head() #returns the 5 cars with the lowest mpg (ascending order)
car_data[car_data.cylinders <= 4].sort_values(by='mpg').head() #returns the 5 cars with the lowest mpg among those with 4 or fewer cylinders
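An alternative to sorting and taking the head is `nlargest` / `nsmallest`, which pick the top or bottom rows directly. A minimal sketch on made-up data follows (the `toy` frame and its car names are hypothetical, and it takes 2 rows instead of 5 to keep the example small).

```python
import pandas as pd

# Hypothetical toy data, not the real data set.
toy = pd.DataFrame({
    "mpg": [18.0, 24.0, 22.0, 31.0, 15.0, 27.0],
    "cylinders": [8, 4, 6, 4, 8, 4],
    "car_name": ["car a", "car b", "car c", "car d", "car e", "car f"],
})

# Top 2 by mpg across all cars.
best = toy.nlargest(2, "mpg")

# Filter first with a boolean mask, then rank within the filtered rows.
best_big = toy[toy.cylinders > 4].nlargest(2, "mpg")
worst_small = toy[toy.cylinders <= 4].nsmallest(2, "mpg")

print(best)
print(best_big)
print(worst_small)
```

The order matters: filtering before ranking answers "best among cars with more than 4 cylinders", whereas ranking first and filtering afterwards could return fewer rows than requested.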
Part 4
Use plots, groupby, aggregations, etc to explore the relationships
between mpg and the other variables. Which variables seem to have the greatest
effect on mpg?
Some examples of things you might want to look at are:
- What is the mean mpg for cars with each number of cylinders (i.e. 3 cylinders,
4 cylinders, 5 cylinders, etc)?
- Did mpg rise or fall over the years contained in this dataset?
- How does mpg change as weight increases or decreases?
car_data.groupby('cylinders').mpg.mean() # more cylinders = less gas efficiency
car_data.groupby('horsepower').mpg.mean() #higher horsepower = less fuel efficiency
car_data.groupby('model_year').weight.mean() #cars got lighter as time went on
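To look at the "did mpg rise or fall over the years" question and to compare variables more directly, one possible approach is to group mpg by model year and to compute each numeric column's correlation with mpg. The sketch below uses a made-up toy frame (the `toy` values are hypothetical), so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy data, not the real data set.
toy = pd.DataFrame({
    "mpg": [15.0, 18.0, 24.0, 27.0, 31.0, 22.0],
    "weight": [4300, 3500, 2400, 2200, 2000, 2800],
    "model_year": [70, 70, 76, 76, 82, 82],
})

# Mean mpg for each model year: shows the trend over time.
mpg_by_year = toy.groupby("model_year").mpg.mean()

# Pearson correlation of every other numeric column with mpg,
# sorted so the most negative relationship comes first.
corr_with_mpg = toy.corr()["mpg"].drop("mpg").sort_values()

print(mpg_by_year)
print(corr_with_mpg)
```

Correlation only captures linear association and does not show cause, so it is best read next to the groupby results rather than in place of them.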
#I had trouble using "git pull origin master" to update the questions. I'll ask for help in class. Thanks!