Created
March 30, 2015 21:28
Pandas Homework
'''
Part 1
Load the data (https://raw.githubusercontent.com/justmarkham/DAT5/master/data/auto_mpg.txt)
into a DataFrame. Try looking at the "head" of the file in the command line
to see how the file is delimited and how to load it.
Note: You do not need to turn in any command line code you may use.
'''
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
car_data = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT5/master/data/auto_mpg.txt', delimiter='|')
# loads the auto_mpg.txt file into a variable called car_data and separates columns using a pipe as the delimiter
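# A minimal sketch of how the pipe delimiter works, runnable without a network connection.
# The three rows in sample_text are made-up illustrative values, not rows from auto_mpg.txt,
# and the names sample_text and sample are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample in the same pipe-delimited layout (values are made up).
sample_text = """mpg|cylinders|weight|car_name
18.0|8|3504|car a
26.0|4|2130|car b
31.5|4|1990|car c
"""

# sep='|' tells read_csv to split columns on the pipe character;
# delimiter='|' is an alias that does the same thing.
sample = pd.read_csv(io.StringIO(sample_text), sep='|')

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # numeric columns are parsed as numbers, car_name as text
```

# Leaving out sep would make read_csv split on commas instead, so each row would land in a single column.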
'''
Part 2
Get familiar with the data. Answer the following questions:
- What is the shape of the data? How many rows and columns are there?
- What variables are available?
- What are the ranges for the values in each numeric column?
- What is the average value for each column? Does that differ significantly
from the median?
'''
# examine the car data
car_data.shape  # rows and columns: (392, 9)
car_data.columns  # not quite sure what we mean by variables, so I'll list the columns available: mpg, cylinders, displacement, horsepower, weight, acceleration, model_year, origin, and car_name
# range (max minus min) for each numeric column
car_data.mpg.max() - car_data.mpg.min()  # 37.6
car_data.cylinders.max() - car_data.cylinders.min()  # 5
car_data.displacement.max() - car_data.displacement.min()  # 387
car_data.horsepower.max() - car_data.horsepower.min()  # 184
car_data.weight.max() - car_data.weight.min()  # 3527
car_data.acceleration.max() - car_data.acceleration.min()  # 16.8
car_data.model_year.max() - car_data.model_year.min()  # 12
car_data.mean(numeric_only=True)  # calculates each numeric column's average
car_data.median(numeric_only=True)  # the means and medians are pretty close except for displacement
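# One possible way to get every range, mean, and median in a single table instead of column by column.
# This is a sketch on a made-up toy DataFrame (the values are not from auto_mpg.txt);
# the names toy, numeric, and summary are hypothetical.

```python
import pandas as pd

# Hypothetical values chosen so the arithmetic is easy to check by hand.
toy = pd.DataFrame({
    'mpg': [10.0, 20.0, 30.0, 60.0],
    'cylinders': [8, 6, 4, 4],
    'car_name': ['a', 'b', 'c', 'd'],
})

# Keep only numeric columns so the text column does not break the arithmetic.
numeric = toy.select_dtypes(include='number')

summary = pd.DataFrame({
    'range': numeric.max() - numeric.min(),
    'mean': numeric.mean(),
    'median': numeric.median(),
})
# A large gap between mean and median suggests a skewed column.
summary['mean_minus_median'] = summary['mean'] - summary['median']

print(summary)
```

# The same pattern should work on car_data by swapping it in for toy.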
'''
Part 3
Use the data to answer the following questions:
- Which 5 cars get the best gas mileage?
- Which 5 cars with more than 4 cylinders get the best gas mileage?
- Which 5 cars get the worst gas mileage?
- Which 5 cars with 4 or fewer cylinders get the worst gas mileage?
'''
car_data.sort_values('mpg', ascending=False).head()  # returns the 5 cars with the highest mpg
car_data[car_data.cylinders > 4].sort_values('mpg', ascending=False).head()  # returns the 5 cars with the highest mpg among those with more than 4 cylinders
car_data.sort_values('mpg').head()  # returns the 5 cars with the lowest mpg (ascending order)
car_data[car_data.cylinders <= 4].sort_values('mpg').head()  # returns the 5 cars with the lowest mpg among those with 4 or fewer cylinders
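# An alternative sketch for the "best 5 / worst 5" questions using nlargest and nsmallest,
# which pick the top or bottom rows by a column without sorting the whole frame first.
# The toy data is made up for illustration, and it asks for 2 rows instead of 5 to stay small.

```python
import pandas as pd

# Hypothetical cars; values are not from auto_mpg.txt.
toy = pd.DataFrame({
    'car_name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'cylinders': [4, 8, 6, 4, 8, 4],
    'mpg': [30.0, 12.0, 20.0, 35.0, 15.0, 25.0],
})

best_two = toy.nlargest(2, 'mpg')                                 # highest mpg overall
best_two_big = toy[toy.cylinders > 4].nlargest(2, 'mpg')          # highest mpg, more than 4 cylinders
worst_two = toy.nsmallest(2, 'mpg')                               # lowest mpg overall
worst_two_small = toy[toy.cylinders <= 4].nsmallest(2, 'mpg')     # lowest mpg, 4 or fewer cylinders

print(best_two)
print(worst_two_small)
```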
'''
Part 4
Use plots, groupby, aggregations, etc. to explore the relationships
between mpg and the other variables. Which variables seem to have the greatest
effect on mpg?
Some examples of things you might want to look at are:
- What is the mean mpg for cars with each number of cylinders (i.e. 3 cylinders,
4 cylinders, 5 cylinders, etc.)?
- Did mpg rise or fall over the years contained in this dataset?
- How does mpg change as weight increases or decreases?
'''
car_data.groupby('cylinders').mpg.mean()  # more cylinders = less gas efficiency
car_data.groupby('horsepower').mpg.mean()  # higher horsepower = less fuel efficiency
car_data.groupby('model_year').weight.mean()  # cars got lighter as time went on
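# A sketch of two more checks for Part 4: mean mpg per model year (did mpg rise or fall?)
# and the correlation of each numeric column with mpg (which variables move with mpg most strongly?).
# The toy data is made up so that mpg falls exactly as weight rises; it is not from auto_mpg.txt.
# Correlation only measures linear association, so it hints at an effect rather than proving one.

```python
import pandas as pd

# Hypothetical values: mpg = 50 - weight / 100 for every row.
toy = pd.DataFrame({
    'model_year': [70, 70, 71, 71, 72, 72],
    'weight': [4000, 3000, 3500, 2500, 3000, 2000],
    'mpg': [10.0, 20.0, 15.0, 25.0, 20.0, 30.0],
})

# Mean mpg for each model year.
mpg_by_year = toy.groupby('model_year').mpg.mean()

# Correlation of every other numeric column with mpg.
# select_dtypes drops text columns such as car_name before corr() runs.
corr_with_mpg = toy.select_dtypes(include='number').corr()['mpg'].drop('mpg')

print(mpg_by_year)
print(corr_with_mpg.sort_values())
```

# Values near -1 or 1 indicate a strong linear relationship with mpg; values near 0 indicate a weak one.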
# We never learned plotting in class, so I only used groupby for this part.
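# A minimal plotting sketch for the "how does mpg change with weight" question, assuming matplotlib is installed.
# The toy data is made up for illustration, the output file name mpg_vs_weight.png is hypothetical,
# and the Agg backend is chosen only so the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drawing goes to a file, not a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, not from auto_mpg.txt.
toy = pd.DataFrame({
    'weight': [4000, 3000, 3500, 2500, 3000, 2000],
    'mpg': [10.0, 20.0, 15.0, 25.0, 20.0, 30.0],
})

# One point per car: weight on the x-axis, mpg on the y-axis.
fig, ax = plt.subplots()
ax.scatter(toy['weight'], toy['mpg'])
ax.set_xlabel('weight')
ax.set_ylabel('mpg')
ax.set_title('mpg vs. weight (toy data)')

# Save the figure to the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), 'mpg_vs_weight.png')
fig.savefig(out_path)
plt.close(fig)
```

# In an interactive session, plt.show() could be used instead of saving to a file.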
# I had trouble using "git pull origin master" to update the questions. I'll ask for help in class. Thanks!