Skip to content

Instantly share code, notes, and snippets.

from pandas import *
from ggplot import *
def plot_weather_data(turnstile_weather):
'''
plot_weather_data is passed a dataframe called turnstile_weather.
Use turnstile_weather along with ggplot to make another data visualization
focused on the MTA and weather data we used in Project 3.
Make a type of visualization different than what you did in the previous exercise.
from pandas import *
from ggplot import *
def plot_weather_data(turnstile_weather):
'''
You are passed in a dataframe called turnstile_weather.
Use turnstile_weather along with ggplot to make a data visualization
focused on the MTA and weather data we used in assignment #3.
You should feel free to implement something that we discussed in class
(e.g., scatterplots, line plots, or histograms) or attempt to implement
import numpy as np
import scipy
import matplotlib.pyplot as plt
import sys
def compute_r_squared(data, predictions):
'''
In exercise 5, we calculated the R^2 value for you. But why don't you try and
and calculate the R^2 value yourself.
import numpy as np
import scipy
import matplotlib.pyplot as plt
def plot_residuals(turnstile_weather, predictions):
'''
Using the same methods that we used to plot a histogram of entries
per hour for our data, why don't you make a histogram of the residuals
(that is, the difference between the original hourly entry data and the predicted values).
Try different binwidths for your histogram.
import numpy as np
import pandas
import statsmodels.api as sm
"""
In this question, you need to:
1) implement the linear_regression() procedure
2) Select features (in the predictions procedure) and make predictions.
"""
Yes
From the results in step 3 we can see that the mean of with_rain and without_rain are quite close. And the P-value of the scipy's Mann-Whitney implementation is small, less than 5%.
import numpy as np
import scipy
import scipy.stats
import pandas
def mann_whitney_plus_means(turnstile_weather):
'''
This function will consume the turnstile_weather dataframe containing
our final turnstile weather data.
No
No. Because the data size of rain and not rain are not the same.
import numpy as np
import pandas
import matplotlib.pyplot as plt
def entries_histogram(turnstile_weather):
'''
Before we perform any analysis, it might be useful to take a
look at the data we're hoping to analyze. More specifically, let's
examine the hourly entries in our NYC subway data and determine what
distribution the data follows. This data is stored in a dataframe
import datetime
def reformat_subway_dates(date):
'''
The dates in our subway data are formatted in the format month-day-year.
The dates in our weather underground data are formatted year-month-day.
In order to join these two data sets together, we'll want the dates formatted
the same way. Write a function that takes as its input a date in the MTA Subway
data format, and returns a date in the weather underground format.