Mengyuz

## Problem set 4-2 - Make Another Visualization
from pandas import *
from ggplot import *

def plot_weather_data(turnstile_weather):
    '''
    plot_weather_data is passed a dataframe called turnstile_weather.
    Use turnstile_weather along with ggplot to make another data visualization
    focused on the MTA and weather data we used in Project 3.

    Make a type of visualization different than what you did in the previous exercise.

## Problem set 4-Exercise - Visualization 1
from pandas import *
from ggplot import *

def plot_weather_data(turnstile_weather):
    '''
    You are passed in a dataframe called turnstile_weather.
    Use turnstile_weather along with ggplot to make a data visualization
    focused on the MTA and weather data we used in assignment #3.
    You should feel free to implement something that we discussed in class
    (e.g., scatterplots, line plots, or histograms) or attempt to implement

## Problem set 3-7 - Compute R^2
import numpy as np
import scipy
import matplotlib.pyplot as plt
import sys

def compute_r_squared(data, predictions):
    '''
    In exercise 5, we calculated the R^2 value for you. But why don't you try to
    calculate the R^2 value yourself.
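
A minimal sketch of the calculation the exercise asks for, assuming the usual definition R^2 = 1 - SS_res / SS_tot. The sample numbers below are made up for illustration and are not taken from the subway data.

```python
import numpy as np

def compute_r_squared(data, predictions):
    """Return R^2 = 1 - SS_res / SS_tot for observed values and predictions."""
    data = np.asarray(data, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    ss_res = np.sum((data - predictions) ** 2)     # residual sum of squares
    ss_tot = np.sum((data - np.mean(data)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Illustrative values only (assumption, not real ridership numbers).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r_squared = compute_r_squared(observed, predicted)
```

A value close to 1 means the predictions explain most of the variance in the observed data; the function is undefined when every observed value is identical, because SS_tot is then zero.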


## Problem set 3-6 - Plotting Residuals
import numpy as np
import scipy
import matplotlib.pyplot as plt

def plot_residuals(turnstile_weather, predictions):
    '''
    Using the same methods that we used to plot a histogram of entries
    per hour for our data, why don't you make a histogram of the residuals
    (that is, the difference between the original hourly entry data and the predicted values).
    Try different binwidths for your histogram.
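
One possible way to bin the residuals for a chosen bin width, sketched with NumPy only. The helper name `residual_histogram` and the sample numbers are hypothetical; the original exercise passes a dataframe whose column names are not shown in this excerpt, so plain sequences are used here.

```python
import numpy as np

def residual_histogram(actual, predicted, bin_width):
    """Bin residuals (actual - predicted) into bins of a fixed width."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    # Snap the outer edges to multiples of bin_width so bins align across runs.
    lo = np.floor(residuals.min() / bin_width) * bin_width
    hi = np.ceil(residuals.max() / bin_width) * bin_width
    n_bins = max(1, int(round((hi - lo) / bin_width)))
    edges = lo + bin_width * np.arange(n_bins + 1)
    counts, edges = np.histogram(residuals, bins=edges)
    return counts, edges

# Illustrative values only (assumption, not real ridership numbers).
actual = [100, 250, 300, 80]
predicted = [120, 200, 310, 150]
counts_50, edges_50 = residual_histogram(actual, predicted, bin_width=50)
counts_100, edges_100 = residual_histogram(actual, predicted, bin_width=100)
```

The returned `edges` can be passed as the `bins` argument of `plt.hist` to draw the same histogram with matplotlib, which makes it easy to compare several bin widths.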

## Problem set 3-5 - Linear Regression
import numpy as np
import pandas
import statsmodels.api as sm

"""
In this question, you need to:
1) implement the linear_regression() procedure
2) Select features (in the predictions procedure) and make predictions.

"""
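
The two steps can be sketched as follows. The exercise imports statsmodels; this sketch swaps in `numpy.linalg.lstsq` for the ordinary least squares fit so it runs with NumPy alone. The feature matrix and target values are invented for illustration, and the return shape (intercept, coefficients) is an assumption about what `linear_regression()` should produce.

```python
import numpy as np

def linear_regression(features, values):
    """Fit ordinary least squares with an intercept.

    Returns (intercept, params), where params holds one coefficient per feature.
    """
    features = np.asarray(features, dtype=float)
    values = np.asarray(values, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coeffs, _, _, _ = np.linalg.lstsq(design, values, rcond=None)
    return coeffs[0], coeffs[1:]

# Illustrative data generated from y = 1 + 2*x1 - 0.5*x2 (assumption).
features = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0]])
values = 1.0 + 2.0 * features[:, 0] - 0.5 * features[:, 1]

intercept, params = linear_regression(features, values)
predictions = intercept + features.dot(params)
```

With real data, the selected feature columns would take the place of `features` and the hourly entries would take the place of `values`.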

## Problem set 3-4 - Ridership on Rainy vs. Nonrainy Days
Yes

From the results in step 3, the means of with_rain and without_rain are quite close, but the p-value from scipy's Mann-Whitney implementation is small (less than 5%).

## Problem set 3-3 - Mann-Whitney U-Test
import numpy as np
import scipy
import scipy.stats
import pandas

def mann_whitney_plus_means(turnstile_weather):
    '''
    This function will consume the turnstile_weather dataframe containing
    our final turnstile weather data.
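
To show what the test computes, here is a hand-rolled sketch of the U statistic with a two-sided p-value from the normal approximation (no tie or continuity correction). It is not scipy's implementation, which the exercise relies on, and its p-value can differ from scipy's. The helper name `mann_whitney_sketch` and the sample values are hypothetical, and the function takes two plain sequences because the dataframe's column names are not shown in this excerpt.

```python
import math
import numpy as np

def mann_whitney_sketch(with_rain, without_rain):
    """Return (mean_with_rain, mean_without_rain, U, p) for two samples."""
    x = np.asarray(with_rain, dtype=float)
    y = np.asarray(without_rain, dtype=float)
    # U for the first sample: pairs where x > y, with ties counted as half.
    # The pairwise comparison uses O(len(x) * len(y)) memory; fine for a sketch.
    greater = np.sum(x[:, None] > y[None, :])
    ties = np.sum(x[:, None] == y[None, :])
    u = greater + 0.5 * ties
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return x.mean(), y.mean(), float(u), p

# Illustrative values only (assumption, not real ridership numbers).
mean_a, mean_b, u_stat, p_value = mann_whitney_sketch([1, 2, 3], [4, 5, 6])
```

For the real data, the two inputs would be the hourly entries for rainy and non-rainy rows respectively.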


## Problem set 3-2 - Welch's t-Test?
No


No, because the rain and no-rain samples are not the same size.

## Problem set 3-1 - Exploratory Data Analysis
import numpy as np
import pandas
import matplotlib.pyplot as plt

def entries_histogram(turnstile_weather):
    '''
    Before we perform any analysis, it might be useful to take a
    look at the data we're hoping to analyze. More specifically, let's
    examine the hourly entries in our NYC subway data and determine what
    distribution the data follows. This data is stored in a dataframe

## Problem set 2-11 - Reformat Subway Dates
import datetime

def reformat_subway_dates(date):
    '''
    The dates in our subway data are formatted in the format month-day-year.
    The dates in our weather underground data are formatted year-month-day.

    In order to join these two data sets together, we'll want the dates formatted
    the same way.  Write a function that takes as its input a date in the MTA Subway
    data format, and returns a date in the weather underground format.
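
A minimal sketch of the conversion using only the standard library. It assumes the subway dates use a zero-padded two-digit year (for example `05-21-11`) and that the weather format uses a four-digit year; neither detail is spelled out in the excerpt above, so adjust the format strings if the data differs.

```python
import datetime

def reformat_subway_dates(date):
    """Convert 'month-day-year' (assumed 'MM-DD-YY') to 'YYYY-MM-DD'."""
    # Assumption: two-digit year; use "%m-%d-%Y" if the year has four digits.
    parsed = datetime.datetime.strptime(date, "%m-%d-%y")
    return parsed.strftime("%Y-%m-%d")

reformatted = reformat_subway_dates("05-21-11")
```

Parsing into a `datetime` first, rather than rearranging substrings, also rejects malformed dates with a `ValueError`.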