This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy | |
import pandas | |
import statsmodels.api as sm | |
def simple_heuristic(file_path): | |
''' | |
In this exercise, we will perform some rudimentary practices similar to those of | |
an actual data scientist. | |
Part of a data scientist's job is to use her or his intuition and insight to |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy | |
import pandas | |
import statsmodels.api as sm | |
def complex_heuristic(file_path): | |
''' | |
You are given a list of Titantic passengers and their associated | |
information. More information about the data can be seen at the link below: | |
http://www.kaggle.com/c/titanic-gettingStarted/data |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy | |
import pandas | |
import statsmodels.api as sm | |
def custom_heuristic(file_path): | |
''' | |
You are given a list of Titantic passengers and their associated | |
information. More information about the data can be seen at the link below: | |
http://www.kaggle.com/c/titanic-gettingStarted/data |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas | |
import pandasql | |
def num_rainy_days(filename): | |
''' | |
This function should run a SQL query on a dataframe of | |
weather data. The SQL query should return one column and | |
one row - a count of the number of days in the dataframe where | |
the rain column is equal to 1 (i.e., the number of days it |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas | |
import pandasql | |
def max_temp_aggregate_by_fog(filename): | |
''' | |
This function should run a SQL query on a dataframe of | |
weather data. The SQL query should return two columns and | |
two rows - whether it was foggy or not (0 or 1) and the max | |
maxtempi for that fog value (i.e., the maximum max temperature |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas | |
import pandasql | |
def avg_weekend_temperature(filename): | |
''' | |
This function should run a SQL query on a dataframe of | |
weather data. The SQL query should return one column and | |
one row - the average meantempi on days that are a Saturday | |
or Sunday (i.e., the the average mean temperature on weekends). | |
The dataframe will be titled 'weather_data' and you can access |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas | |
import pandasql | |
def avg_min_temperature(filename): | |
''' | |
This function should run a SQL query on a dataframe of | |
weather data. More specifically you want to find the average | |
minimum temperature (mintempi column of the weather dataframe) on | |
rainy days where the minimum temperature is greater than 55 degrees. | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import csv | |
def fix_turnstile_data(filenames): | |
''' | |
Filenames is a list of MTA Subway turnstile text files. A link to an example | |
MTA Subway turnstile text file can be seen at the URL below: | |
http://web.mta.info/developers/data/nyct/turnstile/turnstile_110507.txt | |
As you can see, there are numerous data points included in each row of the | |
a MTA Subway turnstile text file. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def create_master_turnstile_file(filenames, output_file): | |
''' | |
Write a function that takes the files in the list filenames, which all have the | |
columns 'C/A, UNIT, SCP, DATEn, TIMEn, DESCn, ENTRIESn, EXITSn', and consolidates | |
them into one file located at output_file. There should be ONE row with the column | |
headers, located at the top of the file. The input files do not have column header | |
rows of their own. | |
For example, if file_1 has: | |
'C/A, UNIT, SCP, DATEn, TIMEn, DESCn, ENTRIESn, EXITSn' |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas | |
import pandasql | |
def filter_by_regular(filename): | |
''' | |
This function should read the csv file located at filename into a pandas dataframe, | |
and filter the dataframe to only rows where the 'DESCn' column has the value 'REGULAR'. | |
For example, if the pandas dataframe is as follows: | |
,C/A,UNIT,SCP,DATEn,TIMEn,DESCn,ENTRIESn,EXITSn |
OlderNewer