You will explore the Stanford Open Policing Project dataset and analyze the impact of gender on police behavior. Practice cleaning messy data, creating visualizations, combining and reshaping datasets, and manipulating time series data. Analyzing Police Activity with pandas will give you valuable experience analyzing a dataset.
Led by Kevin Markham, founder of Data School
Examine and clean the dataset to make working with it more efficient. Practice fixing data types, handling missing values, and dropping columns and rows while learning about the Stanford Open Policing Project dataset.
- Traffic stops by police officers
- Preparing the data -- examine and clean
- Locating missing values
- Dropping a column
- Dropping rows
- Examining the data types
- Fixing a data type
- Creating a DatetimeIndex
- Using datetime format
- Setting the index
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ri = pd.read_csv('police.csv')
ri.isnull().sum()
ri.drop('county_name', axis='columns', inplace=True)
ri.dropna(subset=['stop_date', 'stop_time'], inplace=True)
ri.dtypes
# apple is a separate example DataFrame with price, date, and time columns
apple['price'] = apple.price.astype('float')
apple.date.str.replace('/', '-')
combined = apple.date.str.cat(apple.time, sep=' ')
apple['date_and_time'] = pd.to_datetime(combined)
apple.set_index('date_and_time', inplace=True)
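The snippets above switch from `ri` to a separate `apple` example for the datetime steps. As a minimal sketch, the same steps might be applied to traffic-stop data as follows. The sample rows and the `stop_datetime` column name are assumptions made for illustration, not values from the real `police.csv`.

```python
import pandas as pd

# Hypothetical sample rows standing in for police.csv; the values are made up.
ri = pd.DataFrame({
    'stop_date': ['2005-01-04', '2005-01-23', None, '2005-02-20'],
    'stop_time': ['12:55', '23:15', '10:00', '17:15'],
    'county_name': [None, None, None, None],
    'is_arrested': pd.Series([False, True, False, False], dtype='object'),
})

# Count missing values per column.
missing = ri.isnull().sum()

# Drop the all-missing column, then rows lacking a stop date or time.
ri.drop('county_name', axis='columns', inplace=True)
ri.dropna(subset=['stop_date', 'stop_time'], inplace=True)

# Fix a data type: an object column holding True/False becomes a real Boolean column.
ri['is_arrested'] = ri.is_arrested.astype('bool')

# Concatenate date and time, parse the result, and use it as the index.
# 'stop_datetime' is an assumed column name.
combined = ri.stop_date.str.cat(ri.stop_time, sep=' ')
ri['stop_datetime'] = pd.to_datetime(combined)
ri.set_index('stop_datetime', inplace=True)
```

Once the index is a `DatetimeIndex`, attributes such as `ri.index.month` become available without going through the `.dt` accessor.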
Does the gender of a driver have an impact on police behavior during a traffic stop? Explore that question while practicing filtering, grouping, method chaining, Boolean math, string methods, and more!
- Counting unique values
- Expressing counts as proportions
- Filtering by multiple conditions
- Correlation, not causation
  - Analyze the relationship between gender and stop outcome
  - Not going to draw any conclusions about causation
  - Would need additional data and expertise
- Math with Boolean values
- Comparing groups using groupby
- Examining the search types
- Searching for a string
ri.stop_outcome.value_counts()
white = ri[ri.driver_race == 'White']
white.stop_outcome.value_counts(normalize=True)
np.mean([False, True, False, False])
ri.is_arrested.value_counts(normalize=True)
ri.groupby('district').is_arrested.mean()
ri.groupby(['district', 'driver_gender']).is_arrested.mean()
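The topic list mentions filtering by multiple conditions and Boolean math, so here is a small sketch of both, followed by the grouped version. The rows are hypothetical and the `violation` values are assumptions. Any difference between groups describes a relationship in the data and says nothing about its cause.

```python
import pandas as pd

# Hypothetical rows; the values are made up for illustration.
ri = pd.DataFrame({
    'driver_gender': ['F', 'M', 'M', 'F', 'M', 'F'],
    'violation': ['Speeding', 'Speeding', 'Equipment',
                  'Speeding', 'Speeding', 'Equipment'],
    'is_arrested': [False, True, False, False, True, True],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
female_speeding = ri[(ri.driver_gender == 'F') & (ri.violation == 'Speeding')]

# The mean of a Boolean Series is the proportion of True values.
overall_arrest_rate = ri.is_arrested.mean()

# The same calculation, computed separately for each group.
arrest_rate_by_gender = ri.groupby('driver_gender').is_arrested.mean()
```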
ri['inventory'] = ri.search_type.str.contains('Inventory', na=False)
ri.inventory.dtype
ri.inventory.sum()
ri.inventory.mean()
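To illustrate why `na=False` matters when searching for a string, the sketch below uses a hypothetical `search_type` Series. It assumes that a missing value means no search took place, which is an interpretation and not something the snippets above state.

```python
import pandas as pd

# Hypothetical search_type values; None is assumed to mean no search occurred.
search_type = pd.Series([
    None,
    'Incident to Arrest',
    'Inventory',
    'Probable Cause,Inventory',
    None,
])

# na=False makes missing values count as False, giving a clean Boolean Series.
inventory = search_type.str.contains('Inventory', na=False)

# Share of all stops that involved an inventory search.
rate_all_stops = inventory.mean()

# Share of stops with a recorded search that involved an inventory search.
rate_searched_only = inventory[search_type.notna()].mean()
```

The two rates answer different questions, so it helps to decide up front whether the denominator should be all stops or only those with a search.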
Are you more likely to get arrested at a certain time of day? Are drug-related stops on the rise? Answer these and other questions by analyzing the dataset visually, since plots can help you to understand trends in a way that examining the raw data cannot.
- Analyzing datetime data
- Accessing datetime attributes
- Calculating the monthly mean price
- Resampling the price
- Plotting price and volume
- Computing a frequency table
  - Tally of how many times each combination of values occurs
- Analyzing an object column
- Mapping one set of values to another
- Creating a bar plot
- Ordering the bars
- Rotating the bars
apple.date_and_time.dt.month
apple.set_index('date_and_time', inplace=True)
apple.index
apple.groupby(apple.index.month).price.mean()
monthly_price = apple.price.resample('M').mean()
monthly_volume = apple.volume.resample('M').mean()
monthly = pd.concat([monthly_price, monthly_volume], axis='columns')
monthly.plot(subplots=True)
plt.show()
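Grouping by `apple.index.month` and resampling by month are not the same operation when the data spans more than one year. The sketch below shows the difference on hypothetical prices. It groups by year and month instead of calling `resample`, which keeps each calendar month separate in a similar way, though `resample` would also emit rows for empty months.

```python
import pandas as pd

# Hypothetical prices spanning two years; the values are made up.
apple = pd.DataFrame(
    {'price': [100.0, 110.0, 120.0, 200.0]},
    index=pd.to_datetime(['2017-01-05', '2017-01-20', '2017-02-10', '2018-01-15']),
)

# Grouping by the month attribute pools January 2017 with January 2018.
by_month_number = apple.groupby(apple.index.month).price.mean()

# Grouping by year and month keeps each calendar month separate.
by_calendar_month = apple.groupby([apple.index.year, apple.index.month]).price.mean()
```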
table = pd.crosstab(ri.driver_race, ri.driver_gender)
table.loc['Asian':'Hispanic']
table.plot()
table.plot(kind='bar')
table.plot(kind='bar', stacked=True)
plt.show()
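As a small sketch of what the frequency table contains before it is plotted, the example below builds a crosstab from hypothetical rows and slices it by label. The values are made up and only the column names come from the snippets above.

```python
import pandas as pd

# Hypothetical stops; the values are made up.
ri = pd.DataFrame({
    'driver_race': ['White', 'Black', 'Asian', 'White', 'Hispanic', 'Black', 'White'],
    'driver_gender': ['M', 'F', 'M', 'F', 'M', 'M', 'M'],
})

# Rows are races, columns are genders, cells count each combination.
table = pd.crosstab(ri.driver_race, ri.driver_gender)

# .loc slices by label and includes the end label, unlike positional slices.
subset = table.loc['Asian':'Hispanic']
```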
mapping = {'up':True, 'down':False}
apple['is_up'] = apple.change.map(mapping)
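One detail worth checking when mapping values is what happens to entries that the dictionary does not cover. In the sketch below, the `'flat'` value is a hypothetical addition used only to show that behavior.

```python
import pandas as pd

# Hypothetical values; 'flat' is deliberately absent from the mapping.
change = pd.Series(['up', 'down', 'up', 'flat'])
mapping = {'up': True, 'down': False}

# Values found in the dictionary are replaced; anything else becomes NaN.
is_up = change.map(mapping)

# Counting missing values afterwards reveals any unmapped entries.
unmapped = is_up.isna().sum()
```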
search_rate = ri.groupby('violation').search_conducted.mean()
search_rate.sort_values().plot(kind='bar')
search_rate.sort_values().plot(kind='barh')
plt.show()
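The reason for calling `sort_values()` before plotting is that `groupby` returns its groups in sorted key order, so unsorted bars would appear alphabetically by violation. A minimal sketch with hypothetical rows follows. The plotting call is left out so the example runs without a display.

```python
import pandas as pd

# Hypothetical stops; the values are made up.
ri = pd.DataFrame({
    'violation': ['Speeding', 'Speeding', 'Equipment', 'Equipment',
                  'Seat belt', 'Seat belt', 'Speeding', 'Equipment'],
    'search_conducted': [False, False, True, False, False, False, True, True],
})

# groupby sorts its keys, so this Series is ordered alphabetically by violation.
search_rate = ri.groupby('violation').search_conducted.mean()

# Sorting by value orders the bars from the lowest rate to the highest.
ordered = search_rate.sort_values()
```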
Use a second dataset to explore the impact of weather conditions on police behavior during traffic stops. Practice merging and reshaping datasets, assessing whether a data source is trustworthy, working with categorical data, and other advanced skills.