
@misho-kr
Last active December 24, 2020 10:01
Summary of "Analyzing Police Activity with pandas" course on DataCamp (https://gist.github.com/misho-kr/873ddcc2fc89f1c96414de9e0a58e0fe)

You will explore the Stanford Open Policing Project dataset and analyze the impact of gender on police behavior. Practice cleaning messy data, creating visualizations, combining and reshaping datasets, and manipulating time series data. Analyzing Police Activity with pandas will give you valuable experience analyzing a dataset.

Led by Kevin Markham, founder of Data School

Preparing the data for analysis

Examine and clean the dataset to make working with it more efficient. Fix data types, handle missing values, and drop columns and rows while learning about the Stanford Open Policing Project dataset.

  • Traffic stops by police officers (dataset download)
  • Preparing the data -- examine and clean
    • Locating missing values
    • Dropping a column
    • Dropping rows
  • Examining the data types
    • Fixing a data type
  • Creating a DatetimeIndex
    • Using datetime format
    • Setting the index
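import pandas as pd

# Load the traffic-stop data and count missing values in each column
ri = pd.read_csv('police.csv')
ri.isnull().sum()

# Drop a column that is not useful for the analysis
ri.drop('county_name', axis='columns', inplace=True)
# Drop rows missing the stop date or time, which are needed for the index
ri.dropna(subset=['stop_date', 'stop_time'], inplace=True)

ri.dtypes
# `apple` is the example DataFrame of stock data (date, time, price, ...) used in the slides
apple['price'] = apple.price.astype('float')

# str.replace returns a new Series, so assign the result back
apple['date'] = apple.date.str.replace('/', '-')
# Join date and time into one string, parse it, and use it as the index
combined = apple.date.str.cat(apple.time, sep=' ')
apple['date_and_time'] = pd.to_datetime(combined)
apple.set_index('date_and_time', inplace=True)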
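The preparation steps above can be tried end to end without the course files. This is a minimal sketch, assuming a tiny made-up DataFrame whose column names (`stop_date`, `stop_time`, `county_name`, `is_arrested`) mirror the police data; the values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for police.csv: four stops with made-up values
ri = pd.DataFrame({
    'stop_date': ['2005-01-04', '2005-01-23', None, '2005-02-17'],
    'stop_time': ['12:55', '23:15', '04:15', '17:15'],
    'county_name': [np.nan] * 4,
    'is_arrested': pd.Series([False, True, None, False], dtype=object),
})

# Count missing values in each column
missing = ri.isnull().sum()

# Drop the all-empty column, then the row that lacks a stop date
ri.drop('county_name', axis='columns', inplace=True)
ri.dropna(subset=['stop_date', 'stop_time'], inplace=True)

# is_arrested contained None, so pandas stored it as object; now that the
# incomplete row is gone, convert it to a real Boolean column
ri['is_arrested'] = ri.is_arrested.astype('bool')

# Combine date and time strings, parse them, and make the result the index
combined = ri.stop_date.str.cat(ri.stop_time, sep=' ')
ri['stop_datetime'] = pd.to_datetime(combined)
ri.set_index('stop_datetime', inplace=True)

print(missing)
print(ri.dtypes)
```

Using `inplace=True` mirrors the notes; reassigning the result (`ri = ri.drop(...)`) is an equivalent style.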

Exploring the relationship between gender and policing

Does the gender of a driver have an impact on police behavior during a traffic stop? Explore that question while practicing filtering, grouping, method chaining, Boolean math, string methods, and more!

  • Counting unique values
    • Expressing counts as proportions
  • Filtering by multiple conditions
  • Correlation, not causation
    • Analyze the relationship between gender and stop outcome
    • Not going to draw any conclusions about causation
      • Would need additional data and expertise
  • Math with Boolean values
  • Comparing groups using groupby
  • Examining the search types
    • Searching for a string
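import numpy as np

# Count each stop outcome
ri.stop_outcome.value_counts()

# Outcome proportions for white drivers only
white = ri[ri.driver_race == 'White']
white.stop_outcome.value_counts(normalize=True)

# The mean of Boolean values is the proportion of True values (0.25 here)
np.mean([False, True, False, False])
ri.is_arrested.value_counts(normalize=True)

# Arrest rate per district, then per district and gender
ri.groupby('district').is_arrested.mean()
ri.groupby(['district', 'driver_gender']).is_arrested.mean()

# Flag stops whose search_type mentions 'Inventory'; na=False treats missing values as False
ri['inventory'] = ri.search_type.str.contains('Inventory', na=False)
ri.inventory.dtype
ri.inventory.sum()
ri.inventory.mean()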
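A minimal sketch of filtering by multiple conditions, Boolean math, and the `na=False` behavior of `str.contains`, on a tiny made-up DataFrame that reuses the police column names. The rows are invented; only the techniques are the point.

```python
import pandas as pd

# Hypothetical sample of stops; values are made up for illustration
ri = pd.DataFrame({
    'driver_gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'violation': ['Speeding', 'Speeding', 'Equipment', 'Speeding', 'Speeding', 'Equipment'],
    'stop_outcome': ['Citation', 'Warning', 'Citation', 'Citation', 'Arrest Driver', 'Warning'],
    'is_arrested': [False, False, False, False, True, False],
    'search_type': [None, None, 'Incident to Arrest,Inventory', None, 'Probable Cause', None],
})

# Filtering by multiple conditions: wrap each condition in parentheses and
# combine them with & (and) or | (or)
female_speeding = ri[(ri.driver_gender == 'F') & (ri.violation == 'Speeding')]
male_speeding = ri[(ri.driver_gender == 'M') & (ri.violation == 'Speeding')]

# Compare outcome proportions for the two groups
print(female_speeding.stop_outcome.value_counts(normalize=True))
print(male_speeding.stop_outcome.value_counts(normalize=True))

# The mean of a Boolean column is the share of True values
arrest_rate = ri.is_arrested.mean()

# groupby compares that share across groups
arrest_by_gender = ri.groupby('driver_gender').is_arrested.mean()

# search_type may list several comma-separated types and is missing when no
# search happened; without na=False, str.contains returns NaN for those rows
ri['inventory'] = ri.search_type.str.contains('Inventory', na=False)
print(arrest_rate, arrest_by_gender, ri.inventory.sum())
```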

Visual exploratory data analysis

Are you more likely to get arrested at a certain time of day? Are drug-related stops on the rise? Answer these and other questions by analyzing the dataset visually, since plots can help you to understand trends in a way that examining the raw data cannot.

  • Analyzing datetime data
    • Accessing datetime attributes
    • Calculating the monthly mean price
  • Resampling the price
    • Plotting price and volume
  • Computing a frequency table
    • Tally of how many times each combination of values occurs
  • Analyzing an object column
    • Mapping one set of values to another
  • Creating a bar plot
    • Ordering the bars
    • Rotating the bars
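A self-contained sketch of the datetime steps in this outline, using a small made-up price table in place of the course's `apple` data. It assumes a month-start frequency (`'MS'`) rather than the `'M'` used in the notes, because newer pandas versions renamed month-end to `'ME'`. The data spans two years to show how grouping by month differs from resampling.

```python
import pandas as pd

# Hypothetical prices standing in for the course's `apple` data
apple = pd.DataFrame({
    'date_and_time': pd.to_datetime(['2018-01-02', '2018-01-03', '2018-02-01', '2019-01-02']),
    'price': [170.0, 172.0, 167.0, 160.0],
    'volume': [100, 300, 200, 400],
})

# .dt exposes datetime attributes of a datetime column
months = apple.date_and_time.dt.month

# Once the column becomes the index, the same attributes live on the index
apple = apple.set_index('date_and_time')

# groupby(index.month) pools each calendar month across all years
by_month = apple.groupby(apple.index.month).price.mean()

# resample gives one row per month of the timeline; empty months become NaN
monthly_price = apple.price.resample('MS').mean()
monthly_volume = apple.volume.resample('MS').mean()
monthly = pd.concat([monthly_price, monthly_volume], axis='columns')
print(by_month)
print(monthly)
```

Here `by_month` has just two rows (January and February), while `monthly` has one row for every month from January 2018 to January 2019.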
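import matplotlib.pyplot as plt

# .dt exposes datetime attributes of a datetime column
apple.date_and_time.dt.month

# After set_index, the same attributes are available on the DatetimeIndex
apple.set_index('date_and_time', inplace=True)
apple.index

# Mean price for each calendar month (pooled across years)
apple.groupby(apple.index.month).price.mean()

# Resample to one row per month ('M' is month-end; newer pandas versions spell it 'ME')
monthly_price = apple.price.resample('M').mean()
monthly_volume = apple.volume.resample('M').mean()
monthly = pd.concat([monthly_price, monthly_volume], axis='columns')

# Plot price and volume in separate subplots
monthly.plot(subplots=True)
plt.show()

# Frequency table of race by gender; keep the rows from Asian through Hispanic
table = pd.crosstab(ri.driver_race, ri.driver_gender)
table = table.loc['Asian':'Hispanic']

# A line plot is a poor fit for categorical rows; bar plots (side by side or stacked) suit them better
table.plot()
table.plot(kind='bar')
table.plot(kind='bar', stacked=True)
plt.show()

# Map strings to Booleans
mapping = {'up':True, 'down':False}
apple['is_up'] = apple.change.map(mapping)

# Search rate per violation, sorted so the bars are ordered; kind='barh' draws horizontal bars
search_rate = ri.groupby('violation').search_conducted.mean()
search_rate.sort_values().plot(kind='bar')
search_rate.sort_values().plot(kind='barh')
plt.show()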
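The frequency-table, mapping, and sorting steps can be checked without plotting. This is a minimal sketch on made-up rows that reuse the police column names; the `change` Series is a hypothetical stand-in for `apple.change`. With matplotlib installed, `search_rate.plot(kind='barh')` would draw the ordered bars.

```python
import pandas as pd

# Hypothetical stops; values are made up for illustration
ri = pd.DataFrame({
    'driver_race': ['White', 'Black', 'Asian', 'Hispanic', 'White', 'Black', 'Other'],
    'driver_gender': ['M', 'F', 'M', 'M', 'F', 'M', 'F'],
    'violation': ['Speeding', 'Equipment', 'Speeding', 'Moving violation',
                  'Speeding', 'Equipment', 'Speeding'],
    'search_conducted': [False, True, False, True, False, False, False],
})

# Frequency table: how many times each (race, gender) combination occurs
table = pd.crosstab(ri.driver_race, ri.driver_gender)

# crosstab sorts the rows alphabetically, so a label slice keeps Asian, Black and Hispanic
table = table.loc['Asian':'Hispanic']

# Map strings to Booleans; values missing from the dict would become NaN
mapping = {'up': True, 'down': False}
change = pd.Series(['up', 'down', 'up'])
is_up = change.map(mapping)

# Search rate per violation, sorted so bars would appear in order when plotted
search_rate = ri.groupby('violation').search_conducted.mean().sort_values()
print(table)
print(search_rate)
```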

Analyzing the effect of weather on policing

Use a second dataset to explore the impact of weather conditions on police behavior during traffic stops. Practice merging and reshaping datasets, assessing whether a data source is trustworthy, working with categorical data, and other advanced skills.
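A minimal sketch of the skills listed for this chapter: a trustworthiness check, an ordered categorical, and a left merge of stops with weather by date. It assumes a weather table with `DATE`, `TMIN` and `TMAX` columns and a precomputed `rating` column with the categories `good`, `bad` and `worse`; all names and values here are illustrative assumptions, not the course's exact data.

```python
import pandas as pd

# Hypothetical daily weather records; columns and values are assumptions
weather = pd.DataFrame({
    'DATE': ['2005-01-04', '2005-01-05', '2005-01-06'],
    'TMIN': [20, 25, 30],
    'TMAX': [40, 45, 38],
    'rating': ['good', 'worse', 'bad'],
})

# Trustworthiness check: rows where the daily minimum exceeds the maximum are suspect
bad_rows = weather[weather.TMIN > weather.TMAX]

# Ordered categorical: comparisons and sorting follow the category order
cats = pd.CategoricalDtype(categories=['good', 'bad', 'worse'], ordered=True)
weather['rating'] = weather.rating.astype(cats)

# Hypothetical traffic stops keyed by date
ri = pd.DataFrame({
    'stop_date': ['2005-01-04', '2005-01-04', '2005-01-06', '2005-01-07'],
    'is_arrested': [False, True, True, False],
})

# Left merge keeps every stop; a date with no weather record gets NaN for rating
merged = pd.merge(left=ri, right=weather[['DATE', 'rating']],
                  left_on='stop_date', right_on='DATE', how='left')

# observed=False keeps categories that have no stops (their mean is NaN)
arrest_by_rating = merged.groupby('rating', observed=False).is_arrested.mean()
print(bad_rows)
print(arrest_by_rating)
```

A left merge is used so that no stop is lost when a weather record is missing; the unmatched stop simply has no rating and drops out of the grouped means.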
