@masoud-saedi
Created July 31, 2023 00:38
A Python script to clean and preprocess crime data for Portugal from 2009 to 2017. The script renames columns, reshapes the data from wide to long format, converts data types, and drops rows with missing values to make the data suitable for analysis. The cleaned dataset can be used for visualization and exploration. Data source: Kaggle https://bit.ly/3QitzNW
import pandas as pd
# Load the Raw Data:
df = pd.read_csv('crimesportugal.csv', delimiter=';')
# Define Column Names:
column_names = {
    'total': 'Total Crime',
    'vdom': 'Domestic Violence',
    'fur_veiculo': 'Vehicles Stolen',
    'fur_resi': 'Residential Burglaries',
    'fur_edificio': 'Commercial Burglaries'
}
# Add year suffix to the new column names to make the column names unique for each year.
column_names_with_year = {'Ambito': 'Area'}
for year in range(2009, 2018):
    for old_name, new_name in column_names.items():
        column_names_with_year[f"{old_name}{year}"] = f"{new_name} {year}"
# Rename Columns:
df.rename(columns=column_names_with_year, inplace=True)
# Reshape the dataset from wide to long format to make it easier to analyze
# and visualize by crime type and year.
df = pd.melt(df, id_vars=['Area', 'Zona'],
             var_name='Year_CrimeType', value_name='Count')
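# Extract crime type and year from the combined column name.
# The letters-and-spaces pattern also captures the space before the year, so strip it.
df['CrimeType'] = df['Year_CrimeType'].str.extract(r'([a-zA-Z ]+)', expand=False).str.strip()
df['Year'] = df['Year_CrimeType'].str.extract(r'(\d+)', expand=False)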
# Convert Data Types:
df['Year'] = pd.to_numeric(df['Year'], errors='coerce').astype('Int64')
df['Count'] = pd.to_numeric(df['Count'], errors='coerce')
# Remove NaN Values:
df = df.dropna()
# Remove Unnecessary Column:
df = df.drop(columns=['Year_CrimeType'])
# Save to CSV File (raw string so the backslashes in the Windows path are taken literally):
df.to_csv(r'E:\Datasets\crimes_portugal_long_final.csv', index=False)
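The reshape and extraction steps can be checked on a tiny frame before running them on the full file. The sketch below is only an illustration: the two rows, the `Area` and `Zona` values, and every count are made up, and it assumes the columns have already been renamed to the `"<Crime Type> <year>"` pattern used above.

```python
import pandas as pd

# Hypothetical sample in the renamed wide layout; all values are invented.
wide = pd.DataFrame({
    'Area': ['Municipality', 'Municipality'],
    'Zona': ['Lisboa', 'Porto'],
    'Total Crime 2009': [100, 80],
    'Domestic Violence 2009': [10, None],  # one missing count on purpose
})

# Wide to long: one row per (Area, Zona, column name).
tidy = pd.melt(wide, id_vars=['Area', 'Zona'],
               var_name='Year_CrimeType', value_name='Count')

# Split the combined column name into its text part and its numeric part.
tidy['CrimeType'] = (tidy['Year_CrimeType']
                     .str.extract(r'([a-zA-Z ]+)', expand=False)
                     .str.strip())
tidy['Year'] = pd.to_numeric(
    tidy['Year_CrimeType'].str.extract(r'(\d+)', expand=False),
    errors='coerce').astype('Int64')

# Drop rows with a missing value, then the helper column.
tidy = tidy.dropna().drop(columns=['Year_CrimeType'])
print(tidy)
```

Without the `.str.strip()` call, the crime type would keep the space that separates it from the year in the column name, which can make later group-by or filter steps miss rows.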