@smzn
Last active January 4, 2024 03:46
Calculating required statistics for each station
import pandas as pd

# Assumes `df` is the trip-level DataFrame loaded earlier (one row per ride)
# Recalculating electric and classic bike counts as actual counts instead of rates
electric_bike_start_counts = df[df['rideable_type'] == 'electric_bike'].groupby('start_station_name')['ride_id'].count()
classic_bike_start_counts = df[df['rideable_type'] == 'classic_bike'].groupby('start_station_name')['ride_id'].count()
# Recalculating member and casual counts as actual counts instead of rates
member_counts = df[df['member_casual'] == 'member'].groupby('start_station_name')['ride_id'].count()
casual_counts = df[df['member_casual'] == 'casual'].groupby('start_station_name')['ride_id'].count()
# Total start and end counts for each station
start_counts = df.groupby('start_station_name')['ride_id'].count()
end_counts = df.groupby('end_station_name')['ride_id'].count()
# Average latitude and longitude for each start station
average_lat = df.groupby('start_station_name')['start_lat'].mean()
average_lng = df.groupby('start_station_name')['start_lng'].mean()
# Combining the recalculated counts into a single DataFrame
aggregated_data = pd.DataFrame({
    'Electric Bike Count': electric_bike_start_counts,
    'Classic Bike Count': classic_bike_start_counts,
    'Start Count': start_counts,
    'End Count': end_counts,
    'Member Count': member_counts,
    'Casual Count': casual_counts,
    'Average Latitude': average_lat,
    'Average Longitude': average_lng
})
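# Filling missing counts with 0; coordinates are left as NaN so that stations
# appearing only as end stations are not placed at latitude/longitude (0, 0)
count_columns = [col for col in aggregated_data.columns if col.endswith('Count')]
aggregated_data[count_columns] = aggregated_data[count_columns].fillna(0)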
# Displaying the first few rows of the new aggregated data
aggregated_data.head()
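
The per-category filters above can also be written with `pd.crosstab`, which counts every category of a column in one pass. The following is a minimal sketch, not the author's method: the four sample rides and the names `bike_type_counts`, `rider_type_counts`, and `summary` are made up for illustration, and only the column names are taken from the snippet above. It also shows how stations that appear only as end stations enter the result through index alignment.

```python
import pandas as pd

# Hypothetical sample rides; column names follow the snippet above
df = pd.DataFrame({
    'ride_id': ['r1', 'r2', 'r3', 'r4'],
    'rideable_type': ['electric_bike', 'classic_bike', 'electric_bike', 'classic_bike'],
    'member_casual': ['member', 'casual', 'member', 'member'],
    'start_station_name': ['A', 'A', 'B', 'B'],
    'end_station_name': ['B', 'C', 'A', 'C'],
})

# One crosstab per categorical column instead of one filtered groupby per category
bike_type_counts = pd.crosstab(df['start_station_name'], df['rideable_type'])
rider_type_counts = pd.crosstab(df['start_station_name'], df['member_casual'])

# Series with different indexes are aligned on the union of station names,
# so station 'C' (never a start station) gets NaN for its start count
summary = pd.DataFrame({
    'Start Count': df.groupby('start_station_name')['ride_id'].count(),
    'End Count': df.groupby('end_station_name')['ride_id'].count(),
})

# Attach the per-category counts and treat missing counts as 0
summary = pd.concat([summary, bike_type_counts, rider_type_counts], axis=1).fillna(0)
print(summary)
```

Compared with separate filters, this version picks up any additional category present in the data (for example another `rideable_type` value) without extra code, at the cost of column names that come directly from the raw category labels.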