@smzn
Created January 4, 2024 02:46
Adjusting to select stations that make up the top 20% of all stations by count
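import matplotlib.pyplot as plt

# Assumes `df` is a pandas DataFrame of trips with 'start_station_name'
# and 'end_station_name' columns, loaded earlier.
# Select the busiest stations whose cumulative trip count stays within 20% of all trips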
# Count trips per station and sort from busiest to least busy
sorted_start_stations = df['start_station_name'].value_counts().sort_values(ascending=False)
sorted_end_stations = df['end_station_name'].value_counts().sort_values(ascending=False)
# Cumulative share of all trips, as a fraction between 0 and 1
cumulative_percentage_start = sorted_start_stations.cumsum() / sorted_start_stations.sum()
cumulative_percentage_end = sorted_end_stations.cumsum() / sorted_end_stations.sum()
# Keep the stations whose cumulative share is at most 20% of all trips
top_20_start_stations = sorted_start_stations[cumulative_percentage_start <= 0.20]
top_20_end_stations = sorted_end_stations[cumulative_percentage_end <= 0.20]
# One figure with two stacked bar charts: start stations on top, end stations below
plt.figure(figsize=(15, 10))
# Plotting for start stations
plt.subplot(2, 1, 1)
top_20_start_stations.plot(kind='bar', color='skyblue')
plt.title('Top 20% of Start Stations by Count')
plt.xlabel('Station Name')
plt.ylabel('Frequency')
plt.xticks(rotation=90)
plt.grid(axis='y')
# Plotting for end stations
plt.subplot(2, 1, 2)
top_20_end_stations.plot(kind='bar', color='green')
plt.title('Top 20% of End Stations by Count')
plt.xlabel('Station Name')
plt.ylabel('Frequency')
plt.xticks(rotation=90)
plt.grid(axis='y')
plt.tight_layout()
plt.show()
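
The filter above keeps stations by cumulative share of trips, which is not the same as keeping the top 20% of stations by rank. The sketch below contrasts the two on a small made-up dataset. The station names, the trip counts, and the helper functions `top_by_trip_share` and `top_by_station_rank` are all hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical toy data: 10 stations with distinct trip counts (87 trips in total).
trips_per_station = {
    "Station A": 15, "Station B": 12, "Station C": 11, "Station D": 10,
    "Station E": 9, "Station F": 8, "Station G": 7, "Station H": 6,
    "Station I": 5, "Station J": 4,
}
toy_df = pd.DataFrame({
    "start_station_name": [
        name for name, n in trips_per_station.items() for _ in range(n)
    ]
})


def top_by_trip_share(counts, share=0.20):
    """Keep the busiest stations while their cumulative trip share is <= share."""
    counts = counts.sort_values(ascending=False)
    cumulative = counts.cumsum() / counts.sum()
    return counts[cumulative <= share]


def top_by_station_rank(counts, fraction=0.20):
    """Keep the top `fraction` of stations by rank, and always at least one."""
    counts = counts.sort_values(ascending=False)
    n_keep = max(1, int(len(counts) * fraction))
    return counts.head(n_keep)


counts = toy_df["start_station_name"].value_counts()
by_share = top_by_trip_share(counts)
by_rank = top_by_station_rank(counts)

print("By cumulative trip share:", list(by_share.index))
print("By station rank:", list(by_rank.index))
```

On this toy data the share-based filter keeps only Station A (15 of 87 trips, about 17%), because adding Station B would push the cumulative share to about 31%. The rank-based filter keeps two of the ten stations. The share-based filter also returns an empty result whenever the busiest station alone accounts for more than 20% of trips, so it is worth checking the result before plotting.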