@dottyz
Created May 2, 2019 18:31
from fuzzywuzzy import fuzz  # assumed import; the snippet uses `fuzz` without showing where it comes from

# Separate the stations without station IDs (copy so the writes below do not target a view of `stations`)
no_ids = stations[stations['station_id'].isnull()].copy()
for idx, miss in no_ids.iterrows():
    max_score = 0
    # Compare the similarity of the station without ID to each station in the API data
    for i, exist in bikeshare_stations[['station_id', 'name']].iterrows():
        score = fuzz.ratio(miss['name'], exist['name'])
        # Keep the best-scoring candidate, accepting only scores above 80
        if score > 80 and score > max_score:
            max_score = score
            no_ids.at[idx, 'station_id'] = exist['station_id']
    # Warn if the station was not able to be matched
    if max_score <= 80:
        print('WARN: {0} station could not be matched to an existing station'.format(miss['name']))
# Remove all stations that were not matched (only the ID column decides, not NaNs in other columns)
no_ids = no_ids.dropna(subset=['station_id'])
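
The matching rule above (take the highest-scoring name, but only if it scores above 80) can be tried without pandas or fuzzywuzzy. The sketch below is a stand-in, not the gist's own code: it swaps `fuzz.ratio` for the standard library's `difflib.SequenceMatcher`, so scores may not be identical to fuzzywuzzy's. The helper names `similarity` and `best_match` and the sample station data are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (difflib stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(name, candidates, threshold=80):
    """Return (station_id, score) for the best candidate scoring above
    `threshold`, or (None, 0) if no candidate clears it."""
    best_id, best_score = None, 0
    for station_id, candidate_name in candidates:
        score = similarity(name, candidate_name)
        # Same rule as the loop above: must beat both the threshold and the best so far
        if score > threshold and score > best_score:
            best_id, best_score = station_id, score
    return best_id, best_score


# Hypothetical sample data: (station_id, name) pairs
known_stations = [(7, "Union Station"), (8, "King St W")]

print(best_match("Union Station", known_stations))  # exact name match
print(best_match("zzzz", known_stations))           # shares no characters with any name
```

Returning the score together with the ID lets the caller decide whether to warn, as the original loop does with `max_score`.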