@orico
Last active April 14, 2019 08:36
Spacy GPE test
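import pandas as pd
import spacy
import en_core_web_sm

path_to_data = './data/'

# Use the GPU if one is available; otherwise spaCy stays on the CPU.
spacy.prefer_gpu()
nlp = en_core_web_sm.load()

cities = pd.read_csv(path_to_data + 'us_cities_states_counties.csv')
# Cast to str so missing values (NaN) do not break nlp().
cities['City alias'] = cities['City alias'].astype(str)

# GPE = countries, cities, states.
count = 0   # city aliases with at least one GPE entity
passed = 0  # city aliases that raised an error during processing
for i, city in enumerate(cities['City alias'].values):
    try:
        doc = nlp(city)
        # Count each alias once, even if spaCy finds several GPE entities in it.
        if any(ent.label_ == 'GPE' for ent in doc.ents):
            count += 1
    except Exception:
        passed += 1
    # Progress report every 5000 rows.
    if i % 5000 == 0:
        print(i, count, passed)

print(f'spaCy knows {count} out of {cities.shape[0]}')
print("couldn't process:", passed)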
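
The loop above calls `nlp` once per city alias. As a possible variant, the same check can be streamed through spaCy's `nlp.pipe`, which processes texts in batches. This is only a sketch: the helper name `count_gpe_names` and the default `batch_size=1000` are illustrative choices, not part of the original gist, and the helper counts each name at most once, however many GPE entities it contains.

```python
from typing import Iterable


def count_gpe_names(nlp, names: Iterable[str], batch_size: int = 1000) -> int:
    """Count the names in which the pipeline tags at least one GPE entity.

    `nlp` is assumed to be a loaded spaCy pipeline (anything exposing
    `pipe(texts, batch_size=...)` that yields docs with `.ents`).
    """
    hits = 0
    # nlp.pipe streams the texts through the pipeline in batches instead of
    # running the pipeline separately for every single name.
    for doc in nlp.pipe(names, batch_size=batch_size):
        if any(ent.label_ == 'GPE' for ent in doc.ents):
            hits += 1
    return hits
```

With the objects from the script, this would be called as `count_gpe_names(nlp, cities['City alias'])`.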