@maZahaca
Created September 9, 2016 09:43
Post Office example
Post Office - High Littleton
Post Office Pilton Outreach Services
Town Street Post Office
post office St Thomas
maZahaca commented Sep 9, 2016

Basically, I need to find an algorithm, or better yet a library, that produces results like these:
Post Office: 16999
Post: 17934
Office: 16999
Tesco: 7300
...

Currently the main problem is detecting multi-word phrases ("sentences") such as "Post Office".

Here is the code that does this for single words:

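import operator

from textblob import TextBlob

# Read the whole list of names into a single blob; the file is closed automatically.
with open("names.txt", "r") as title_file:
    blob = TextBlob(title_file.read())

# word_counts maps each word to its frequency; sort ascending by count.
counts = sorted(blob.word_counts.items(), key=operator.itemgetter(1))
print(counts)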

Here are some restrictions:

  • only groups that contain more than one name
  • only groups that consist of textual data only
  • a group may be a phrase made up of several words
  • the only input should be a list of names
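
One possible way to meet these restrictions is to count word n-grams per name instead of single words. The sketch below uses only the Python standard library rather than TextBlob. The function name `phrase_counts` and its parameters `max_words` and `min_count` are hypothetical, chosen for illustration. It assumes that a "group" is any run of adjacent words inside one name, that "textual data" means runs of letters (digits and punctuation are dropped, so words on either side of a dash are treated as adjacent), and that matching is case-insensitive.

```python
import re
from collections import Counter


def phrase_counts(names, max_words=3, min_count=2):
    """Count how many names contain each phrase of 1..max_words adjacent words."""
    counts = Counter()
    for name in names:
        # Keep runs of letters only (assumption: digits and punctuation are not "textual").
        words = re.findall(r"[^\W\d_]+", name.lower())
        seen = set()
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        # A phrase counts once per name, even if it repeats inside that name.
        counts.update(seen)
    # Keep only groups that contain more than one name.
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    sample = [
        "Post Office - High Littleton",
        "Post Office Pilton Outreach Services",
        "Town Street Post Office",
        "post office St Thomas",
    ]
    result = phrase_counts(sample)
    # Most frequent first, ties broken alphabetically.
    for phrase, c in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{phrase}: {c}")
```

With the four sample names from the gist, only "post", "office" and "post office" survive the filter, each with a count of 4. For the real file, the list of names could be read with `open("names.txt").read().splitlines()`.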
