maZahaca/names1.txt Secret

Created September 9, 2016 09:43

Post Office example

	Post Office - High Littleton
	Post Office Pilton Outreach Services
	Town Street Post Office
	post office St Thomas

Author

maZahaca commented Sep 9, 2016

The full example is here: https://gist.github.com/maZahaca/a54046a4cc7ab27f9d06751b89aa7446

Author

maZahaca commented Sep 9, 2016 (edited)

Basically, I need to find an algorithm, or better a library, that produces results like this:
Post Office: 16999
Post: 17934
Office: 16999
Tesco: 7300
...

Currently the main problem is detecting sentences, i.e. groups made up of several words such as "Post Office".

Here is the code that does this for single words:

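from textblob import TextBlob
import operator

# Read the whole list of names into a single TextBlob
with open("names.txt", "r") as title_file:
    blob = TextBlob(title_file.read())

# Sort (word, count) pairs by count, lowest first
word_counts = sorted(blob.word_counts.items(), key=operator.itemgetter(1))
print(word_counts)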

Here are some restrictions:

only groups that contain more than one name
only groups that consist of textual data only
a group may be a sentence made up of several words
the only input should be a list of names
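
One possible way to satisfy these restrictions, sketched with the standard library only rather than TextBlob: treat every run of consecutive words inside a name as a candidate group, count in how many names each group appears, and keep the groups shared by more than one name. The function name `phrase_counts`, the `max_words` limit, the case-insensitive matching, and the reading of "textual data" as "alphabetic words only" are all assumptions made for illustration, not part of the original question.

```python
import re
from collections import Counter


def phrase_counts(names, max_words=3):
    """Count groups of 1..max_words consecutive words across a list of names.

    Hypothetical helper: the name and the max_words limit are assumptions.
    """
    counts = Counter()
    for name in names:
        # Assumption: "textual data" means alphabetic words only, and
        # matching is case-insensitive ("post office" == "Post Office").
        # Punctuation such as "-" is simply dropped, so the words on
        # either side of it are treated as neighbours.
        words = re.findall(r"[A-Za-z]+", name.lower())
        seen = set()
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        # Each group is counted at most once per name.
        counts.update(seen)
    # Keep only groups that occur in more than one name.
    return {phrase: c for phrase, c in counts.items() if c > 1}


names = [
    "Post Office - High Littleton",
    "Post Office Pilton Outreach Services",
    "Town Street Post Office",
    "post office St Thomas",
]
result = phrase_counts(names)
# Print the most frequent groups first, ties in alphabetical order.
for phrase, count in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
    print(phrase, count)
```

Because every name is split on its own, groups never span two names, which is the main difference from counting words over the whole file at once. Counting each group once per name is a design choice; switching `seen` from a set to a list would count repeated occurrences inside a single name as well.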
