duplicates = multiple editions
A Classical Introduction to Modern Number Theory, Kenneth Ireland and Michael Rosen
A Classical Introduction to Modern Number Theory, Kenneth Ireland and Michael Rosen
Here are the areas I've been researching, some things I've read and some open source packages...
Nearly all text processing starts by transforming text into vectors: http://en.wikipedia.org/wiki/Vector_space_model
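To make the vector idea concrete, here is a minimal sketch of a bag-of-words representation using only the standard library. The function names (build_vocabulary, to_count_vector) and the two toy sentences are my own, made up for illustration, and splitting on whitespace is a simplifying assumption; real tokenisers are more careful.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def to_count_vector(document, vocabulary):
    """Return raw term counts, one entry per vocabulary term."""
    counts = Counter(document.lower().split())
    # Iterating a dict yields keys in insertion order, i.e. column order.
    return [counts.get(tok, 0) for tok in vocabulary]


# Toy documents (hypothetical data, for illustration only).
docs = ["the cat sat", "the dog sat on the cat"]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(d, vocab) for d in docs]
```

Each document becomes a row with one column per vocabulary word, so documents of different lengths end up as vectors of the same size and can be compared directly.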
The vectors are then often reweighted with a transform such as TF-IDF to normalise the data and control for outliers (words that are too frequent or too rare confuse the algorithms): http://en.wikipedia.org/wiki/Tf%E2%80%93idf
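As a rough sketch of the weighting, the snippet below computes one common TF-IDF variant: relative term frequency times log(N / document frequency). There are several variants in use, so treat this formula as an assumption rather than the definition; the tfidf function name and the toy documents are also mine. Note that this only damps words that are too frequent; dropping words that are too rare is usually a separate filtering step.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy documents (hypothetical data, for illustration only).
docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tfidf(docs)
```

In this toy corpus "the" occurs in every document, so its weight is zero, while "sat" (one document) ends up weighted higher than "cat" (two documents).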
Collocation detection is a technique for finding two or more words that occur together more often than they occur separately (e.g. "wishy-washy" in English). I use this to group words into n-gram tokens, because many NLP techniques treat each word as if it were independent of all the others in a document, ignoring order: http://matpalm.com/blog/2011/10/22/collocations_1/
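Here is a small sketch of how that grouping might look. It scores adjacent word pairs with pointwise mutual information (PMI), which is one common collocation measure and not necessarily the one used in the linked post. The function names, the min_count threshold and the toy sentence are all assumptions of mine.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs seen too rarely to score reliably
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def merge_collocations(tokens, collocations):
    """Join each detected pair into a single 'a_b' token, left to right."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Toy sentence (hypothetical data, for illustration only).
text = "wishy washy answers annoy me and wishy washy plans annoy you"
tokens = text.split()
scores = bigram_pmi(tokens)
collocations = {pair for pair, score in scores.items() if score > 0}
merged = merge_collocations(tokens, collocations)
```

The merged token list can then be fed into the same vectorising step as any other tokens, so the pair is counted as one unit instead of two independent words.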
#!/usr/bin/env python
"""
Use pip to get a list of local packages to check against one or more package
indexes for updated versions.
"""
import pip
import sys, xmlrpclib
from cStringIO import StringIO
from distutils.version import StrictVersion, LooseVersion