@Tofull
Created July 8, 2017 08:02
language detector
from textblob.classifiers import NaiveBayesClassifier
train = [
    ("amor", "spanish"),
    ("perro", "spanish"),
    ("playa", "spanish"),
    ("sal", "spanish"),
    ("oceano", "spanish"),
    ("love", "english"),
    ("dog", "english"),
    ("beach", "english"),
    ("salt", "english"),
    ("ocean", "english"),
]
test = [
    ("ropa", "spanish"),
    ("comprar", "spanish"),
    ("camisa", "spanish"),
    ("agua", "spanish"),
    ("telefono", "spanish"),
    ("clothes", "english"),
    ("buy", "english"),
    ("shirt", "english"),
    ("water", "english"),
    ("telephone", "english"),
]
def extractor(word):
    '''Extract the last letter of a word as the only feature.'''
    feats = {}
    last_letter = word[-1]
    feats["last_letter({0})".format(last_letter)] = True
    return feats
lang_detector = NaiveBayesClassifier(train, feature_extractor=extractor)
print(lang_detector.accuracy(test))
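# show_informative_features() prints its table itself and returns None,
# so wrapping it in print() would add a stray "None" line.
lang_detector.show_informative_features(5)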
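
To see what the last-letter feature gives the classifier to work with, here is a rough standard-library sketch of the same idea. It is not how textblob or NLTK implement Naive Bayes internally. The add-one smoothing, the tie-breaking rule, and the names `letter_counts`, `label_counts`, `alphabet`, `score`, and `predict` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Same training pairs as in the gist above.
train = [
    ("amor", "spanish"), ("perro", "spanish"), ("playa", "spanish"),
    ("sal", "spanish"), ("oceano", "spanish"),
    ("love", "english"), ("dog", "english"), ("beach", "english"),
    ("salt", "english"), ("ocean", "english"),
]

# Count how often each last letter occurs per language.
letter_counts = defaultdict(Counter)
label_counts = Counter()
for word, lang in train:
    letter_counts[lang][word[-1]] += 1
    label_counts[lang] += 1

# All last letters seen in training, used for add-one smoothing (an assumption).
alphabet = {word[-1] for word, _ in train}


def score(word, lang):
    """Unnormalised P(lang) * P(last letter | lang) with add-one smoothing."""
    prior = label_counts[lang] / sum(label_counts.values())
    likelihood = (letter_counts[lang][word[-1]] + 1) / (
        label_counts[lang] + len(alphabet)
    )
    return prior * likelihood


def predict(word):
    """Return the highest-scoring language (ties go to the first label seen)."""
    return max(label_counts, key=lambda lang: score(word, lang))


print(dict(letter_counts["spanish"]))
print(predict("ropa"), predict("shirt"), predict("water"))
```

In this sketch "water" comes out as spanish, because its final "r" was seen only in "amor" during training. That shows how little a single last-letter feature can learn from ten words.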