@achmedzhanov
Forked from blumonkey/hmm-example.py
Last active January 22, 2020 11:01
Python Code to train a Hidden Markov Model, using NLTK
__author__ = 'ssbushi'
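# Import the toolkit and the pretagged Penn Treebank sample.
# If the corpus is missing, run nltk.download('treebank') once.
import nltk
from nltk.corpus import treebank
# Pretagged data: first 3000 sentences for training, the rest for testing
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]
# Show one training sentence as a list of (word, tag) pairs
print(train_data[0])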
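# Import HMM module
from nltk.tag import hmm
# Set up the tagger by supervised training on the tagged sentences
tagger = hmm.HiddenMarkovModelTagger.train(train_data)
# Prints the basic data about the tagger
print(tagger)
# Tag a sample sentence
print(tagger.tag("Today is a good day .".split()))
# Token-level accuracy on the held-out sentences.
# Newer NLTK releases deprecate evaluate() in favor of accuracy().
print(tagger.evaluate(test_data))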
"""
Output of the last two print calls, in order (an accuracy near 0.37 means most test tokens get the wrong tag :/):
[('Today', u'NN'), ('is', u'VBZ'), ('a', u'DT'), ('good', u'JJ'), ('day', u'NN'), ('.', u'.')]
0.36844377293330455
"""
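
To make it clearer what a trained HMM tagger does, here is a minimal pure-Python sketch of the same idea: count tag transitions and word emissions from tagged sentences, then decode with Viterbi. It is not NLTK's implementation. The names `train_hmm`, `viterbi` and `accuracy` are hypothetical helpers, and the add-alpha smoothing is an assumption chosen so that unseen words do not get zero probability.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical marker for the start of a sentence


def train_hmm(tagged_sents, alpha=1.0):
    """Count tag transitions and word emissions from (word, tag) sentences."""
    trans = defaultdict(Counter)  # previous tag -> counts of next tag
    emit = defaultdict(Counter)   # tag -> counts of emitted word
    vocab = set()
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return {"trans": trans, "emit": emit, "tags": sorted(emit),
            "vocab": vocab, "alpha": alpha}


def _log_prob(counter, key, num_outcomes, alpha):
    # Add-alpha smoothing (an assumption): unseen events keep a small probability.
    total = sum(counter.values()) + alpha * num_outcomes
    return math.log((counter[key] + alpha) / total)


def viterbi(model, words):
    """Return the most likely (word, tag) sequence under the counted model."""
    if not words:
        return []
    tags, alpha = model["tags"], model["alpha"]
    n_tags = len(tags)
    n_words = len(model["vocab"]) + 1  # one extra slot for unknown words

    def step(prev, tag, word):
        return (_log_prob(model["trans"][prev], tag, n_tags, alpha)
                + _log_prob(model["emit"][tag], word, n_words, alpha))

    # Each column maps tag -> (best log-probability so far, previous tag).
    best = [{tag: (step(START, tag, words[0]), None) for tag in tags}]
    for word in words[1:]:
        column = {}
        for tag in tags:
            column[tag] = max((best[-1][p][0] + step(p, tag, word), p)
                              for p in tags)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(zip(words, reversed(path)))


def accuracy(model, gold_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        predicted = viterbi(model, [word for word, _ in sent])
        for (_, guess), (_, gold) in zip(predicted, sent):
            correct += guess == gold
            total += 1
    return correct / total if total else 0.0


# Toy data, made up for illustration only.
toy_train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
model = train_hmm(toy_train)
print(viterbi(model, ["a", "cat", "barks"]))
print(accuracy(model, toy_train))
```

The `accuracy` helper mirrors the token-level score printed by the script above: every token counts once, so a model that handles unseen words poorly can score low even when short, common sentences are tagged correctly.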