@anilkay
Created October 28, 2018 19:21
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Oct 28 21:41:09 2018
@author: anilkaynar
"""
import nltk
from nltk import ngrams
n=2
tex="Two brothers and a husband and wife were among those killed. Six people were injured, including four policemen."
tex+=" The suspect, Robert Bowers, 46, is in custody and faces 29 criminal counts in what is thought to be the worst anti-Semitic attack in recent US history"
# texsplit=tex.split() would also work, but it leaves punctuation attached to words
mex="Two brothers and a husband and wife were among those killed. Six people were injured, including four policemen."
mex+=" The suspect, Robert Bowers, 46, is in custody and faces 29 criminal counts in what is thought to be the worst anti-Semitic attack in recent US history."
texsplit=nltk.word_tokenize(tex)
hepsi=list(ngrams(texsplit,n))
hepsi+=list(ngrams(nltk.word_tokenize(mex),n))
print(hepsi)
freqdist=nltk.FreqDist(hepsi)
print(freqdist.most_common(50))
# It can be used this way.
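
The same n-gram frequency count can be sketched with only the standard library, which may help when NLTK or its tokenizer data is not installed. This is a minimal sketch, not the script's method: it assumes plain whitespace tokenization instead of `nltk.word_tokenize`, and the helper name `ngram_counts` and the sample sentence are hypothetical.

```python
from collections import Counter


def ngram_counts(text, n=2):
    """Count n-grams in text (hypothetical helper, whitespace tokenization)."""
    tokens = text.split()
    # Zip n progressively shifted copies of the token list so that each
    # tuple holds n consecutive tokens.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


sample = "a husband and a wife and a dog"
counts = ngram_counts(sample, n=2)
print(counts.most_common(3))
```

`Counter.most_common` plays the role of `FreqDist.most_common` here. Because the tokenizer only splits on whitespace, punctuation stays attached to the neighbouring word, so the counts can differ from the NLTK version on real text.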
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment