@osori
Last active October 31, 2017 00:54
This Python script extracts n-grams at the character (syllable) level or at the word (eojeol) level.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

sample_text = "신은 다시 일어서는 법을 가르치기 위해 넘어뜨린다고 나는 믿는다."


def word_ngram(sentence, num_gram):
    # Despite its name, this builds character-level (syllable) n-grams.
    text = list(sentence)  # split the sentence into a list of characters (spaces included)
    # Slide a window of num_gram characters, stopping so every n-gram is full length.
    return [text[x:x + num_gram] for x in range(len(text) - num_gram + 1)]


def phoneme_ngram(sentence, num_gram):
    # Despite its name, this builds word-level (eojeol) n-grams from whitespace-separated tokens.
    text = sentence.split()
    return [text[x:x + num_gram] for x in range(len(text) - num_gram + 1)]


print(word_ngram(sample_text, 2))
print(phoneme_ngram(sample_text, 3))
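The script only lists the n-grams. If the analysis you want is a frequency count, one possible sketch uses `collections.Counter`. It assumes that counting occurrences is the goal, and `count_ngrams` is a hypothetical helper that is not part of the original gist. Each n-gram is turned into a tuple because lists are unhashable and cannot serve as `Counter` keys.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count n-grams over a token sequence (hypothetical helper, not in the original gist).

    Each window is converted to a tuple, because lists are unhashable
    and cannot be used as Counter keys.
    """
    windows = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(windows)


sample_text = "신은 다시 일어서는 법을 가르치기 위해 넘어뜨린다고 나는 믿는다."

# Character (syllable) bigrams, spaces included, mirroring word_ngram.
char_counts = count_ngrams(list(sample_text), 2)
# Word (eojeol) trigrams, mirroring phoneme_ngram.
word_counts = count_ngrams(sample_text.split(), 3)

print(char_counts.most_common(3))
print(word_counts.most_common(3))
```

In this sentence the bigram `('는', ' ')` appears twice, from "일어서는 " and "나는 ". Every word trigram occurs only once.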
@tempo13

tempo13 commented Oct 31, 2017

Hello! Thanks for the great code!
I have a question about it. In word_ngram, in ngrams = [text[x:x+num_gram] for x in range(0, len(text))],
why is num_gram added to the index into text?
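The following is a minimal illustration of the slice in question and is not part of the original thread. In Python, `text[a:b]` returns the items from index `a` up to, but not including, index `b`. So `text[x:x+num_gram]` is a window of `num_gram` consecutive items that starts at `x`, and the `+ num_gram` marks where that window ends. The example uses one word from the sample sentence:

```python
text = list("가르치기")  # ['가', '르', '치', '기']
num_gram = 2

# text[x:x + num_gram] starts at index x and stops just before x + num_gram,
# so it holds exactly num_gram items (when enough items remain).
for x in range(len(text)):
    print(x, text[x:x + num_gram])
# With range(len(text)), the last window (x = 3) is ['기'], which is shorter than num_gram.

# Stopping at len(text) - num_gram + 1 keeps only full-length windows.
full = [text[x:x + num_gram] for x in range(len(text) - num_gram + 1)]
print(full)  # → [['가', '르'], ['르', '치'], ['치', '기']]
```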
