Last active
October 31, 2017 00:54
This Python script extracts n-grams at the syllable (character) level or the word (eojeol) level.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

sample_text = "신은 다시 일어서는 법을 가르치기 위해 넘어뜨린다고 나는 믿는다."


def word_ngram(sentence, num_gram):
    # Despite the name, this works at the character (syllable) level:
    # list() splits the sentence into single characters, spaces included.
    text = list(sentence)
    # Stop at len(text) - num_gram + 1 so every slice has exactly num_gram items.
    ngrams = [text[x:x + num_gram] for x in range(len(text) - num_gram + 1)]
    return ngrams


def phoneme_ngram(sentence, num_gram):
    # Despite the name, this works at the word (eojeol) level:
    # the sentence is split on single spaces.
    text = sentence.split(' ')
    ngrams = [text[x:x + num_gram] for x in range(len(text) - num_gram + 1)]
    return ngrams


print(word_ngram(sample_text, 2))
print(phoneme_ngram(sample_text, 3))
Hello! Thanks for the nice code!
I have a question about it: in word_ngram, the list comprehension that builds ngrams slices the list with text[x:x+num_gram].
Why is num_gram added to the index of text?
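On the question above: in a slice such as text[x:x+num_gram], x is the start index and x+num_gram is the exclusive end index, so adding num_gram is what makes each slice a window of num_gram consecutive items starting at position x. Here is a minimal sketch of that idea. The helper name sliding_windows and the short English token list are assumptions for illustration only and are not part of the gist.

```python
def sliding_windows(items, n):
    """Return every run of n consecutive items (hypothetical helper, not from the gist)."""
    # items[start:start + n] takes n items beginning at index start;
    # the end index start + n is exclusive.
    return [items[start:start + n] for start in range(len(items) - n + 1)]


tokens = ["a", "b", "c", "d"]
print(sliding_windows(tokens, 2))  # [['a', 'b'], ['b', 'c'], ['c', 'd']]
print(tokens[1:1 + 2])             # ['b', 'c'] : the window that starts at index 1
```

Without the added n, the slice items[start:start] would always be empty. If the range ran all the way to len(items), the last few slices would be shorter than n, because slicing past the end of a list simply stops at the end.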