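# -*- coding: utf-8 -*-
import MeCab

# "-Oyomi" makes MeCab output only the reading (katakana) of the input.
mecab_y = MeCab.Tagger("-Oyomi")

def toYomi(text):
    # Drop the trailing newline MeCab appends, then remove any spaces.
    return mecab_y.parse(text).rstrip().replace(' ', '')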
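#coding:utf-8
"""
Convert NAIST-JDic to IPA-Dic context IDs.
Words in the following part-of-speech subcategories (conjugation types) are ignored:
五段・タ行/五段・ナ行/五段・バ行/五段・ワ行ウ音便
(godan ta-row, godan na-row, godan ba-row, godan wa-row with u-euphony)
"""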
import sys, codecs, optparse |
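The docstring says entries with four conjugation types are skipped. A minimal Python 3 sketch of that filtering step follows, assuming IPADIC-style CSV rows (surface, left ID, right ID, cost, POS plus three subcategories, conjugation type, conjugation form, base form, reading, pronunciation), so the conjugation type sits in column 8. `filter_entries`, `IGNORED_CONJ_TYPES` and the sample rows are hypothetical; the actual context-ID conversion is not shown.

```python
import csv

# Conjugation types listed in the docstring; entries with these are skipped.
IGNORED_CONJ_TYPES = {"五段・タ行", "五段・ナ行", "五段・バ行", "五段・ワ行ウ音便"}

# Assumption: IPADIC-style CSV, where column 8 is the conjugation type (活用型).
CONJ_TYPE_COL = 8


def filter_entries(lines):
    """Yield parsed CSV rows whose conjugation type is not in the ignore list."""
    for row in csv.reader(lines):
        if len(row) > CONJ_TYPE_COL and row[CONJ_TYPE_COL] in IGNORED_CONJ_TYPES:
            continue  # one of the ignored conjugation types
        yield row


# Hypothetical sample rows in the assumed layout.
rows = [
    "走る,0,0,100,動詞,自立,*,*,五段・ラ行,基本形,走る,ハシル,ハシル",
    "勝つ,0,0,100,動詞,自立,*,*,五段・タ行,基本形,勝つ,カツ,カツ",
]
print([row[0] for row in filter_entries(rows)])  # ['走る']
```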
from subprocess import Popen, PIPE
import time, sys, os
"""
oll_client 0.0.3 - Online Machine Learning Module
OLL client is a Python client for OLL, a machine learning library that implements several online learning algorithms.
Currently, OLL supports the following algorithms:
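The algorithms OLL provides are online learners: the model is updated after each example rather than fitted to the whole dataset at once. As a generic illustration only (not OLL's code or API), here is the classic perceptron over sparse feature dicts in Python 3; `predict`, `update` and `train` are hypothetical names.

```python
def predict(weights, features):
    """Classify a sparse example {feature: value} as +1 or -1."""
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if score >= 0 else -1


def update(weights, features, label):
    """One online step: change the weights only if the example is misclassified."""
    if predict(weights, features) != label:
        for f, v in features.items():
            weights[f] = weights.get(f, 0.0) + label * v


def train(examples, epochs=5):
    """Stream over (features, label) pairs, updating after each one."""
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            update(weights, features, label)
    return weights


examples = [({"a": 1.0, "c": 1.0}, 1), ({"b": 1.0, "c": 1.0}, -1)]
w = train(examples)
print(predict(w, {"a": 1.0, "c": 1.0}), predict(w, {"b": 1.0, "c": 1.0}))  # 1 -1
```

Because each update touches only the features present in one example, the same loop works unchanged when examples arrive one at a time from a stream.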
#
# Script to install Hadoop on Ubuntu 12.10
#
# Java setting
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
sudo update-java-alternatives -s java-7-oracle
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-oracle' >> ~/.bashrc |
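The `echo ... >> ~/.bashrc` line appends another copy of the export each time the script is re-run. A minimal Python 3 sketch of an idempotent append that avoids the duplicates; `ensure_line` is a hypothetical helper, not part of the original script.

```python
import os


def ensure_line(path, line):
    """Append `line` to the file at `path` unless an identical line is already there.

    Returns True if the line was appended, False if it was already present.
    """
    text = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            text = f.read()
    if line in text.splitlines():
        return False  # already present; nothing to do
    with open(path, "a", encoding="utf-8") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new line separate from the last one
        f.write(line + "\n")
    return True


# Hypothetical usage:
# ensure_line(os.path.expanduser("~/.bashrc"),
#             "export JAVA_HOME=/usr/lib/jvm/java-7-oracle")
```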
#
# Script to install Mahout 0.8 on Ubuntu 12.10
#
# Installing Maven
#sudo apt-get install maven
sudo apt-get install subversion
# Installing Mahout |
git clone https://github.com/s-yata/madoka.git
cd madoka
glibtoolize --copy; aclocal; autoheader; automake -a -c --add-missing; autoconf
./configure
make
sudo make install |
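According to its README, madoka implements a Count-Min sketch, which approximates per-key counts over a stream in fixed memory. To show the idea, here is a textbook pure-Python 3 Count-Min sketch; it is not madoka's API, and the width/depth defaults and SHA-1-based row hashing are arbitrary choices for illustration. In this plain version, estimates can be too high (hash collisions) but never too low.

```python
import hashlib


class CountMinSketch:
    """Textbook Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Derive one hash per row by prefixing the row number to the key.
        digest = hashlib.sha1(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Collisions only ever add to a counter, so the minimum is the tightest bound.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


sketch = CountMinSketch()
for word in ["apple", "apple", "apple", "pear"]:
    sketch.add(word)
print(sketch.estimate("apple"))  # at least 3
```

Memory stays at `width * depth` counters no matter how many distinct keys are added; widening the rows lowers the collision rate.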
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from collections import Counter
import re
suffixes = Counter()
re_anob = re.compile(u'(?P<A>.+[^の])の(?P<B>[^の].+)')
re_hiragana = re.compile(u'[ぁ-ゖ]+')
def extract_anob(text): |
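`re_anob` splits a string of the form "A の B" around a の: the characters on either side of that の may not themselves be の, and because each side is `.+` plus one more character, both A and B must be at least two characters long. The body of `extract_anob` is not shown, so here is a minimal Python 3 sketch of using the two patterns; `split_anob` and `is_hiragana` are hypothetical helpers.

```python
import re

# Same patterns as above, written as Python 3 raw strings.
re_anob = re.compile(r'(?P<A>.+[^の])の(?P<B>[^の].+)')
re_hiragana = re.compile(r'[ぁ-ゖ]+')


def split_anob(text):
    """Return (A, B) if `text` contains an 'A の B' match, else None."""
    m = re_anob.search(text)
    return (m.group('A'), m.group('B')) if m else None


def is_hiragana(text):
    """True if `text` consists only of hiragana (U+3041 to U+3096)."""
    return re_hiragana.fullmatch(text) is not None


print(split_anob('東京の大学'))  # ('東京', '大学')
print(split_anob('私の本'))      # None: A and B each need two or more characters
```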
# -*- coding: utf-8 -*-
import unicodedata

for unicode_id in xrange(65536):
    char = unichr(unicode_id)
    normalized_char = unicodedata.normalize('NFKC', char)
    if char != normalized_char:
        if len(normalized_char) == 1:
            print u'[%d] %s -> [%d] %s' % (unicode_id, char, ord(normalized_char), normalized_char)
        else:
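The loop above is Python 2 (`xrange`, `unichr`, the `print` statement). A Python 3 sketch of the same scan over the Basic Multilingual Plane, collecting the changes instead of printing them and skipping the surrogate range U+D800 to U+DFFF, which holds no standalone characters; `nfkc_changes` is a hypothetical name. Splitting by result length mirrors the `len(normalized_char) == 1` branch.

```python
import unicodedata


def nfkc_changes(limit=0x10000):
    """Map code points below `limit` to their NFKC form, when it differs.

    Returns (single, multi): results of length 1 and of other lengths.
    """
    single, multi = {}, {}
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not characters on their own
        ch = chr(cp)
        normalized = unicodedata.normalize("NFKC", ch)
        if normalized != ch:
            (single if len(normalized) == 1 else multi)[cp] = normalized
    return single, multi


single, multi = nfkc_changes()
print(single[0xFF21])  # A  (fullwidth Latin capital A)
print(single[0xFF76])  # カ (halfwidth katakana ka)
print(multi[0x337B])   # 平成 (square era name Heisei)
```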
import re
re_2ch_post = re.compile(u'[^2]2[0-9]{3}/\d{2}/\d{2}\(.\) \d{2}:\d{2}:\d{2}\.\d{2} ID:[\w\d\+\/]+') |
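`re_2ch_post` matches the date, time and ID header of a 2ch post, such as `2013/05/01(水) 12:34:56.78 ID:AbCd+/12`. The leading `[^2]` requires the year's first digit to be preceded by some character other than 2, so that extra character is part of the match. A Python 3 sketch that wraps the rest of the pattern in a group so only the header is returned; `find_post_headers` and the sample text are made up for illustration.

```python
import re

# Same pattern as above, with the header wrapped in a group so the
# character matched by [^2] is excluded from the result.
re_2ch_header = re.compile(
    r'[^2](2[0-9]{3}/\d{2}/\d{2}\(.\) \d{2}:\d{2}:\d{2}\.\d{2} ID:[\w\d\+\/]+)')


def find_post_headers(text):
    """Return every post header (date, time, ID) found in `text`."""
    return re_2ch_header.findall(text)


# Made-up sample in a 2ch-like layout.
sample = (
    "1 名前：名無しさん 投稿日：2013/05/01(水) 12:34:56.78 ID:AbCd+/12\n"
    "本文\n"
    "2 名前：名無しさん 投稿日：2013/05/01(水) 12:35:00.01 ID:XyZ0\n"
)
print(find_post_headers(sample))
# ['2013/05/01(水) 12:34:56.78 ID:AbCd+/12', '2013/05/01(水) 12:35:00.01 ID:XyZ0']
```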
# -*- coding: utf-8 -*-
import json
import fileinput
import re
import codecs
'''
Convert a propensity (性癖) list into JSON for a synonym dictionary
https://dl.dropboxusercontent.com/u/49326509/Propensity.txt |