IKEGAMI Yukino ikegami-yukino
@ikegami-yukino
ikegami-yukino / palindromejp.py
Last active May 15, 2018 16:11
Japanese palindrome detection with MeCab
# -*- coding: utf-8 -*-
import MeCab
mecab_y = MeCab.Tagger("-Oyomi")
def toYomi(text):
    return mecab_y.parse(text).rstrip().replace(' ', '')
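A minimal sketch of a palindrome check that could follow `toYomi`. It assumes the reading is already available as kana, so it runs without MeCab. The helpers `hira_to_kata` and `is_palindrome` are hypothetical names, not part of the gist.

```python
import re


def hira_to_kata(text: str) -> str:
    # Shift hiragana (U+3041..U+3096) to katakana by adding 0x60 to each code point
    return ''.join(chr(ord(c) + 0x60) if '\u3041' <= c <= '\u3096' else c for c in text)


def is_palindrome(yomi: str) -> bool:
    # Normalize to katakana, drop whitespace and common punctuation, then compare with the reverse
    kana = re.sub(r'[\s、。！？!?]', '', hira_to_kata(yomi))
    return kana == kana[::-1]
```

With MeCab installed, the reading would come from `toYomi(text)` and be passed to `is_palindrome`.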
ikegami-yukino / naistjdic2ipadic.py
Created June 8, 2013 06:04
Convert NAIST-JDic to IPA-Dic context IDs
#coding:utf-8
"""
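Convert NAIST-JDic to IPA-Dic context IDs.
Words with the following part-of-speech subcategories are ignored
(godan verbs of the ta, na, and ba rows, and wa-row u-euphonic):
五段・タ行/五段・ナ行/五段・バ行/五段・ワ行ウ音便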
"""
import sys, codecs, optparse
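A sketch of only the filtering step the docstring describes. It assumes the MeCab IPA-dic CSV layout, in which column 8 (0-based) holds the conjugation type (活用型) where values such as 五段・タ行 appear. The function name `filter_entries` is hypothetical, and the context-ID remapping itself is not shown.

```python
import csv
import io

# Conjugation types listed in the docstring as ignored
IGNORED = {"五段・タ行", "五段・ナ行", "五段・バ行", "五段・ワ行ウ音便"}


def filter_entries(csv_text):
    """Yield dictionary rows whose conjugation type is not in IGNORED.

    Assumes the IPA-dic CSV column layout:
    surface, left_id, right_id, cost, pos, pos1, pos2, pos3,
    conjugation type, conjugation form, base form, reading, pronunciation.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        # Column 8 is the conjugation type under the assumed layout
        if len(row) > 8 and row[8] in IGNORED:
            continue
        yield row
```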
ikegami-yukino / oll_client.py
Last active September 13, 2017 08:51
OLL client is a client for using OLL from Python. OLL is a library that implements several online-learning algorithms.
from subprocess import Popen, PIPE
import time, sys, os
"""
oll_client 0.0.3 - Online Machine Learning Module
OLL client is a Python client for OLL, a machine learning library that implements several online-learning algorithms.
Currently, OLL supports the following algorithms:
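As an illustration of the online-learning setting OLL targets, here is a minimal binary perceptron over sparse feature dictionaries. This is a textbook perceptron written for illustration, not OLL's implementation or the client's API; the class name `Perceptron` is hypothetical here.

```python
from collections import defaultdict


class Perceptron:
    """Binary perceptron over sparse {feature: value} vectors, labels in {+1, -1}."""

    def __init__(self):
        self.w = defaultdict(float)

    def score(self, x):
        # Dot product of the weight vector with a sparse input
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def update(self, x, y):
        # Online step: change the weights only when the example is misclassified
        if y * self.score(x) <= 0:
            for f, v in x.items():
                self.w[f] += y * v

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1
```

Each training example is seen once and discarded after `update`, which is what distinguishes online learning from batch training.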
ikegami-yukino / hadoop_ubuntu_install.sh
Last active December 25, 2015 03:59
Script to install Hadoop on Ubuntu 12.10
#
# Script to install Hadoop on Ubuntu 12.10
#
# Java setting
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
sudo update-java-alternatives -s java-7-oracle
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-oracle' >> ~/.bashrc
ikegami-yukino / mahout_ubuntu_install.sh
Created October 10, 2013 10:56
Script to install Mahout 0.8 on Ubuntu 12.10
#
# Script to install Mahout 0.8 on Ubuntu 12.10
#
# Installing Maven
#sudo apt-get install maven
sudo apt-get install subversion
# Installing Mahout
ikegami-yukino / gist:6928667
Created October 11, 2013 02:20
Installing madoka on Mac OS X
git clone https://github.com/s-yata/madoka.git
cd madoka
glibtoolize --copy; aclocal; autoheader; automake -a -c --add-missing; autoconf
./configure
make
sudo make install
ikegami-yukino / wikipedia_anob.py
Created December 19, 2013 16:48
List of suffixes for cutting unwanted Wikipedia headwords
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from collections import Counter
import re
suffixes = Counter()
re_anob = re.compile(u'(?P<A>.+[^の])の(?P<B>[^の].+)')
re_hiragana = re.compile(u'[ぁ-ゖ]+')
def extract_anob(text):
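The body of `extract_anob` is not shown, so the following is only a hypothetical sketch of how `re_anob` could be used to tally the B part of "AのB" titles. The function name `count_b_parts` is an illustration, not the gist's code. Because the A group is greedy, the last の that leaves a valid B is the one used as the split point.

```python
import re
from collections import Counter

# Same pattern as re_anob above: A (2+ chars, not ending in の), の, B (2+ chars, not starting with の)
re_anob = re.compile(r'(?P<A>.+[^の])の(?P<B>[^の].+)')


def count_b_parts(titles):
    """Hypothetical helper: count how often each B appears in 'AのB' titles."""
    suffixes = Counter()
    for title in titles:
        m = re_anob.match(title)
        if m:
            suffixes[m.group('B')] += 1
    return suffixes
```

Frequent B parts (for example 歴史 in 日本の歴史) would be candidates for the suffix list.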
ikegami-yukino / nfkc_compare.txt
Created December 30, 2013 19:32
List of characters changed by Python's unicodedata.normalize('NFKC')
# -*- coding: utf-8 -*-
import unicodedata
for unicode_id in xrange(65536):
    char = unichr(unicode_id)
    normalized_char = unicodedata.normalize('NFKC', char)
    if char != normalized_char:
        if len(normalized_char) == 1:
            print u'[%d] %s -> [%d] %s' % (unicode_id, char, ord(normalized_char), normalized_char)
        else:
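The Python 2 loop above can be sketched in Python 3 as a generator over the Basic Multilingual Plane. Since the original `else:` branch is not shown, this sketch simply returns multi-character results as-is; the function name `nfkc_changes` is hypothetical.

```python
import unicodedata


def nfkc_changes(limit=0x10000):
    """Yield (char, normalized) pairs for code points below `limit` that NFKC changes."""
    for code_point in range(limit):
        char = chr(code_point)
        normalized = unicodedata.normalize('NFKC', char)
        if char != normalized:
            # normalized may be longer than one character (e.g. ligatures)
            yield char, normalized
```

For example, `dict(nfkc_changes())` maps fullwidth Ａ to A, circled ① to 1, halfwidth ｶ to カ, and the ligature ﬁ to the two-character string fi.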
ikegami-yukino / 2ch_regex.py
Created January 8, 2014 15:44
Regex for detecting 2channel and 2channel summary (matome) sites (matches strings like 2014/01/08(水) 12:30:37.67 ID:+pyxCrmX0)
import re
re_2ch_post = re.compile(r'[^2]2[0-9]{3}/\d{2}/\d{2}\(.\) \d{2}:\d{2}:\d{2}\.\d{2} ID:[\w\d\+\/]+')
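A usage sketch of the pattern, written as a raw string so that `\d` is not parsed as a string escape. The sample post header line is made up for illustration.

```python
import re

re_2ch_post = re.compile(r'[^2]2[0-9]{3}/\d{2}/\d{2}\(.\) \d{2}:\d{2}:\d{2}\.\d{2} ID:[\w\d\+\/]+')

# Hypothetical post header: number, name, timestamp, and poster ID
line = '1 名前：名無しさん 2014/01/08(水) 12:30:37.67 ID:+pyxCrmX0'
match = re_2ch_post.search(line)
```

Because `[^2]` consumes one character before the year, the match includes that preceding character (here a space), and a timestamp at the very start of a string does not match.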
ikegami-yukino / propensity.py
Created January 26, 2014 17:04
Convert a propensity list into JSON for a synonym dictionary https://dl.dropboxusercontent.com/u/49326509/Propensity.txt
# -*- coding: utf-8 -*-
import json
import fileinput
import re
import codecs
'''
Convert a propensity list into JSON for a synonym dictionary
https://dl.dropboxusercontent.com/u/49326509/Propensity.txt
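The format of Propensity.txt is not shown here, so this is a sketch under a hypothetical format: one synonym group per line, terms separated by commas, with the first term used as the canonical entry. The function name `to_synonym_json` is also hypothetical.

```python
import json


def to_synonym_json(lines, sep=','):
    """Map every term on a line to the first term on that line (assumed format)."""
    synonyms = {}
    for line in lines:
        terms = [t.strip() for t in line.split(sep) if t.strip()]
        if not terms:
            continue  # skip blank lines
        head = terms[0]
        for term in terms:
            synonyms[term] = head
    # ensure_ascii=False keeps Japanese terms readable in the output
    return json.dumps(synonyms, ensure_ascii=False, sort_keys=True)
```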