@naotokui
naotokui / lakh_clean_midi_genres.json
Last active July 13, 2017 02:31
Genre info for songs in the clean_midi folder of the Lakh MIDI dataset (source: Gracenote API). See http://colinraffel.com/projects/lmd/
[
  {
    "title" : "Amish Paradise",
    "artist" : "\"Weird Al\" Yankovic",
    "genres" : [
      "Other",
      "Comedy",
      "Comedy"
    ]
  },
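A minimal sketch of how records in this shape might be read and summarized. The `count_genres` helper and the inline `sample` string are hypothetical names used for illustration; the only data used is the single record shown in the preview.

```python
import json
from collections import Counter

# One record copied from the preview; the real file holds many more.
# For the actual file, json.load(open("lakh_clean_midi_genres.json")) would
# replace json.loads(sample) (assuming the file is in the working directory).
sample = r'''
[
  {
    "title": "Amish Paradise",
    "artist": "\"Weird Al\" Yankovic",
    "genres": ["Other", "Comedy", "Comedy"]
  }
]
'''

def count_genres(songs):
    """Count how many songs carry each genre (hypothetical helper).

    A genre repeated inside one song's list is counted once for that song.
    """
    counts = Counter()
    for song in songs:
        counts.update(set(song["genres"]))
    return counts

songs = json.loads(sample)
counts = count_genres(songs)
for genre, n in sorted(counts.items()):
    print(genre, n)
```

Deduplicating per song is a design choice: the preview lists "Comedy" twice for one title, so a raw count would overweight it.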
@naotokui
naotokui / conditional_vae_keras.ipynb
Created June 29, 2017 01:02
Conditional VAE in Keras
@naotokui
naotokui / conditional_vae_keras.ipynb
Created June 29, 2017 01:01
Conditional VAE in Keras
@naotokui
naotokui / vae_keras.ipynb
Created June 28, 2017 05:16
Variational autoencoder in Keras
@naotokui
naotokui / emoji_regex.py
Created May 19, 2017 04:21
Find Unicode emoji with a Python regex
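import re

# Surrogate-pair ranges: these match only where an emoji is stored as a
# UTF-16 pair (narrow Python 2 builds), not where it is a single code
# point (Python 3.3+).
emoji_pattern = re.compile(
    u"(?:"
    u"(\ud83d[\ude00-\ude4f])|"  # emoticons
    u"(\ud83c[\udf00-\udfff])|"  # symbols & pictographs (1 of 2)
    u"(\ud83d[\udc00-\uddff])|"  # symbols & pictographs (2 of 2)
    u"(\ud83d[\ude80-\udeff])|"  # transport & map symbols
    u"(\ud83c[\udde0-\uddff])"   # flags (iOS)
    u")+", flags=re.UNICODE)     # "+" applies to every alternative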
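On Python 3 each emoji is a single code point, so surrogate-pair ranges do not match. As a hedged sketch, the same blocks can be written as code point ranges. `EMOJI_PATTERN`, `strip_emoji` and `find_emoji` are names invented for this example, and the ranges cover only the blocks listed in the snippet, not every emoji.

```python
import re

# Code point ranges corresponding to the surrogate-pair ranges in the snippet.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
    "]+"
)

def strip_emoji(text):
    """Return text with every matched emoji run removed."""
    return EMOJI_PATTERN.sub("", text)

def find_emoji(text):
    """Return the list of consecutive emoji runs found in text."""
    return EMOJI_PATTERN.findall(text)

print(strip_emoji("hello \U0001F600 world"))
print(find_emoji("a\U0001F600\U0001F680b"))
```

A single character class keeps the pattern short and lets `+` group adjacent emoji into one match.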
@naotokui
naotokui / list_to_set_ordered.py
Last active May 10, 2017 05:55
list -> set in Python: remove duplicates while preserving the original order
a = [5, 1, 3, 5, 4, 2, 10, 7, 8, 0, 1, 3]
# list -> set : removes duplicates, but the original order is lost
print(list(set(a))) # [0, 1, 2, 3, 4, 5, 7, 8, 10] (typical CPython output; set order is not guaranteed)
# list -> set while preserving original order (a.index makes this O(n^2))
b = sorted(set(a), key=a.index) # [5, 1, 3, 4, 2, 10, 7, 8, 0]
# if the list is very long and performance matters, use more_itertools
# pip install more_itertools
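For comparison, a dependency-free sketch of linear-time, order-preserving de-duplication. This is an alternative to the `more_itertools` route mentioned in the comments, not the author's code; `unique_in_order` is a hypothetical helper name.

```python
def unique_in_order(items):
    """Return items without duplicates, keeping first-seen order (O(n))."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

a = [5, 1, 3, 5, 4, 2, 10, 7, 8, 0, 1, 3]
print(unique_in_order(a))      # [5, 1, 3, 4, 2, 10, 7, 8, 0]
print(list(dict.fromkeys(a)))  # same result on Python 3.7+, where dicts keep insertion order
```

Both forms require hashable items; the explicit loop also works on Python versions where dict ordering is not guaranteed.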
@naotokui
naotokui / japanese_word_split.py
Created May 9, 2017 01:44
Splitting Japanese text into words (wakati-gaki word segmentation)
import MeCab

mt = MeCab.Tagger("-Ochasen")

def wakati_text_mecab(text):
    res = mt.parseToNode(text.encode("utf-8"))
    words = []
    try:
        while res:
            surface = res.surface
@naotokui
naotokui / ja_sentence_tokenize.py
Created May 9, 2017 01:28
Japanese sentence tokenizer: splits Japanese text into sentences (simple version)
import re
import nltk
sent_detector = nltk.RegexpTokenizer(u'[^ !?。]*[!?。.\n]')
sents = sent_detector.tokenize(u" 原子番号92のウランより重い元素は全て人工的に合成され、118番まで発見の報告がある。\
113番については、理研と米露の共同チームがそれぞれ「発見した」と報告し、国際純正・応用化学連合と国際純粋・応用物理学連合の合同作業部会が審査していた。両学会は「データの確実性が高い」ことを理由に、理研の発見を認定し、31日に森田さんに通知した。未確定だった115番と117番、118番の新元素は米露チームの発見を認めた。森田さんは「周期表に名前が残ることは感慨深い。大勢の共同研究者にまずは感謝したい」と述べた。 \n")
for s in sents:
    # Formatting into one string keeps the output identical on Python 2 and 3.
    print(u"%s %d" % (s, len(s)))
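The same idea can be sketched with the standard library alone. This assumes a plain `re.findall` is an acceptable stand-in for `nltk.RegexpTokenizer`; `SENTENCE_RE`, `split_sentences` and the short sample text are made up for this example.

```python
import re

# A run of non-terminator characters followed by one sentence-ending mark
# (ASCII and full-width "!" and "?", the Japanese full stop, or a newline).
SENTENCE_RE = re.compile(r"[^!?。!?\n]*[!?。!?\n]")

def split_sentences(text):
    """Return the sentences in text, each keeping its closing punctuation."""
    return [m.strip() for m in SENTENCE_RE.findall(text) if m.strip()]

sample = "今日は晴れです。明日は雨でしょうか?傘を持って行きましょう!"
for sentence in split_sentences(sample):
    print(sentence, len(sentence))
```

As with the regex in the snippet, trailing text that has no closing punctuation is not returned, so this suits quick preprocessing more than exact segmentation.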
@naotokui
naotokui / melody_extraction.ipynb
Created May 8, 2017 14:12
extract melody from audio
@naotokui
naotokui / gist:0aa8b58a146b18b49a2ab825b413343a
Created March 14, 2017 09:33
how to set window width of ipython jupyter cell
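# IPython.display is the public import path; importing these names from
# IPython.core.display is deprecated in recent IPython releases.
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))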