KUROYANAGI KEIICHI Keiku

@Keiku
Keiku / read_copytext.py
Created Jan 19, 2018
Read copy text to pandas DataFrame.
View read_copytext.py
import pandas as pd
from io import StringIO
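def read_copytext(text):
    # Wrap the pasted text so pandas can read it like a file
    text1 = StringIO(text)
    df = pd.read_table(text1)
    df.columns = ["col1"]
    # Collapse each run of whitespace into a single comma
    df["col1"] = df["col1"].str.replace(r"\s+", ",", regex=True)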
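The preview above ends before the function returns anything, so the rest of the author's approach is not visible here. As a minimal sketch of one way to turn pasted, whitespace-separated text into a multi-column DataFrame, the example below swaps in `pd.read_csv` with a regex separator instead of the replace-with-commas step. The helper name `read_whitespace_text` and the sample text are hypothetical, not taken from the gist.

```python
from io import StringIO

import pandas as pd


def read_whitespace_text(text):
    """Parse pasted, whitespace-separated text into a DataFrame (hypothetical helper)."""
    # sep=r"\s+" treats any run of spaces or tabs as a single delimiter.
    return pd.read_csv(StringIO(text), sep=r"\s+")


# Hypothetical sample: a header row followed by two data rows.
sample = """name  score
alice 1
bob   2
"""
df = read_whitespace_text(sample)
print(df)
```

The first line of the text is taken as the header row; passing `header=None` to `pd.read_csv` would be one way to handle text without a header.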
@Keiku
Keiku / split_KFold.py
Last active May 2, 2017
Split K-fold validation dataset.
View split_KFold.py
import string
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, StratifiedKFold
X_train = np.random.random((10, 2))
y_train = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
column = "pred"
n_fold = 5
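The preview stops after the setup, so the splitting step itself is not shown. The following is a minimal sketch, under the assumption that the goal is to record which fold each row is held out in. It uses `StratifiedKFold` so every validation fold keeps the class balance of `y_train`; the `fold` column name and the `random_state` value are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

X_train = np.random.random((10, 2))
y_train = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
n_fold = 5

# One row per sample; "fold" (hypothetical column) starts at -1, meaning unassigned.
folds = pd.DataFrame({"y": y_train, "fold": -1})

skf = StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=0)
for fold_id, (train_idx, valid_idx) in enumerate(skf.split(X_train, y_train)):
    # valid_idx holds the row positions held out for validation in this fold.
    folds.loc[valid_idx, "fold"] = fold_id

print(folds)
```

With five samples per class and five folds, each validation fold should contain one sample of each class. Plain `KFold` has the same `split` interface but ignores the labels.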
@Keiku
Keiku / get_wordnet_synonyms.py
Created Apr 28, 2017
Extract the synonyms by using wordnet.
View get_wordnet_synonyms.py
from itertools import chain
from nltk.corpus import wordnet
synonyms = wordnet.synsets('change')
lemmas = set(chain.from_iterable([word.lemma_names() for word in synonyms]))
lemmas
# Out[31]:
# {'alter',
# 'alteration',
# 'change',
@Keiku
Keiku / stack_sparse_matrix.py
Created Apr 28, 2017
Stack the sparse matrices.
View stack_sparse_matrix.py
import numpy as np
import scipy as sp
import scipy.sparse  # make sure sp.sparse is loaded on older SciPy versions
import pandas as pd
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"C": [5, 6]})
X1 = sp.sparse.csr_matrix(df1.values)
X1_dense = X1.todense()
# Out[28]:
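The preview ends before any stacking happens. As a minimal sketch of the step the description refers to, assuming the intent is to join the two frames column-wise as sparse matrices, `scipy.sparse.hstack` can be used as follows. The variable names `X2` and `X` are assumptions that continue the naming in the snippet above.

```python
import pandas as pd
from scipy import sparse

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"C": [5, 6]})

X1 = sparse.csr_matrix(df1.values)
X2 = sparse.csr_matrix(df2.values)

# hstack joins matrices column-wise; all inputs must have the same number of rows.
X = sparse.hstack([X1, X2], format="csr")
print(X.toarray())
```

`sparse.vstack` does the same along rows, in which case the inputs need matching column counts instead.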
View list_operations.py
import numpy as np
# Python
list(map(lambda x: x + 1, range(1, 6, 1)))
# Out[1]: [2, 3, 4, 5, 6]
# Numpy
list(np.array(range(1, 6, 1)) + 1)
# Out[2]: [2, 3, 4, 5, 6]
@Keiku
Keiku / tmux.sh
Created Apr 17, 2017
tmux command reference.
View tmux.sh
# show prefix
tmux show-options -g prefix
# new session
tmux
tmux new -s work
# check sessions
tmux ls
@Keiku
Keiku / OrderedDict_sample.py
Last active Apr 13, 2017
Get keys/values from sorted OrderedDict.
View OrderedDict_sample.py
from collections import OrderedDict
d = {'A': 3,
     'B': 2,
     'C': 1}
OrderedDict(sorted(d.items(), key=lambda x: x[0])).values()
# Out[1]: odict_values([3, 2, 1])
OrderedDict(sorted(d.items(), key=lambda x: x[1])).values()
# Out[2]: odict_values([1, 2, 3])
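The description mentions keys as well as values, while the lines above only show `.values()`. A short sketch of retrieving both from the same value-sorted `OrderedDict`, converted to plain lists for easier reuse; the variable names `by_value`, `keys`, and `values` are chosen here for illustration.

```python
from collections import OrderedDict

d = {"A": 3, "B": 2, "C": 1}

# Sort the (key, value) pairs by value, then keep that order in an OrderedDict.
by_value = OrderedDict(sorted(d.items(), key=lambda item: item[1]))

keys = list(by_value.keys())
values = list(by_value.values())
print(keys)    # ['C', 'B', 'A']
print(values)  # [1, 2, 3]
```

Since Python 3.7 a plain `dict` also preserves insertion order, so `dict(sorted(...))` would give the same ordering.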
@Keiku
Keiku / extract_onehot_vector.py
Created Apr 12, 2017
Extract the one-hot encoding vector.
View extract_onehot_vector.py
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
X_str = np.array([['a', 'dog', 'red'], ['b', 'cat', 'green']])
# transform to integer
X_int = LabelEncoder().fit_transform(X_str.ravel()).reshape(*X_str.shape)
# transform to binary
X_bin = OneHotEncoder().fit_transform(X_int).toarray()
print(X_bin)
# [[ 1. 0. 0. 1. 0. 1.]
@Keiku
Keiku / extract_tfidf_vector.py
Last active Apr 11, 2017
Extract the tf-idf vector.
View extract_tfidf_vector.py
text = ['This is a string', 'This is another string', 'TFIDF computation calculation', 'TfIDF is the product of TF and IDF']
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(max_df=1.0, min_df=1, stop_words='english', norm=None)
X = vectorizer.fit_transform(text)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
X_vocab = vectorizer.get_feature_names_out().tolist()
# Out[1]: ['calculation', 'computation', 'idf', 'product', 'string', 'tf', 'tfidf']
X_mat = X.todense()
# Out[2]:
@Keiku
Keiku / Modeling_GermanCredit.r
Created Mar 17, 2017
Source code for "11-3 Trying Out Machine Learning in R" from the book データサイエンティスト養成読本 登竜門編 (Data Scientist Training Reader, Gateway Edition).
View Modeling_GermanCredit.r
# Install the packages
pkgs <- c("dplyr", "rpart", "rpart.plot", "rattle", "mlr", "evtree")
install.packages(pkgs, quiet = TRUE)
# Load the packages
library("dplyr")
library("rattle")
library("mlr")
library("evtree")