Keiichi Kuroyanagi Keiku
@Keiku
Keiku / OrderedDict_sample.py
Last active April 13, 2017 03:35
Get keys/values from a sorted OrderedDict.
from collections import OrderedDict
d = {'A': 3,
     'B': 2,
     'C': 1}
OrderedDict(sorted(d.items(), key=lambda x: x[0])).values()
# Out[1]: odict_values([3, 2, 1])
OrderedDict(sorted(d.items(), key=lambda x: x[1])).values()
# Out[2]: odict_values([1, 2, 3])
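The description mentions keys as well as values, but the snippet only shows `.values()`. A minimal sketch of reading both back in value-sorted order follows; the names `by_value`, `keys_by_value`, and `values_by_value` are illustrative and not part of the original gist.

```python
from collections import OrderedDict

d = {'A': 3, 'B': 2, 'C': 1}

# Sort the (key, value) pairs by value, then keep that order in an OrderedDict.
by_value = OrderedDict(sorted(d.items(), key=lambda kv: kv[1]))

# keys() and values() iterate in the same (sorted) insertion order.
keys_by_value = list(by_value.keys())
values_by_value = list(by_value.values())

print(keys_by_value)    # ['C', 'B', 'A']
print(values_by_value)  # [1, 2, 3]
```

Wrapping the views in `list()` gives plain lists, which are usually easier to index or compare than `odict_keys` / `odict_values` objects.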
@Keiku
Keiku / extract_onehot_vector.py
Created April 12, 2017 06:30
Extract the one-hot encoding vector.
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
X_str = np.array([['a', 'dog', 'red'], ['b', 'cat', 'green']])
# transform to integer
X_int = LabelEncoder().fit_transform(X_str.ravel()).reshape(*X_str.shape)
# transform to binary
X_bin = OneHotEncoder().fit_transform(X_int).toarray()
print(X_bin)
# [[ 1. 0. 0. 1. 0. 1.]
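To make the encoding above easier to follow, here is a rough pure-Python sketch of what the two steps compute: each column is one-hot encoded independently, with its categories taken in sorted order. The helper `one_hot_columns` is hypothetical and is not a scikit-learn function; it only mirrors the result for this small input.

```python
X_str = [['a', 'dog', 'red'], ['b', 'cat', 'green']]


def one_hot_columns(rows):
    """One-hot encode each column independently (hypothetical helper)."""
    n_cols = len(rows[0])
    # Sorted distinct categories per column, e.g. column 1 -> ['cat', 'dog'].
    categories = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    encoded = []
    for row in rows:
        vec = []
        for j, value in enumerate(row):
            # One slot per category of this column; 1.0 marks the match.
            vec.extend(1.0 if value == cat else 0.0 for cat in categories[j])
        encoded.append(vec)
    return encoded


X_bin = one_hot_columns(X_str)
for row in X_bin:
    print(row)
# [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
# [0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
```

The first row matches the printed output above: `a` is the first of two categories in column 0, while `dog` and `red` are each the second category in their columns.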
@Keiku
Keiku / extract_tfidf_vector.py
Last active April 11, 2017 07:40
Extract the tf-idf vector.
text = ['This is a string', 'This is another string', 'TFIDF computation calculation', 'TfIDF is the product of TF and IDF']
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(max_df=1.0, min_df=1, stop_words='english', norm=None)
X = vectorizer.fit_transform(text)
X_vocab = vectorizer.get_feature_names()  # in scikit-learn >= 1.2, use get_feature_names_out()
# Out[1]: ['calculation', 'computation', 'idf', 'product', 'string', 'tf', 'tfidf']
X_mat = X.todense()
# Out[2]:
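As a sanity check on what the vectorizer computes with `norm=None`, the sketch below reproduces the weighting by hand using only the standard library. It assumes scikit-learn's default smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, multiplied by the raw term count, and it assumes the token lists shown (lowercased, English stop words and one-letter tokens already removed by hand). The names `docs`, `df`, and `idf` are illustrative.

```python
import math
from collections import Counter

# Tokens assumed to remain after lowercasing and stop-word removal.
docs = [
    ['string'],
    ['string'],
    ['tfidf', 'computation', 'calculation'],
    ['tfidf', 'product', 'tf', 'idf'],
]

# Vocabulary in alphabetical order, like the feature names above.
vocab = sorted({term for doc in docs for term in doc})
n_docs = len(docs)

# Document frequency: number of documents containing each term.
df = Counter(term for doc in docs for term in set(doc))

# Smoothed inverse document frequency (assumed scikit-learn default).
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}

# Unnormalised tf-idf: raw count times idf, one row per document.
X_mat = [[Counter(doc)[term] * idf[term] for term in vocab] for doc in docs]

print(vocab)
# ['calculation', 'computation', 'idf', 'product', 'string', 'tf', 'tfidf']
```

Terms that appear in two documents (`string`, `tfidf`) get a lower weight than terms that appear in only one, which is the point of the IDF factor.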
@Keiku
Keiku / Modeling_GermanCredit.r
Created March 17, 2017 08:38
Source code for section 11-3, "Trying out machine learning with R", of the book データサイエンティスト養成読本 登竜門編.
# Install the packages
pkgs <- c("dplyr", "rpart", "rpart.plot", "rattle", "mlr", "evtree")
install.packages(pkgs, quiet = TRUE)
# Load the packages
library("dplyr")
library("rattle")
library("mlr")
library("evtree")
@Keiku
Keiku / dplyr_se.r
Created March 10, 2017 11:07
Summarising by standard evaluation with dplyr.
library(dplyr)
library(lazyeval)
df <- data_frame(group = c(1, 2, 2, 3, 3, 3))
g <- "group"
df %>%
  group_by_(g) %>%
  summarise_(
@Keiku
Keiku / impute.py
Created March 10, 2017 01:48
Impute missing values in a column with pandas.
import pandas as pd
df = pd.DataFrame({'A':['A1', 'A2', 'A3'], 'B':[None, 'B2', None]})
df
# Out[51]:
#     A     B
# 0  A1  None
# 1  A2    B2
# 2  A3  None
@Keiku
Keiku / misc.r
Last active March 9, 2017 05:00
Miscellaneous functions.
options(scipen = 100, dplyr.width = Inf, dplyr.print_max = Inf)
'%nin%' <- Negate('%in%')
keep_vecs <- function(x, y) x[x %in% y]
drop_vecs <- function(x, y) x[!x %in% y]
keep_vars <- function(.data, x) dplyr::select_(.data, .dots = x)
drop_vars <- function(.data, x) dplyr::select(.data, -one_of(x))
intersect_all <- function(...) Reduce(intersect, list(...))
union_all <- function(...) Reduce(union, list(...))
@Keiku
Keiku / dplyr_examples.r
Created February 23, 2017 02:12
Example code for the dplyr package.
library(dplyr)
iris_df <- as_data_frame(iris)
iris_df %>% rename_(.dots = setNames(names(.), toupper(names(.)))) %>% head(2)
# A tibble: 2 × 5
#   SEPAL.LENGTH SEPAL.WIDTH PETAL.LENGTH PETAL.WIDTH SPECIES
#          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
@Keiku
Keiku / tidyr_reshape.r
Last active February 22, 2017 02:21
Reshaping with tidyr
library("dplyr")
library("tidyr")
library("data.table")
smp <- data_frame(
  ID = rep(1:3, 2),
  BMI = rep(c(21, 26), 3),
  sbp = rep(c(150, 120), 3),
  nendo = rep(2008:2009, 3)
)
@Keiku
Keiku / extract_subset.r
Last active February 20, 2017 07:12
Compute the intersection and union of multiple vectors.
a <- c(1, 3, 5, 7, 9)
b <- c(3, 6, 8, 9, 10)
c <- c(2, 3, 4, 5, 7, 9)
intersect_all <- function(...) Reduce(intersect, list(...))
union_all <- function(...) Reduce(union, list(...))
intersect_all(a, b, c)
# [1] 3 9
union_all(a, b, c)
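For readers more at home in Python, the same fold-over-a-list idea can be sketched with `functools.reduce`. This is only an analogue of the R helpers above, not a translation from the gist; the pairwise helpers `_intersect` and `_union` are hypothetical and keep first-seen order with duplicates dropped.

```python
from functools import reduce

a = [1, 3, 5, 7, 9]
b = [3, 6, 8, 9, 10]
c = [2, 3, 4, 5, 7, 9]


def _intersect(x, y):
    """Distinct elements of x that also occur in y, in x's order."""
    members = set(y)
    return [v for v in dict.fromkeys(x) if v in members]


def _union(x, y):
    """Distinct elements of x followed by new elements of y."""
    return list(dict.fromkeys(list(x) + list(y)))


def intersect_all(*seqs):
    # Fold the pairwise intersection across all sequences.
    return reduce(_intersect, seqs)


def union_all(*seqs):
    # Fold the pairwise union across all sequences.
    return reduce(_union, seqs)


print(intersect_all(a, b, c))  # [3, 9]
print(union_all(a, b, c))      # [1, 3, 5, 7, 9, 6, 8, 10, 2, 4]
```

Using `dict.fromkeys` rather than a plain `set` keeps the result order deterministic, which makes the output easier to compare across runs.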