Keiichi Kuroyanagi Keiku

@Keiku
Keiku / Modeling_GermanCredit.r
Created March 17, 2017 08:38
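Source code for section 11-3, "Let's Try Machine Learning in R," of データサイエンティスト養成読本 登竜門編 (Data Scientist Training Reader: Gateway Edition)
# Install the packages
pkgs <- c("dplyr", "rpart", "rpart.plot", "rattle", "mlr", "evtree")
install.packages(pkgs, quiet = TRUE)
# Load the packages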
library("dplyr")
library("rattle")
library("mlr")
library("evtree")
@Keiku
Keiku / dplyr_se.r
Created March 10, 2017 11:07
Summarising by standard evaluation with dplyr.
library(dplyr)
library(lazyeval)
df <- data_frame(group = c(1, 2, 2, 3, 3, 3))
g <- "group"
df %>%
group_by_(g) %>%
summarise_(
@Keiku
Keiku / impute.py
Created March 10, 2017 01:48
Impute some missing columns with pandas.
import pandas as pd
df = pd.DataFrame({'A':['A1', 'A2', 'A3'], 'B':[None, 'B2', None]})
df
# Out[51]:
#     A     B
# 0  A1  None
# 1  A2    B2
# 2  A3  None
@Keiku
Keiku / roc_auc.py
Last active October 5, 2022 01:52
Plot ROC curve.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
import seaborn as sns
sns.set('talk', 'whitegrid', 'dark', font_scale=1.5, font='Ricty',
        rc={"lines.linewidth": 2, 'grid.linestyle': '--'})
fpr, tpr, _ = roc_curve([1, 0, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.7, 0.6, 0.5, 0.4])
roc_auc = auc(fpr, tpr)
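The preview ends at `auc(fpr, tpr)`. As a rough cross-check of what that value represents, the sketch below recomputes the AUC for the same labels and scores using only the standard library. It relies on the pairwise (Mann-Whitney) interpretation: the fraction of positive/negative pairs in which the positive example gets the higher score, with ties counted as one half. The helper name `pairwise_auc` is hypothetical; it is not part of the gist or of scikit-learn.

```python
from itertools import product


def pairwise_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A win counts 1, a tie counts 0.5, a loss counts 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Same toy data as the roc_curve call above.
y_true = [1, 0, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.7, 0.6, 0.5, 0.4]
print(pairwise_auc(y_true, y_score))  # 8.5 winning pairs out of 12
```

This pairwise value should agree with the trapezoidal area under the ROC curve, which is what `auc(fpr, tpr)` computes from the curve points.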
@Keiku
Keiku / dplyr_examples.r
Created February 23, 2017 02:12
Example code for the dplyr package.
library(dplyr)
iris_df <- as_data_frame(iris)
iris_df %>% rename_(.dots = setNames(names(.), toupper(names(.)))) %>% head(2)
# A tibble: 2 × 5
#   SEPAL.LENGTH SEPAL.WIDTH PETAL.LENGTH PETAL.WIDTH SPECIES
#          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
@Keiku
Keiku / extract_subset.r
Last active February 20, 2017 07:12
Extract a set (intersection or union) from multiple vectors.
a <- c(1, 3, 5, 7, 9)
b <- c(3, 6, 8, 9, 10)
c <- c(2, 3, 4, 5, 7, 9)
intersect_all <- function(...) Reduce(intersect, list(...))
union_all <- function(...) Reduce(union, list(...))
intersect_all(a, b, c)
# [1] 3 9
union_all(a, b, c)
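For comparison, the same fold-over-a-list idea can be sketched in Python with `functools.reduce`. This is an illustrative equivalent under stated assumptions, not part of the gist: the function names simply mirror the R ones, and the results are returned as sorted lists, whereas R's `intersect` and `union` keep the order in which elements first appear.

```python
from functools import reduce


def intersect_all(*seqs):
    """Elements common to every input sequence, as a sorted list."""
    return sorted(reduce(set.intersection, map(set, seqs)))


def union_all(*seqs):
    """Elements present in at least one input sequence, as a sorted list."""
    return sorted(reduce(set.union, map(set, seqs)))


# Same values as the R vectors a, b, and c above.
a = [1, 3, 5, 7, 9]
b = [3, 6, 8, 9, 10]
c = [2, 3, 4, 5, 7, 9]

print(intersect_all(a, b, c))  # [3, 9]
print(union_all(a, b, c))      # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

As with `Reduce` in R, the pairwise operation is applied left to right, so any number of inputs can be passed.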
@Keiku
Keiku / tqdm.py
Created February 17, 2017 05:49
Print progress bar.
import time
from tqdm import tqdm
pbar = tqdm(["1", "2", "3", "4", "5"])
for char in pbar:
    pbar.set_description("Processing %s" % char)
    time.sleep(1)
# 0%| | 0/5 [00:00<?, ?it/s]
# Processing 1: 20%|██████▏ | 1/5 [00:01<00:04, 1.00s/it]
# Processing 2: 40%|████████████▍ | 2/5 [00:02<00:03, 1.00s/it]
@Keiku
Keiku / check_id_sets.r
Created February 16, 2017 06:41
Check for duplicate IDs across several tables.
library(gplots)
library(dplyr)
library(magrittr)
check_id_sets <- function(ids){
  ids_venn <- gplots::venn(ids, show.plot=FALSE)
  ids_list <- unlist(as.list(ids_venn))
  mat_dim <- c((length(ids_list) / (length(ids)+1)), length(ids)+1)
  id_sets <- ids_list %>%
    matrix(., mat_dim) %>%
@Keiku
Keiku / command.sh
Last active March 13, 2023 10:40
A list of Linux commands.
# compress/decompress zip file.
zip file.csv.zip file.csv
unzip file.csv.zip
# compress/decompress gz file.
gzip file.csv
gzip -d file.csv.gz
# compress/decompress bz2 file.
bzip2 file.csv
@Keiku
Keiku / chisq.test_by_group.r
Created February 10, 2017 01:48
Chi-square test within each group.
library(dplyr)
library(purrr)
library(broom)
df <- data_frame(
  group = rep(letters[1:2], each = 50),
  cat1 = letters[round(runif(100) * 5) + 1],
  cat2 = letters[round(runif(100) * 3) + 1]
)