KUROYANAGI KEIICHI Keiku

@Keiku
Keiku / tidy_quantile.r
Created Jan 6, 2017
Calculate percentiles.
View tidy_quantile.r
library(dplyr)
library(broom)
mtcars %>%
  group_by(cyl) %>%
  do(tidy(t(quantile(.$mpg, probs = seq(0, 1, 0.25)))))
# Source: local data frame [3 x 6]
# Groups: cyl [3]
#
# cyl X0. X25. X50. X75. X100.
@Keiku
Keiku / freq.r
Created Jan 11, 2017
Calculate frequency.
View freq.r
freq <- function(df, ...){
df %<>%
group_by_(...) %>%
summarise(count = n()) %>%
arrange_(.dots = ...) %>%
ungroup() %>%
mutate(
cum_count = cumsum(count),
percent = count / sum(count),
cum_percent = cumsum(percent)
@Keiku
Keiku / serialization_benchmark.r
Created Jan 12, 2017
Serialization benchmark.
View serialization_benchmark.r
library(readr)
library(data.table)
library(feather)
object.size(df)
# 1654613472 bytes
system.time(write_csv(df, "df_write_csv.csv"))
#    user  system elapsed
# 160.540 29.079 200.667
system.time(fwrite(df, "df_fwrite.csv"))
@Keiku
Keiku / mlr_iris_example.r
Created Jan 19, 2017
iris example with mlr.
View mlr_iris_example.r
library(mlr)
set.seed(123, "L'Ecuyer")
iris.task = classif.task = makeClassifTask(id = "iris-example", data = iris, target = "Species")
resamp = makeResampleDesc("CV", iters = 10L)
lrn = makeLearner("classif.rpart")
control.grid = makeTuneControlGrid()
@Keiku
Keiku / calc_elapsed_months.r
Created Jan 19, 2017
Calculate elapsed months.
View calc_elapsed_months.r
library(dplyr)
library(lubridate)
df <- data_frame(
  id = c(1, 1, 1, 2, 2, 2),
  ym = c("201512", "201601", "201603", "201512", "201602", "201603")
)
elapsed_months <- function(end, start) {
  12 * (year(end) - year(start)) + (month(end) - month(start))
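The preview above is cut off before the function is applied, so how the original uses it is not shown. As a rough illustration of the same arithmetic, here is a minimal Python sketch; `parse_ym` is a hypothetical helper, and measuring every value against "201512" is an assumption made only for the example.

```python
from datetime import datetime


def parse_ym(ym):
    """Hypothetical helper: turn a 'YYYYMM' string into a datetime."""
    return datetime.strptime(ym, "%Y%m")


def elapsed_months(end, start):
    """Same formula as the R preview: whole calendar months between two dates."""
    return 12 * (end.year - start.year) + (end.month - start.month)


# Assumption for illustration only: measure each month against 2015-12.
start = parse_ym("201512")
for ym in ["201512", "201601", "201603"]:
    print(ym, elapsed_months(parse_ym(ym), start))
# 201512 0
# 201601 1
# 201603 3
```

The year difference is scaled to months and the month difference is added, so crossing a year boundary (201512 to 201601) correctly yields 1 rather than -11.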
@Keiku
Keiku / impute.r
Last active Jan 26, 2017
Impute a variable that includes NA values.
View impute.r
library(dplyr)
data <- data_frame(var = c(0, NA, 2))
data %>% mutate(var = coalesce(var, 1))
data %>% mutate(var = replace(var, which(is.na(var)), 1))
data %>% mutate(var = if_else(is.na(var), 1, var))
# A tibble: 3 × 1
# var
# <dbl>
# 1 0
@Keiku
Keiku / count_missing_values.r
Created Jan 26, 2017
Count missing values in every column of a data frame.
View count_missing_values.r
library(mice)
library(purrr)
map_df(airquality, function(x) sum(is.na(x)))
# A tibble: 1 × 6
# Ozone Solar.R Wind Temp Month Day
# <int> <int> <int> <int> <int> <int>
# 1 37 7 0 0 0 0
@Keiku
Keiku / create_summary_report.r
Created Feb 2, 2017
Create a summary report.
View create_summary_report.r
library(dplyr)
library(tidyr)
iris %>%
as_data_frame(.) %>%
select(matches("Petal")) %>%
summarise_all(.funs = c("01:sum" = "sum",
"02:min" = "min",
"03:q25" = "quantile(., 0.25)",
"04:median" = "median",
@Keiku
Keiku / get_file_list.py
Last active Feb 7, 2017
Get a list of files.
View get_file_list.py
import os
import glob
# The asterisk (wildcard) is required
files = glob.glob('/home/dir1/*.zip')
for file in files:
    print(file)
    print('/home/dir2/' + os.path.basename(file))
# /home/dir1/subset3.zip
# /home/dir2/subset3.zip
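The snippet above pairs each matched file with the same file name under a second directory. A self-contained sketch of that idea follows. It builds a throwaway directory so it can run anywhere; `map_to_destination` and the sample file names are hypothetical and not part of the original gist.

```python
import glob
import os
import tempfile


def map_to_destination(pattern, dest_dir):
    """Pair every path matching `pattern` with the same file name under `dest_dir`."""
    return [
        (src, os.path.join(dest_dir, os.path.basename(src)))
        for src in sorted(glob.glob(pattern))  # sorted for a stable order
    ]


# Demonstration on a temporary directory (sample names are made up).
with tempfile.TemporaryDirectory() as src_dir:
    for name in ("subset1.zip", "subset2.zip", "notes.txt"):
        open(os.path.join(src_dir, name), "w").close()

    # Without the wildcard, glob would only match a literal file name.
    pairs = map_to_destination(os.path.join(src_dir, "*.zip"), "/home/dir2")
    for src, dst in pairs:
        print(src, "->", dst)
```

Only the two `.zip` files are matched; `notes.txt` is skipped because it does not fit the pattern.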
@Keiku
Keiku / freq.py
Created Feb 7, 2017
Count frequency of a column in pandas DataFrame.
View freq.py
import pandas as pd
from sklearn import datasets
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = iris.target
mapping = {0 : 'setosa', 1: 'versicolor', 2: 'virginica'}
iris_df = iris_df.replace({'species': mapping})
def freq(data, var):
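The preview stops at the function signature, so the original body is not visible. The sketch below is one plausible way to fill it in, not the author's implementation: it assumes the function should return the same columns as the R `freq.r` gist (`count`, `cum_count`, `percent`, `cum_percent`), and it uses a small made-up DataFrame instead of the iris data.

```python
import pandas as pd


def freq(data, var):
    """Frequency table for one column (assumed behaviour; original body is truncated)."""
    # groupby sorts by the group key, mirroring arrange() in the R version.
    table = data.groupby(var).size().reset_index(name="count")
    table["cum_count"] = table["count"].cumsum()
    table["percent"] = table["count"] / table["count"].sum()
    table["cum_percent"] = table["percent"].cumsum()
    return table


# Made-up data for illustration only.
example = pd.DataFrame({"species": ["setosa", "virginica", "setosa", "versicolor"]})
table = freq(example, "species")
print(table)
```

Grouping first and deriving the cumulative columns afterwards keeps each step readable and makes the cumulative percentage end at 1 by construction.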