Keiichi Kuroyanagi Keiku
Keiku / chisq.test_by_group.r
Last active February 10, 2017 01:49
Chi-square test within each group.
library(dplyr)
library(broom)
library(lazyeval)
df <- data_frame(
  group = rep(letters[1:2], each = 50),
  cat1 = letters[round(runif(100) * 5) + 1],
  cat2 = letters[round(runif(100) * 3) + 1]
)
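The preview above stops after the sample data is built, so the test itself is not shown. As a rough sketch of the idea in Python (not the author's R code), the block below computes the Pearson chi-square statistic and degrees of freedom for `cat1` against `cat2` within each `group`, using only the standard library. The function names `chisq_statistic` and `chisq_by_group`, the tuple layout of `rows`, and the toy data are all assumptions made for illustration; the p-value is left out because it needs a chi-square distribution function.

```python
from collections import Counter, defaultdict


def chisq_statistic(pairs):
    """Pearson chi-square statistic and degrees of freedom for (x, y) pairs."""
    observed = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    n = len(pairs)
    stat = 0.0
    for x, row_n in row_totals.items():
        for y, col_n in col_totals.items():
            # Expected count under independence of the two variables.
            expected = row_n * col_n / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def chisq_by_group(rows):
    """rows: iterable of (group, cat1, cat2); returns {group: (stat, dof)}."""
    by_group = defaultdict(list)
    for group, cat1, cat2 in rows:
        by_group[group].append((cat1, cat2))
    return {g: chisq_statistic(pairs) for g, pairs in by_group.items()}


# Hypothetical toy data: group "a" is perfectly associated, group "b" is not.
rows = [
    ("a", "x", "p"), ("a", "x", "p"), ("a", "y", "q"), ("a", "y", "q"),
    ("b", "x", "p"), ("b", "x", "q"), ("b", "y", "p"), ("b", "y", "q"),
]
result = chisq_by_group(rows)
```

With counts this small the chi-square approximation is unreliable; the toy data is only there to make the arithmetic easy to follow.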
Keiku / generate_c_code.r
Last active February 9, 2017 08:33
Generate c() function code.
library(stringr)
add_backquotes <- function(x) paste0("`", x, "`")
add_doublequotes <- function(x) paste0("\"", x, "\"")
generate_c_code <- function(x){
  # Quote each element and append ",\n"; the last element ends with "\n" only.
  vec <- paste0(add_doublequotes(x), ",\n")
  vec_tail <- str_replace(tail(vec, 1), ",\n", "\n")
  vec_head <- head(vec, length(vec) - 1)
  vec <- c(vec_head, vec_tail)
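The preview is cut off before the function returns, so how the pieces are finally assembled is not visible. A minimal Python sketch of the same idea, assuming the goal is R source text of the form `c("a", "b", ...)` with one element per line, might look like this. The wrapping in `c(` and `)` is an assumption, and embedded double quotes are not escaped.

```python
def generate_c_code(values):
    """Return R source text for a character vector, one element per line."""
    quoted = ['"{}"'.format(v) for v in values]
    # Joining with ",\n" leaves the last element without a trailing comma,
    # which is what the head/tail handling in the R preview works towards.
    return "c(\n" + ",\n".join(quoted) + "\n)"


code = generate_c_code(["Sepal.Length", "Sepal.Width", "Species"])
print(code)
```

Joining the quoted elements avoids the special case for the last element entirely.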
Keiku / cut.py
Created February 8, 2017 02:56
Cut a variable with pandas.
import pandas as pd
from sklearn import datasets
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = iris.target
mapping = {0 : 'setosa', 1: 'versicolor', 2: 'virginica'}
iris_df = iris_df.replace({'species': mapping})
iris_df['sepal length (bins)'] = pd.cut(iris_df['sepal length (cm)'], bins=[0, 3, 6, 9], include_lowest=False, right=True)
Keiku / intersection.py
Created February 7, 2017 04:52
Check intersection.
import pandas as pd
df1 = pd.DataFrame({'id': [1, 2, 3]})
df2 = pd.DataFrame({'id': [2, 3, 4]})
set(df1.id).intersection(set(df2.id))
# Out[73]: {2, 3}
Keiku / freq.py
Created February 7, 2017 04:44
Count the frequency of each value in a column of a pandas DataFrame.
import pandas as pd
from sklearn import datasets
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = iris.target
mapping = {0 : 'setosa', 1: 'versicolor', 2: 'virginica'}
iris_df = iris_df.replace({'species': mapping})
def freq(data, var):
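The body of `freq` is not shown in the preview. One plausible shape for such a helper, offered only as a sketch under that caveat, uses `Series.value_counts` to build a small count and proportion table. The column names `count` and `proportion` and the toy DataFrame are assumptions, and the iris data is replaced by a tiny hand-made frame so that the block has no scikit-learn dependency.

```python
import pandas as pd


def freq(data, var):
    """Count and proportion of each distinct value in column `var`."""
    counts = data[var].value_counts()
    # Proportions are relative to all rows, including rows where var is missing.
    return pd.DataFrame({"count": counts, "proportion": counts / len(data)})


# Hypothetical toy data standing in for the iris species column.
df = pd.DataFrame({"species": ["setosa", "setosa", "virginica", "versicolor"]})
table = freq(df, "species")
print(table)
```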
Keiku / get_file_list.py
Last active February 7, 2017 03:18
Get a list of files.
import os
import glob
# The asterisk (wildcard) is required
files = glob.glob('/home/dir1/*.zip')
for file in files:
    print(file)
    print('/home/dir2/' + os.path.basename(file))
# /home/dir1/subset3.zip
# /home/dir2/subset3.zip
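The same listing can be written with `pathlib`, which keeps the source path and the target path as `Path` objects instead of concatenated strings. This is an alternative sketch, not the gist's own code: the helper name `map_to_directory` and its `pattern` parameter are made up for illustration, and a temporary directory stands in for `/home/dir1`.

```python
import tempfile
from pathlib import Path


def map_to_directory(src_dir, dst_dir, pattern="*.zip"):
    """Pair each file matching `pattern` in src_dir with the same name under dst_dir."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    # Path.glob also needs the wildcard in the pattern to match file names.
    return [(path, dst / path.name) for path in sorted(src.glob(pattern))]


# Demonstration in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "subset3.zip").touch()
    (Path(tmp) / "notes.txt").touch()
    pairs = map_to_directory(tmp, "/home/dir2")

for source, target in pairs:
    print(source)
    print(target)
```

Nothing is copied or moved here; the function only computes where each file would land.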
Keiku / create_summary_report.r
Created February 2, 2017 08:25
Create a summary report.
library(dplyr)
library(tidyr)
iris %>%
  as_data_frame(.) %>%
  select(matches("Petal")) %>%
  summarise_all(.funs = c("01:sum" = "sum",
                          "02:min" = "min",
                          "03:q25" = "quantile(., 0.25)",
                          "04:median" = "median",
Keiku / convert_number_strings_to_numbers.py
Last active January 8, 2023 20:45
Convert number strings with commas in pandas DataFrame to float.
import pandas as pd
import locale
from locale import atof
df = pd.DataFrame([['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']],
                  columns=['col1', 'col2'])
#     col1   col2
# 0  1,200  4,200
# 1  7,000  -0.03
# 2      5      0
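The preview ends before the conversion step. The `locale.atof` import suggests the gist parses the strings through the locale machinery, which only handles the commas once a suitable locale has been set. The sketch below swaps in a simpler technique that does not depend on the system locale: strip the thousands separators with `Series.str.replace`, then cast to float. The helper name `to_float` is an assumption, and the approach presumes commas are used only as thousands separators.

```python
import pandas as pd

df = pd.DataFrame([['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']],
                  columns=['col1', 'col2'])


def to_float(column):
    """Drop thousands separators, then parse the column as float."""
    return column.str.replace(',', '', regex=False).astype(float)


# DataFrame.apply passes each column to to_float as a Series.
converted = df.apply(to_float)
print(converted.dtypes)
```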
Keiku / count_missing_values.r
Created January 26, 2017 10:53
Count missing values in every column of a data frame.
library(mice)
library(purrr)
map_df(airquality, function(x) sum(is.na(x)))
# A tibble: 1 × 6
#   Ozone Solar.R  Wind  Temp Month   Day
#   <int>   <int> <int> <int> <int> <int>
# 1    37       7     0     0     0     0
Keiku / impute.r
Last active January 26, 2017 07:34
Impute a variable that contains NA values.
library(dplyr)
data <- data_frame(var = c(0, NA, 2))
data %>% mutate(var = coalesce(var, 1))
data %>% mutate(var = replace(var, which(is.na(var)), 1))
data %>% mutate(var = if_else(is.na(var), 1, var))
# A tibble: 3 × 1
#     var
#   <dbl>
# 1     0