evalparse xiaodaigh

@xiaodaigh
xiaodaigh / intro.r
Created January 22, 2014 12:26
Intro to R
# reading and writing files
# press ctrl+enter to send
iris
fix(iris)
write.csv(iris,"c:/temp/iris.csv")
iris2 = read.csv("c:/temp/iris.csv")
nrow(iris2)
# get help
xiaodaigh / hd_binning
Created April 10, 2014 02:59
hd_binning
High Definition Binning {#HD_binning}
=====================
Binning (or discretization) of variables is a well-established practice in building credit scorecards. It involves taking raw values, e.g. income, and cutting them into bins (discrete ranges) such as 2000-3000 and 3000-4000. Typically the Good/Bad odds trend upward as income rises.
In this blog post I would like to explain a novel approach to binning that can produce very fine bins.
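To make the conventional process concrete, here is a minimal Python sketch of fixed-range binning followed by a Good/Bad odds calculation per bin. It illustrates ordinary binning only, not the novel approach this post goes on to describe. The function names `bin_label` and `good_bad_odds`, the bin edges, and the toy records are all hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def bin_label(value, edges):
    """Return a label for the half-open bin [lo, hi) that contains value."""
    i = bisect_right(edges, value)
    if i == 0:
        return f"<{edges[0]}"
    if i == len(edges):
        return f">={edges[-1]}"
    return f"{edges[i - 1]}-{edges[i]}"


def good_bad_odds(records, edges):
    """Count goods and bads per bin, then return goods / bads for each bin."""
    counts = defaultdict(lambda: [0, 0])  # label -> [goods, bads]
    for income, is_good in records:
        counts[bin_label(income, edges)][0 if is_good else 1] += 1
    return {
        label: (goods / bads if bads else float("inf"))
        for label, (goods, bads) in counts.items()
    }


# Hypothetical toy data: (income, is_good) pairs, invented for this example.
records = [
    (2100, True), (2500, False), (2900, True), (2300, False),
    (3100, True), (3500, True), (3900, False), (3200, True),
]
edges = [2000, 3000, 4000]  # assumed bin boundaries

odds = good_bad_odds(records, edges)
for label, value in odds.items():
    print(label, value)
```

In this toy data the 2000-3000 bin holds two goods and two bads, while the 3000-4000 bin holds three goods and one bad, so the odds rise with income as described above.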
Automatic Binary Binning Algorithm (ABBA)
---------
Top 5 Underdog Stories
1. Zhou Jun Xun wins LG cup
No one from outside Japan, South Korea, or China had ever won an international title until Zhou Jun Xun from Taiwan did. He was the first 9 dan from Taiwan and he has an unmistakable large red birthmark on his face. On his way to LG Cup victory he defeated Lee Chang Ho and Lee Sedol. His most famous win was against Lee Sedol, where he played a new move in the avalanche
2. Ear-reddening
Shusaku was the upstart who nobody expected could challenge the pre-Meijin
3. Go Seigen vs Shusai
When Go Seigen won the competition to earn the right to challenge Shusai, no one expected Go to win or even come close to winning. Shusai played white, and since there was no komi at the time, the only advantage he had was that he could adjourn the game at any time. Go lost the game by 2 points.
xiaodaigh / how_to_use_split_into_columns.sas
Last active August 25, 2017 05:22
A SAS macro to split a dataset (.sas7bdat) into datasets where each resultant dataset contains exactly one column of the original data
data a;
do i = 1 to 10000;
b = i;
output;
end;
run;
%include "path/to/split_into_columns.sas";
libname outlib "path/to/output/dataset/";
xiaodaigh / feeatherc.r
Created August 28, 2017 22:47
Some R to read .feather data chunkwise
# featherc
library(data.table)
library(feather)
library(future)
library(dplyr)
plan(multisession) # multiprocess is deprecated in recent versions of future
options(future.globals.maxSize = Inf)
split_feather <- function(feather_file, by = NULL, parts = parallel::detectCores()) {
system.time(inputdata <- feather::read_feather(feather_file))
`infix_fn` <- function(left, right) {
#...some code
}
xiaodaigh / 1_forwardflag.r
Last active December 20, 2017 21:02
Fast implementation of binary (true/false) forward looking flag
forwardflag <- function(bools, ...) {
# R has no "boolean" type; typeof() returns "logical" for TRUE/FALSE vectors
if(typeof(bools) != "logical") {
warning("input variable not of logical type, the only other accepted type is 0 & 1")
}
forwardflag_(bools, ...)
}
forwardflag_ <- function(bools, period = 12) {
stopifnot(period > 0)
xiaodaigh / ctree_kmeans_iris_model_assessment
Created October 15, 2017 23:51
ctree vs kmeans on the iris dataset
# data prep ---------------------------------------------------------------
library(data.table)
data(iris)
iris_copy <- copy(iris)
setDT(iris_copy)
iris_copy_ctree <- copy(iris_copy)
# ctree model -------------------------------------------------------------
xiaodaigh / benchmark_2048.jl
Created October 15, 2018 04:06
2048 Simulation challenge
using StatsBase
const DIRS = [:left, :up, :right, :down]
function init_game()
grid = zeros(Int8,4,4)
grid[rand(1:4),rand(1:4)] = rand2_1()
grid[rand(1:4),rand(1:4)] = rand2_1()
grid
end