Szilard Pafka (szilard) — public gists

#include <stdio.h>
#include <stdlib.h>
#define N 100
#define B0 100
#define R 1000000
int main() {
int b[N], rec[N];
for (int i=0; i<N; i++) b[i] = B0;
szilard / caret-slowdown-issue.R
Created May 15, 2017 18:37
caret slowdown issue
library(caret)
library(readr)
library(ROCR)
set.seed(123)
d <- read_csv("https://raw.githubusercontent.com/szilard/teach-data-science-UCLA-master-appl-stats/master/wk-06-ML/data/airline100K.csv")
szilard / API_DL_FC_catdata--tools.R
Last active December 3, 2016 06:32
API deep learning fully connected with categorical data: h2o > R mxnet > py keras >>>>> tensorflow
#### h2o
library(h2o)
h2o.init(max_mem_size = "50g", nthreads = -1)
dx_train <- h2o.importFile("train-1m.csv")
dx_test <- h2o.importFile("test.csv")
Xnames <- names(dx_train)[which(names(dx_train)!="dep_delayed_15min")]
szilard / datable_20Gx3GB_join.R
Last active November 29, 2016 01:06
data.table 20GB x 3GB join
library(data.table)
n <- 2e9
m <- 1e9
system.time( d <- data.table(x = sample(m, n, replace=TRUE), y = runif(n)) )
# user system elapsed
# 103.843 8.255 112.242
system.time( dm <- data.table(x = sample(m)) )
# user system elapsed
# 47.298 1.860 49.288
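The preview above stops before the join itself: what it builds is a 2-billion-row table `d` (key `x` sampled with replacement from ~1 billion values, payload `y`) and a 1-billion-row lookup table `dm` with each key once. A scaled-down sketch of the same keyed-join pattern, in Python rather than data.table and with sizes shrunk by a factor of 10^6 (everything beyond `d`, `dm`, `x`, `y` is illustrative):

```python
import random

random.seed(123)

n = 2000  # stands in for the 2e9-row table d
m = 1000  # stands in for the 1e9 distinct join keys

# "d": key drawn with replacement, plus a numeric payload column
d = [(random.randrange(1, m + 1), random.random()) for _ in range(n)]

# "dm": each key exactly once, mapped to a row id
dm = {x: i for i, x in enumerate(random.sample(range(1, m + 1), m))}

# hash join: probe the lookup table for every row of d
joined = [(x, y, dm[x]) for (x, y) in d if x in dm]

print(len(joined))  # 2000 -- every key of d lies in 1..m, so all rows match
```

data.table does this with a keyed or `on=` join; the dict probe here plays the role of its hash/index lookup.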
szilard / h2o_steam.R
Created October 14, 2016 04:44
H2O Steam deploy GBM
library(h2o)
h2o.init(nthreads = -1)
dx_train <- h2o.importFile("https://s3.amazonaws.com/benchm-ml--main/train-1m.csv")
system.time({
md_10 <- h2o.gbm(x = 1:(ncol(dx_train)-1), y = ncol(dx_train), training_frame = dx_train,
model_id = "airline_depth10",
szilard / ML_with_H2O.R
Last active June 25, 2016 16:19
ML with H2O.ai
library(h2o)
h2o.init(max_mem_size = "20g", nthreads = -1)
# R is connected to the H2O cluster:
# H2O cluster uptime: 1 seconds 704 milliseconds
# H2O cluster version: 3.8.2.8
# H2O cluster name: H2O_started_from_R_szilard_lcr105
# H2O cluster total nodes: 1
# H2O cluster total memory: 17.78 GB
szilard / R_1TB_bug.R
Last active June 24, 2016 07:11
R 1TB bug
## allocate <1TB first, stuff works
> x <- 1:1.2e11
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 214403 11.5 4.60000e+05 24.6 350000 18.7
Vcells 120000397293 915530.4 1.72801e+11 1318366.9 120000398080 915530.4
> system("echo 1")
1
szilard / ec2_x1_2TB.R
Created June 24, 2016 03:15
R on EC2 x1 2TB RAM 128 cores
> system.time(x <- 1:1e11)
user system elapsed
221.491 210.466 432.030
> object.size(x)/1e9
800.00000004 bytes
> system.time(sum(x))
user system elapsed
145.913 78.183 230.063
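A sanity check on the `object.size` output: `1:1e11` exceeds R's 32-bit integer range, so the sequence is stored as 8-byte doubles, and 1e11 elements at 8 bytes each is exactly the 800 GB reported (the tiny remainder is the vector header):

```python
# 1e11 doubles at 8 bytes apiece should account for the reported size
n_elements = 10**11
bytes_per_double = 8
size_gb = n_elements * bytes_per_double / 1e9
print(size_gb)  # 800.0 -- matches object.size(x)/1e9 up to the header bytes
```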
szilard / R_df_copy_3.0vs3.1.R
Last active June 12, 2016 18:38
R dataframes copying 3.0 vs 3.1
system.time(z <- 1:1e9)
system.time(d <- data.frame(x = 1:1e9))
system.time(d$y <- 1:1e9)
system.time(d$z <- z)
system.time(d$x[1] <- 0L)
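The usual explanation for the 3.0-vs-3.1 difference these timings probe: before R 3.1, adding a column with `d$y <- ...` duplicated every existing column, while from 3.1 on only the list of column pointers is copied and untouched columns are shared. A rough Python analogue of the two copy strategies, with a "data frame" faked as a dict of column lists (all names illustrative):

```python
# a toy "data frame": column name -> column values
d = {"x": list(range(100_000))}

# R >= 3.1 style: shallow copy -- duplicate only the pointer list
shallow = dict(d)

# R 3.0 style: deep copy -- duplicate every column's storage
deep = {k: v[:] for k, v in d.items()}

print(shallow["x"] is d["x"])  # True: column storage is shared
print(deep["x"] is d["x"])     # False: the column was physically copied
```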
szilard / sqlite_vs_datatable.txt
Last active May 2, 2016 22:39
SQLite vs R data.table
sqlite vs R's data.table
TL;DR: sqlite (:memory:) 250 sec, data.table 7 sec
data: 100 million rows, 1 million groups
generated by: https://github.com/szilard/benchm-databases/blob/master/0-gendata.txt
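The benchmark aggregates 100 million rows into 1 million groups. A scaled-down sketch of the SQLite side using Python's stdlib `sqlite3` with an in-memory database (sizes cut by 1000x; the table/column names and the exact aggregation are illustrative and may differ from the generator script):

```python
import random
import sqlite3

random.seed(123)

n = 100_000     # scaled down from the 100M rows in the benchmark
groups = 1_000  # scaled down from the 1M groups

# one grouping key and one value column per row
rows = [(i % groups, random.random()) for i in range(n)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE d (x INTEGER, y REAL)")
con.executemany("INSERT INTO d VALUES (?, ?)", rows)

# a representative group-by aggregation of the kind being timed
res = con.execute("SELECT x, SUM(y) FROM d GROUP BY x").fetchall()
print(len(res))  # 1000 -- one output row per group
```

Timing this pattern at full scale against the equivalent `d[, sum(y), by=x]` in data.table is what produces the 250 sec vs 7 sec gap above.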