@tobigithub
tobigithub / C7H2-inchikeys
Created October 16, 2012 04:40
Unique isomers of C7H2
InChIKey=ACAHMMGNYHEOMN-UHFFFAOYSA-N
InChIKey=AEMHIJYDVNIZCG-UHFFFAOYSA-N
InChIKey=AEMQCMLHPDVYCV-UHFFFAOYSA-N
InChIKey=AEPNDGMPRCFRKR-UHFFFAOYSA-N
InChIKey=AGRCWCGWOIECPK-UHFFFAOYSA-N
InChIKey=AHIPWBCASABPFJ-UHFFFAOYSA-N
InChIKey=AMNJWFPHKGUIDQ-UHFFFAOYSA-N
InChIKey=AMVQEWGAVVIDKE-UHFFFAOYSA-N
InChIKey=AQIDTXWEPULKKH-UHFFFAOYSA-N
InChIKey=ARWIECZMCKTHCZ-UHFFFAOYSA-N
tobigithub / AMG20121014-errors
Created October 17, 2012 02:55
AMG20121014 error check (copy/paste into EXCEL or OO Calc)
ID# global num formula AMG count formula AMG isomers AMG time [sec] CHECK CHECK Formula No. Isomers MOLGEN check formula check isomer num C H Br Cl F N O P S Si Exist Lewis Check SeniorMax DU # e- #Atoms LEWIS Sum H/C H/C OK N/C O/C P/C S/C NOPS Check HNOPS Probab accurate mass M M+1 M+2 M+3
1 2 2 C2 0 C2 2 0.33 TRUE TRUE C2 0 TRUE TRUE 2 0 0 0 0 0 0 0 0 0 NO YES YES 3 8 2 8 0.00 NO 0.00 0.00 0.00 0.00 YES NO 24.00000000 100.00 0.00 0.00 0.00
2 3 3 C2H2 1 C2H2 3 0.32 TRUE TRUE C2H2 1 TRUE TRUE 2 2 0 0 0 0 0 0 0 0 YES YES YES 2 10 4 10 1.00 YES 0.00 0.00 0.00 0.00 YES YES 26.01564920 100.00 0.00 0.00 0.00
3 4 4 C2H4 1 C2H4 4 0.32 TRUE TRUE C2H4 1 TRUE TRUE 2 4 0 0 0 0 0 0 0 0 YES YES YES 1 12 6 12 2.00 YES 0.00 0.00 0.00 0.00 YES YES 28.03129840 100.00 0.00 0.00 0.00
4 5 5 CH2O 1 CH2O 5 0.32 TRUE TRUE CH2O 1 TRUE TRUE 1 2 0 0 0 0 1 0 0 0 YES YES YES 1 12 4 8 2.00 YES 0.00 1.00 0.00 0.00 YES YES 30.01056420 100.00 1.15 0.21 0.00
5 6 6 C2H6 1 C2H6 6 0.33 TRUE TRUE C2H6 1 TRUE TRUE 2 6 0 0 0 0 0 0 0 0 YES YES Y
tobigithub / demo.R
Last active September 19, 2015 04:05 — forked from zachmayer/demo.R
#Setup
rm(list = ls(all = TRUE))
gc(reset=TRUE)
set.seed(42) #From random.org
#Libraries
library(caret)
library(devtools)
install_github('zachmayer/caretEnsemble') #Install Zach's caretEnsemble package ("user/repo" form expected by current devtools)
Change Point Detection Packages in R
Thanks to the R community, several packages on CRAN focus on change point detection. The following are especially useful because they are not restricted to a particular application domain and apply to time series in general:
CPM – “Parametric and Nonparametric Sequential Change Detection in R”:
Useful for detecting multiple change points in a time series drawn from an unknown underlying distribution. The method also applies to data streams, where each observation is considered only once. Because of this streaming design, the cpm approach has a second output: the detection points themselves. They mark the time at which the algorithm detects each change point, and so quantify the detection delay. Unfortunately, the cpm package is no longer maintained on CRAN. For Windows users I uploaded a zipped version of the installed package from my R library. It should work with R 3.0 and 3.1 under Windows 7/8.
BCP – “An R Package for Performi
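The difference between a change point and its detection point, as described for the cpm package above, can be illustrated with a small base R sketch. It uses a plain one-sided CUSUM rule, which is a swapped-in technique and not the algorithm implemented in cpm; the function name `cusum_detect` and the parameters `target`, `k` and `h` are assumptions made for this illustration, and the series is noise-free so the result is deterministic.

```r
# Minimal sequential detector (one-sided CUSUM); NOT the cpm algorithm.
# Each observation is visited exactly once, as in a data stream.
# target: assumed in-control mean; k: allowance; h: alarm threshold.
cusum_detect <- function(x, target = 0, k = 1, h = 5) {
  s <- 0
  for (t in seq_along(x)) {
    # accumulate evidence of an upward shift, never dropping below zero
    s <- max(0, s + (x[t] - target - k))
    if (s > h) return(t)   # detection point: first time the alarm fires
  }
  NA_integer_              # no change detected
}

# Toy stream: the mean jumps from 0 to 3 at observation 51 (the change point)
x <- c(rep(0, 50), rep(3, 50))
true_change <- 51
detected_at <- cusum_detect(x)
delay <- detected_at - true_change
```

With these settings the alarm fires a few observations after the true change, and `delay` is the quantity the detection points in a streaming method make visible.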
# things that affect speeds of random forests
# http://stackoverflow.com/questions/14106010/parallel-execution-of-random-forest-in-r/15771458#15771458
# also
# http://stackoverflow.com/questions/23075506/how-to-improve-randomforest-performance?lq=1
# also
#
Setting .multicombine to TRUE can make a significant difference:
````R
rf <- foreach(ntree=rep(25000, 6), .combine=combine, .multicombine=TRUE,
````
# Run R code on 32 bit and 64 bit and get different results
# You need to run the code on 32 Bit R version and then the 64 bit R (not just the code below)
# http://stackoverflow.com/questions/17881609/parallel-randomforest-with-different-results-using-dosnow
#
library(foreach)
library(doSNOW)
library(parallel)
set.seed(666)
ncores <- 4
cl <- makeCluster(ncores)
# Random forests are random... indeed
# http://stats.stackexchange.com/questions/35609/why-do-i-need-bag-composition-to-calculate-oob-error-of-combined-random-forest-m
# https://github.com/mlist/IB2014/blob/master/helper_methods.R
# Random Forest combine: http://www.inside-r.org/packages/cran/randomForest/docs/combine
#
# This has implications on parallel random forests using snow, doSNOW, doParallel etc.
#
# err.rate : NULL
# err.rate : NULL
# OOB : NULL
# caret plot functions
# http://machinelearningmastery.com/data-visualization-with-the-caret-r-package/
library(caret)
# load the data
data(iris)
windows()
# pair-wise plots of all 4 attributes, dots colored by class
featurePlot(x=iris[,1:4], y=iris[,5], plot="pairs", auto.key=list(columns=3))
# reproducible caret models
# http://stackoverflow.com/questions/13403427/fully-reproducible-parallel-models-using-caret
library(doParallel); library(caret)
# create a list of seeds; the seed changes for each resampling
set.seed(123)
seeds <- vector(mode = "list", length = 11) # length = (n_repeats * n_resampling) + 1
for(i in 1:10) seeds[[i]]<- sample.int(n=1000, 3) # 3 is the number of tuning parameter values (mtry for rf), here equal to ncol(iris)-2
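The shape of the seed list can be sketched in base R as follows. According to the caret documentation for the `seeds` argument of `trainControl`, the list has one integer vector per resample (one seed per candidate model) plus a final single integer for the last model fit. The helper name `make_seeds` and its parameters are hypothetical and not part of caret; this is an illustration of the structure, not the author's full script.

```r
# Hypothetical helper (not a caret function): builds a seed list in the
# shape documented for caret::trainControl(seeds = ...).
# n_resamples: number of resampling iterations (n_repeats * n_folds)
# n_models:    number of tuning parameter combinations evaluated
make_seeds <- function(n_resamples, n_models, master_seed = 123) {
  set.seed(master_seed)
  seeds <- vector(mode = "list", length = n_resamples + 1)
  for (i in seq_len(n_resamples)) {
    # one seed per candidate model within resample i
    seeds[[i]] <- sample.int(n = 1000, size = n_models)
  }
  # the last element is a single seed for the final model fit
  seeds[[n_resamples + 1]] <- sample.int(n = 1000, size = 1)
  seeds
}

seeds <- make_seeds(n_resamples = 10, n_models = 3)
lengths(seeds)  # ten vectors of length 3 followed by one of length 1
```

Because the helper resets the master seed on every call, repeated calls with the same arguments return identical lists, which is what makes parallel resampling repeatable.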
# library(memoise) can speed up code 1000-fold
# for true speed-up on fibonacci use library(gmp) fibnum(n)
# http://adv-r.had.co.nz/Function-operators.html
# https://cran.r-project.org/web/packages/memoise/memoise.pdf
# Tobias Kind (2015) https://gist.github.com/tobigithub
library(memoise)
# no memoise()
fib1 <- function(n) {
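The idea behind memoisation can be sketched without any package. The version below is a hand-rolled cache in base R, not the memoise package's implementation, and the names `fib_naive`, `make_memo_fib` and `fib_memo` are hypothetical and chosen only for this illustration.

```r
# Naive recursion: recomputes the same subproblems again and again
fib_naive <- function(n) {
  if (n <= 2) 1 else fib_naive(n - 1) + fib_naive(n - 2)
}

# Hand-rolled memoisation (assumption: a simple stand-in for memoise()).
# Results are kept in a vector that lives in the enclosing environment.
make_memo_fib <- function() {
  cache <- c(1, 1)  # fib(1) and fib(2)
  fib <- function(n) {
    # return a cached value when one exists
    if (n <= length(cache) && !is.na(cache[n])) return(cache[n])
    value <- fib(n - 1) + fib(n - 2)
    cache[n] <<- value  # store the result for later calls
    value
  }
  fib
}

fib_memo <- make_memo_fib()
fib_memo(30)
```

Each Fibonacci number is computed only once in `fib_memo`, so the number of calls grows linearly with `n`, while `fib_naive` grows exponentially; comparing the two with `system.time()` shows the effect.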