InChIKey=ACAHMMGNYHEOMN-UHFFFAOYSA-N
InChIKey=AEMHIJYDVNIZCG-UHFFFAOYSA-N
InChIKey=AEMQCMLHPDVYCV-UHFFFAOYSA-N
InChIKey=AEPNDGMPRCFRKR-UHFFFAOYSA-N
InChIKey=AGRCWCGWOIECPK-UHFFFAOYSA-N
InChIKey=AHIPWBCASABPFJ-UHFFFAOYSA-N
InChIKey=AMNJWFPHKGUIDQ-UHFFFAOYSA-N
InChIKey=AMVQEWGAVVIDKE-UHFFFAOYSA-N
InChIKey=AQIDTXWEPULKKH-UHFFFAOYSA-N
InChIKey=ARWIECZMCKTHCZ-UHFFFAOYSA-N
ID# global num formula AMG count formula AMG isomers AMG time [sec] CHECK CHECK Formula No. Isomers MOLGEN check formula check isomer num C H Br Cl F N O P S Si Exist Lewis Check SeniorMax DU # e- #Atoms LEWIS Sum H/C H/C OK N/C O/C P/C S/C NOPS Check HNOPS Probab accurate mass M M+1 M+2 M+3
1 2 2 C2 0 C2 2 0.33 TRUE TRUE C2 0 TRUE TRUE 2 0 0 0 0 0 0 0 0 0 NO YES YES 3 8 2 8 0.00 NO 0.00 0.00 0.00 0.00 YES NO 24.00000000 100.00 0.00 0.00 0.00
2 3 3 C2H2 1 C2H2 3 0.32 TRUE TRUE C2H2 1 TRUE TRUE 2 2 0 0 0 0 0 0 0 0 YES YES YES 2 10 4 10 1.00 YES 0.00 0.00 0.00 0.00 YES YES 26.01564920 100.00 0.00 0.00 0.00
3 4 4 C2H4 1 C2H4 4 0.32 TRUE TRUE C2H4 1 TRUE TRUE 2 4 0 0 0 0 0 0 0 0 YES YES YES 1 12 6 12 2.00 YES 0.00 0.00 0.00 0.00 YES YES 28.03129840 100.00 0.00 0.00 0.00
4 5 5 CH2O 1 CH2O 5 0.32 TRUE TRUE CH2O 1 TRUE TRUE 1 2 0 0 0 0 1 0 0 0 YES YES YES 1 12 4 8 2.00 YES 0.00 1.00 0.00 0.00 YES YES 30.01056420 100.00 1.15 0.21 0.00
5 6 6 C2H6 1 C2H6 6 0.33 TRUE TRUE C2H6 1 TRUE TRUE 2 6 0 0 0 0 0 0 0 0 YES YES Y
#Setup
rm(list = ls(all = TRUE))
gc(reset=TRUE)
set.seed(42) #From random.org
#Libraries
library(caret)
library(devtools)
install_github('zachmayer/caretEnsemble') #Install zach's caretEnsemble package (current devtools expects 'user/repo')
Change Point Detection Packages in R
Several packages on CRAN focus on change point detection. The following are especially useful because they are not tied to a particular application domain and apply to time series in general:
CPM – “Parametric and Nonparametric Sequential Change Detection in R”:
Useful for detecting multiple change points in a time series whose underlying distribution is unknown. The method also works on data streams, where each observation is considered only once. Because of this streaming design, the cpm approach produces a second output: the detection points themselves. They mark the time at which the algorithm detects each change point and so quantify the detection delay. Unfortunately, the cpm package is no longer maintained on CRAN. For Windows users I uploaded a zipped version of the installed package from my R library here. It should work with R 3.0 and 3.1 under Windows 7/8.
BCP – “An R Package for Performi
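The difference between a change point and the later time at which it is detected, as described for cpm, can be illustrated with a small base-R sketch. This is a plain two-sided CUSUM on a deterministic toy series, not the algorithm used by the cpm package; the function name `detect_change` and the parameters `startup`, `k`, and `h` are assumptions chosen for illustration.

```r
# Sequential mean-shift detector (simple CUSUM; NOT the cpm algorithm).
# detect_change, startup, k and h are hypothetical names for this sketch.
detect_change <- function(x, startup = 20, k = 0.5, h = 5) {
  # Estimate the in-control mean and spread from the first observations.
  mu <- mean(x[1:startup])
  sigma <- sd(x[1:startup])
  s_hi <- 0  # accumulates evidence for an upward shift
  s_lo <- 0  # accumulates evidence for a downward shift
  for (t in (startup + 1):length(x)) {
    z <- (x[t] - mu) / sigma
    s_hi <- max(0, s_hi + z - k)
    s_lo <- max(0, s_lo - z - k)
    # Each observation is looked at once, as in a data stream.
    if (s_hi > h || s_lo > h) return(t)
  }
  NA_integer_
}

# Toy series: the level shifts upward after observation 100.
true_change <- 100
x <- c(rep(c(-1, 1), 50), rep(c(2, 4), 50))

detection_time <- detect_change(x)
delay <- detection_time - true_change
print(c(detection_time = detection_time, delay = delay))
```

The detection time is always at or after the true change in a sequential setting, and the gap between the two is the delay that the text refers to.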
# things that affect speeds of random forests
# http://stackoverflow.com/questions/14106010/parallel-execution-of-random-forest-in-r/15771458#15771458
# also
# http://stackoverflow.com/questions/23075506/how-to-improve-randomforest-performance?lq=1
# also
#
Setting .multicombine to TRUE can make a significant difference:
````R
rf <- foreach(ntree=rep(25000, 6), .combine=combine, .multicombine=TRUE,
# Running the same R code on 32-bit and 64-bit R can give different results
# Run the complete script (not just the code below) on 32-bit R first, then on 64-bit R
# http://stackoverflow.com/questions/17881609/parallel-randomforest-with-different-results-using-dosnow
#
library(foreach)
library(doSNOW)
library(parallel)
set.seed(666)
ncores <- 4
cl <- makeCluster(ncores)
# Random forests are random ... indeed
# http://stats.stackexchange.com/questions/35609/why-do-i-need-bag-composition-to-calculate-oob-error-of-combined-random-forest-m
# https://github.com/mlist/IB2014/blob/master/helper_methods.R
# Random Forest combine: http://www.inside-r.org/packages/cran/randomForest/docs/combine
#
# This has implications for parallel random forests using snow, doSNOW, doParallel etc.
#
# err.rate : NULL
# err.rate : NULL
# OOB : NULL
# caret plot functions
# http://machinelearningmastery.com/data-visualization-with-the-caret-r-package/
library(caret)
# load the data
data(iris)
# open a new graphics device (dev.new() is the portable form of windows())
dev.new()
# pair-wise plots of all 4 attributes, dots colored by class
featurePlot(x=iris[,1:4], y=iris[,5], plot="pairs", auto.key=list(columns=3))
# reproducible caret models
# http://stackoverflow.com/questions/13403427/fully-reproducible-parallel-models-using-caret
library(doParallel); library(caret)
# create a list of seeds, with a different seed vector for each resampling iteration
set.seed(123)
seeds <- vector(mode = "list", length = 11) # length = (n_repeats * n_resamples) + 1
for(i in 1:10) seeds[[i]] <- sample.int(n=1000, 3) # 3 = number of tuning parameter values (mtry candidates for rf), here ncol(iris)-2
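The seed list built above can be wrapped in a small helper, sketched here in base R. As far as I understand caret's `trainControl(seeds = ...)` argument, it expects one integer vector per resampling iteration (one seed per candidate tuning value) plus a single integer as the last element for the final model fit. The helper name `make_seeds` and its arguments are assumptions for illustration, not part of caret.

```r
# Build a seeds list in the shape assumed for caret::trainControl(seeds = ...).
# make_seeds, n_resamples, n_tune and master_seed are hypothetical names.
make_seeds <- function(n_resamples, n_tune, master_seed = 123) {
  set.seed(master_seed)
  seeds <- vector(mode = "list", length = n_resamples + 1)
  # One vector of n_tune seeds for every resampling iteration.
  for (i in seq_len(n_resamples)) {
    seeds[[i]] <- sample.int(n = 1000, n_tune)
  }
  # Last element: a single seed for the final model.
  seeds[[n_resamples + 1]] <- sample.int(n = 1000, 1)
  seeds
}

seeds <- make_seeds(n_resamples = 10, n_tune = 3)
str(seeds)
```

Calling `make_seeds()` twice with the same `master_seed` returns the same list, which is the property needed for repeatable parallel runs.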
# library(memoise) can speed up code 1000-fold
# for a true speed-up on Fibonacci numbers use library(gmp) fibnum(n)
# http://adv-r.had.co.nz/Function-operators.html
# https://cran.r-project.org/web/packages/memoise/memoise.pdf
# Tobias Kind (2015) https://gist.github.com/tobigithub
library(memoise)
# no memoise()
fib1 <- function(n) {
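The speed-up from memoisation comes from caching results that were already computed. The following base-R sketch shows the idea with a hand-rolled cache; it is an illustrative stand-in and not the implementation of `memoise::memoise()`. The names `fib_cache` and `memo_fib` are assumptions and do not come from the original gist.

```r
# Hand-rolled memoisation in base R (illustrative only; not memoise::memoise).
# fib_cache and memo_fib are hypothetical names for this sketch.
fib_cache <- new.env()

memo_fib <- function(n) {
  key <- as.character(n)
  # Return the stored value if this n was computed before.
  if (exists(key, envir = fib_cache, inherits = FALSE)) {
    return(get(key, envir = fib_cache, inherits = FALSE))
  }
  value <- if (n < 2) n else memo_fib(n - 1) + memo_fib(n - 2)
  assign(key, value, envir = fib_cache)
  value
}

result <- memo_fib(30)
print(result)
```

Each Fibonacci number is computed once and then looked up, so the number of calls grows linearly with `n` instead of exponentially as in the naive recursion.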