Ashish Dutt duttashi
@duttashi
duttashi / spark_install.R
Created July 3, 2017 05:39
How to install different versions of Spark with sparklyr
install.packages("devtools")
devtools::install_github("rstudio/sparklyr")
library(sparklyr)
# check spark installed versions
spark_installed_versions()
# install spark versions
spark_install(version = "1.6.2")
spark_install(version = "2.0.0")
@duttashi
duttashi / spark_conn.R
Last active July 3, 2017 06:02
Connecting to spark on local cluster and other basic spark functions
# Load the sparklyr library into the R environment
library(sparklyr)
library(dplyr) # provides src_tbls() and copy_to()
# connect to a local Spark cluster; the requested version must already be
# installed, e.g. with spark_install(version = "2.1.0")
sc <- spark_connect(master = "local", version = "2.1.0")
# print the spark version
spark_version(sc)
# list the tables in the local Spark cluster
src_tbls(sc) # returns NULL or character(0) if no table has been copied yet
# copy data to the local Spark instance
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)
@duttashi
duttashi / wine.csv
Created July 3, 2017 06:43
The classic UCI Wine dataset (cultivar class plus 13 chemical measurements)
Wine,Alcohol,Malic.acid,Ash,Acl,Mg,Phenols,Flavanoids,Nonflavanoid.phenols,Proanth,Color.int,Hue,OD,Proline
1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050
1,13.16,2.36,2.67,18.6,101,2.8,3.24,.3,2.81,5.68,1.03,3.17,1185
1,14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,.86,3.45,1480
1,13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735
1,14.2,1.76,2.45,15.2,112,3.27,3.39,.34,1.97,6.75,1.05,2.85,1450
1,14.39,1.87,2.45,14.6,96,2.5,2.52,.3,1.98,5.25,1.02,3.58,1290
1,14.06,2.15,2.61,17.6,121,2.6,2.51,.31,1.25,5.05,1.06,3.58,1295
1,14.83,1.64,2.17,14,97,2.8,2.98,.29,1.98,5.2,1.08,2.85,1045
@duttashi
duttashi / kfold-CV.r
Created July 6, 2017 06:46
k-fold cross validation script for R
library(plyr) # for create_progress_bar()
library(randomForest)
data <- iris
# in this cross validation example, we use the iris data set to
# predict the Sepal Length from the other variables in the dataset
# with the random forest model
k <- 5 # number of folds
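The gist preview stops right after the fold count is set. As a rough sketch of how the remaining loop might look, the block below assigns each row of iris to one of k folds and predicts Sepal.Length on the held-out fold. Note the assumptions: lm() is swapped in for the random forest so the example runs in base R without extra packages, and the seed, the fold_id vector, and the RMSE metric are illustrative choices, not taken from the original script.

```r
# Sketch only: lm() stands in for randomForest(); the CV structure is the same.
set.seed(42)                      # assumed seed, for reproducible folds
data <- iris
k <- 5                            # number of folds

# Assign each row to one of k folds at random, keeping fold sizes balanced
fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))

rmse <- numeric(k)
for (i in seq_len(k)) {
  test_set  <- data[fold_id == i, ]   # held-out fold
  train_set <- data[fold_id != i, ]   # remaining k - 1 folds
  fit  <- lm(Sepal.Length ~ ., data = train_set)
  pred <- predict(fit, newdata = test_set)
  rmse[i] <- sqrt(mean((test_set$Sepal.Length - pred)^2))
}

cv_rmse <- mean(rmse)             # average out-of-fold error
print(round(rmse, 3))
print(cv_rmse)
```

To use a random forest instead, the lm() call could be replaced with randomForest(Sepal.Length ~ ., data = train_set); the rest of the loop should not need to change.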
@duttashi
duttashi / kfold-cv-custom-function.R
Created July 11, 2017 09:56
A simple function to perform k-fold cross validation in R
# Randomly shuffle the data
yourdata <- yourdata[sample(nrow(yourdata)), ]
# Create 10 equally sized folds
folds <- cut(seq(1, nrow(yourdata)), breaks = 10, labels = FALSE)
# Perform 10-fold cross validation
for (i in 1:10) {
  # Segment your data by fold using the which() function
  testIndexes <- which(folds == i, arr.ind = TRUE)
  testData <- yourdata[testIndexes, ]
  trainData <- yourdata[-testIndexes, ]
  # (the gist preview ends here) fit on trainData, then evaluate on testData
}
@duttashi
duttashi / rename_multiple_columns
Created July 17, 2017 03:56
How to rename multiple columns in R
Say I have
x=data.frame(q=1,w=2,e=3, ...and many many columns...)
What is the most elegant way to rename an arbitrary subset of columns, whose positions I don't necessarily know, to other arbitrary names?
For example, say I want to rename "q" and "e" to "A" and "B". What is the most elegant code to do this?
Obviously, I can do a loop
oldnames=c("q","e")
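One loop-free possibility, sketched here in base R, is to look up the positions of the old names with match() and assign the new names at those positions. The four-column data frame (including the extra column r) and the newnames vector are stand-ins made up for the example.

```r
x <- data.frame(q = 1, w = 2, e = 3, r = 4)   # small stand-in for the wide data frame
oldnames <- c("q", "e")
newnames <- c("A", "B")                        # assumed to pair up with oldnames in order

# match() returns the position of each old name in names(x),
# so the columns can sit anywhere and in any order.
idx <- match(oldnames, names(x))
stopifnot(!anyNA(idx))                         # fail early if an old name is missing
names(x)[idx] <- newnames

print(names(x))                                # "A" "w" "B" "r"
```

Because the lookup is by name, the same three lines should work unchanged however many columns the data frame has.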
@duttashi
duttashi / drop_multiple_cols.R
Created August 6, 2017 07:28
To drop multiple columns in R
library(data.table)
DT[,coltodelete:=NULL]
# OR
DT[,c("col1","col20"):=NULL]
# OR
DT[,(125:135):=NULL]
# OR
DT[,(variableHoldingNamesOrNumbers):=NULL]
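For a plain data.frame, where := is not available, a similar effect can be sketched in base R. Unlike the data.table calls above, which modify DT by reference, this returns a new data frame and leaves the original untouched. The data frame and column names below are made up for the example.

```r
df <- data.frame(col1 = 1:3, col2 = 4:6, col20 = 7:9, keep = letters[1:3])  # made-up data

# Drop by name: keep every column whose name is not in the drop list
drop_names <- c("col1", "col20")
by_name <- df[, !(names(df) %in% drop_names), drop = FALSE]

# Drop by position: negative indices remove columns 2 and 3
by_pos <- df[, -(2:3), drop = FALSE]

print(names(by_name))   # "col2" "keep"
print(names(by_pos))    # "col1" "keep"
```

The drop = FALSE argument keeps the result a data frame even when only one column remains.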
@duttashi
duttashi / merge_two_dataframes_on_common_cols.R
Created August 6, 2017 07:34
To merge two data frames on common columns in R
# Method 1: plyr::join(), which performs a left join by default
library(plyr)
# Merge these four separate data frames into a single table on SCHCD
x.data <- join(enrolrep.data, facilty.data, by = c("SCHCD"))
y.data <- join(basic.data, x.data, by = c("SCHCD"))
master.data <- join(y.data, teachr.data, by = c("SCHCD"))
# Method 2: using setDT() from data.table (approach still to be worked out)
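As a possible base-R alternative to chaining joins by hand, Reduce() can fold merge() over a list of data frames. This is only a sketch: the four small tables are invented stand-ins for the real ones, and all.x = TRUE is chosen to mirror the left-join default of plyr::join(). Note that merge() sorts the result by the key, whereas join() keeps the row order of the left table.

```r
# Made-up stand-ins for the four tables, all keyed on SCHCD
basic    <- data.frame(SCHCD = 1:3, name = c("a", "b", "c"))
enrol    <- data.frame(SCHCD = 1:3, pupils = c(120, 80, 200))
facility <- data.frame(SCHCD = 1:3, rooms = c(6, 4, 10))
teacher  <- data.frame(SCHCD = c(1L, 3L), teachers = c(5, 9))

# Reduce() applies merge() pairwise from left to right;
# all.x = TRUE keeps every row of the left table (a left join)
master <- Reduce(
  function(left, right) merge(left, right, by = "SCHCD", all.x = TRUE),
  list(basic, enrol, facility, teacher)
)

print(master)
```

School 2 has no row in the teacher table, so its teachers value comes back as NA rather than the row being dropped.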
@duttashi
duttashi / detach_package.R
Created August 10, 2017 08:19
How to detach a package in R without restarting the R session
detach("package:mice", unload = TRUE)
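Calling detach() on a package that is not attached raises an error, so a small guard can be handy in scripts. The helper below, detach_if_attached(), is a hypothetical name introduced for this sketch; it checks search() first and only then detaches. The stats4 package is used in the demo purely because it ships with R.

```r
# Hypothetical helper: detach a package only if it is currently attached
detach_if_attached <- function(pkg) {
  entry <- paste0("package:", pkg)
  if (entry %in% search()) {
    # character.only = TRUE is needed because the name is held in a variable
    detach(entry, unload = TRUE, character.only = TRUE)
    return(TRUE)
  }
  FALSE
}

library(stats4)                       # demo package that ships with R
detach_if_attached("stats4")          # detaches and returns TRUE
print("package:stats4" %in% search()) # FALSE once detached
```

A second call with the same name simply returns FALSE instead of failing.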
@duttashi
duttashi / remove_pkg_dep.R
Created August 21, 2017 23:45
To uninstall an R package and all its dependencies
# The code below is adapted from the answer by user `Thomas` posted on StackOverflow https://stackoverflow.com/questions/26573368/uninstall-remove-r-package-with-dependencies
library("tools")
removeDepends <- function(pkg, recursive = FALSE){
d <- package_dependencies(,installed.packages(), recursive = recursive)
depends <- if(!is.null(d[[pkg]])) d[[pkg]] else character()
needed <- unique(unlist(d[!names(d) %in% c(pkg,depends)]))
toRemove <- depends[!depends %in% needed]
if(length(toRemove)){
toRemove <- select.list(c(pkg,sort(toRemove)), multiple = TRUE,