R Statistics Cheatsheet
## Graphing
# Frequency bar chart of columnName in someDataFrame, with bars ordered by
# descending frequency from left to right.
library(ggplot2)
ggplot(someDataFrame, aes(x=reorder(columnName, columnName, function(x) -length(x)))) +
  geom_bar() +
  xlab("X Label") +
  ylab("Y Label")
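# Frequency punchcard of columnX versus columnY in someDataFrame.
# ddply() comes from plyr; the per-combination count lands in a column named "nrow",
# and .drop=FALSE keeps combinations that never occur (with a count of 0).
library(plyr)
dfc <- ddply(someDataFrame, c("columnX", "columnY"), "nrow", .drop=FALSE)
ggplot(data=dfc, aes(x=columnX, y=columnY, size=factor(nrow), color=factor(nrow))) +
  geom_point() +
  scale_size_discrete(range=c(1, 10)) +
  labs(size="Frequency", color="Frequency")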
## Subsets
# Omit NA values and boxplot outliers from data frame column
na.omit(someDataFrame$columnName[!someDataFrame$columnName %in% boxplot.stats(someDataFrame$columnName)$out])
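# Worked example of the line above on a toy vector (a sketch; "toyValues" and
# "toyCleaned" are hypothetical names, not from the original snippet).
# boxplot.stats() flags points beyond 1.5 times the hinge spread.
toyValues <- c(1, 2, 3, 4, 5, 100, NA)
toyOutliers <- boxplot.stats(toyValues)$out
toyCleaned <- na.omit(toyValues[!toyValues %in% toyOutliers])
# as.numeric() strips the "na.action" attribute that na.omit() attaches.
as.numeric(toyCleaned)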
# Group dates into year, month factors
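# The original gives no code for this step. A minimal base-R sketch, assuming
# the dates are stored in a Date column; "exampleFrame" and "dateColumn" are
# hypothetical names used only for illustration.
exampleFrame <- data.frame(dateColumn=as.Date(c("2014-01-15", "2014-02-03", "2015-02-20")))
exampleFrame$year <- factor(format(exampleFrame$dateColumn, "%Y"))
# Fixing the levels to "01".."12" keeps months that have no rows in the factor.
exampleFrame$month <- factor(format(exampleFrame$dateColumn, "%m"), levels=sprintf("%02d", 1:12))
# Cross-tabulate row counts by year and month.
table(exampleFrame$year, exampleFrame$month)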
# Combine two data frames together
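# The first data frame's name is missing in the original; "database.one" is a
# placeholder chosen to mirror database.two.
common.names <- intersect(colnames(database.one), colnames(database.two))
combined.database <- rbind(database.one[, common.names], database.two[, common.names])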
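# Worked example of the same pattern on toy data (a sketch; "df.a", "df.b",
# "shared" and "df.ab" are hypothetical names). Only the columns present in
# both frames survive, and rbind() matches them by name, not by position.
df.a <- data.frame(id=1:2, value=c(10, 20), extra.a=c("x", "y"))
df.b <- data.frame(value=c(30, 40), id=3:4, extra.b=c(TRUE, FALSE))
shared <- intersect(colnames(df.a), colnames(df.b))
# drop=FALSE keeps a data frame even if only one column is shared.
df.ab <- rbind(df.a[, shared, drop=FALSE], df.b[, shared, drop=FALSE])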
## Factor analysis
# PCA variable factor map
# PCA() comes from FactoMineR and draws its plots by default (graph=TRUE).
library(FactoMineR)
result <- PCA(someDataFrame)
## Data mining
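# Association rule learning
# apriori(), sort() for rules, and is.subset() come from arules.
library(arules)
rules <- apriori(factorDataFrame,
                 parameter = list(minlen=2, supp=0.005, conf=0.8),
                 appearance = list(rhs=c("dependent_variable=1"), default="lhs"),
                 control = list(verbose=FALSE))
rules.sorted <- sort(rules, by="lift")
# Prune redundant rules: drop any rule that contains a higher-lift rule as a subset.
# Recent arules versions return a sparse matrix from is.subset() by default;
# sparse=FALSE requests the dense logical matrix that the steps below assume.
subset.matrix <- is.subset(rules.sorted, rules.sorted, sparse=FALSE)
subset.matrix[lower.tri(subset.matrix, diag=TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm=TRUE) >= 1
rules.pruned <- rules.sorted[!redundant]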