
Dmitry Grapov dgrapov

@dgrapov
dgrapov / covariate_adjust.R
Created January 6, 2024 08:56
Example of linear-model-based covariate adjustment
#get linear model residuals
#' @import dplyr
#' @export
dave_lm_adjust<-function(data,formula,test_vars,adjust=TRUE,progress=TRUE){
if (progress) { pb <- txtProgressBar(min = 0, max = length(test_vars), style = 3) } else { pb <- NULL } # bar length matches the loop over test_vars
out <- lapply(seq_along(test_vars), function(i) {
if (progress) {
setTxtProgressBar(pb, i)
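The preview above stops before the model is fit. As a hedged, self-contained sketch of the general idea (not the gist's actual code), each test variable can be regressed on the covariates with `lm()` and the residuals kept as covariate-adjusted values. The function name `lm_adjust_sketch` and its `covariates` argument (a character vector, in place of the gist's `formula`) are hypothetical, and the built-in `mtcars` data stands in for real data.

```r
# Hypothetical sketch: covariate adjustment via linear-model residuals.
# Assumes `covariates` and `test_vars` are column names in `data`.
lm_adjust_sketch <- function(data, covariates, test_vars) {
  adjusted <- lapply(test_vars, function(v) {
    # build e.g. mpg ~ wt + hp without pasting strings
    f <- reformulate(covariates, response = v)
    # na.exclude pads residuals with NA so rows stay aligned with `data`
    fit <- lm(f, data = data, na.action = na.exclude)
    residuals(fit)
  })
  names(adjusted) <- test_vars
  as.data.frame(adjusted)
}

# Usage with built-in data: adjust mpg and qsec for weight and horsepower
adj <- lm_adjust_sketch(mtcars, covariates = c("wt", "hp"),
                        test_vars = c("mpg", "qsec"))
head(adj)
```

Residuals are centered at zero; if adjusted values are wanted on the original scale, a common choice is to add back each variable's mean.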
DATA SCIENCE EXERCISE
The following challenge requires the beer reviews data set, beer_reviews.csv, which can be downloaded from https://data.world/socialmediadata/beeradvocate. Note: you can create a free temporary account to download the .csv.
Questions to answer using this data:
Which brewery produces the strongest beers by ABV%?
If you had to pick 3 beers to recommend using only this data, which would you pick?
Which of the factors (aroma, taste, appearance, palate) are most important in determining the overall quality of a beer?
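A hedged sketch of one way to approach the first and third questions in R. The column names (`brewery_name`, `beer_abv`, `review_overall`, `review_aroma`, `review_taste`, `review_appearance`, `review_palate`) are assumptions about beer_reviews.csv, and the function names are hypothetical. Ranking standardized regression coefficients is only one common notion of "importance"; strongly correlated ratings can make it unstable. The toy data below is synthetic and only demonstrates that the calls run.

```r
# Hypothetical sketch for the beer questions. Column names are assumptions;
# real use would start from reviews <- read.csv("beer_reviews.csv")

# Q1: mean ABV per brewery, strongest first (rows with missing ABV are dropped)
strongest_breweries <- function(reviews) {
  abv <- aggregate(beer_abv ~ brewery_name, data = reviews, FUN = mean)
  abv[order(-abv$beer_abv), ]
}

# Q3: standardized coefficients of the overall score on the four factors
factor_importance <- function(reviews) {
  cols <- c("review_overall", "review_aroma", "review_taste",
            "review_appearance", "review_palate")
  z <- as.data.frame(scale(reviews[cols]))  # put all ratings on one scale
  fit <- lm(review_overall ~ ., data = z)
  sort(abs(coef(fit)[-1]), decreasing = TRUE)  # drop intercept, rank by size
}

# Synthetic toy data (not real BeerAdvocate numbers)
toy_abv <- data.frame(brewery_name = c("A", "A", "B", "B", "C"),
                      beer_abv = c(5, 6, 9, 11, NA))
set.seed(1)
n <- 200
toy <- data.frame(review_aroma = runif(n, 1, 5),
                  review_taste = runif(n, 1, 5),
                  review_appearance = runif(n, 1, 5),
                  review_palate = runif(n, 1, 5))
toy$review_overall <- with(toy, 0.1 * review_aroma + 0.6 * review_taste +
  0.05 * review_appearance + 0.25 * review_palate + rnorm(n, sd = 0.2))

top <- strongest_breweries(toy_abv)
imp <- factor_importance(toy)
top
imp
```

Brewery "C" disappears from `top` because its only ABV value is missing, which mirrors how incomplete rows in the real file would be handled by the formula interface of `aggregate()`.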
Additional math/coding question unrelated to the data:
dgrapov / replace_in.R
Created April 6, 2018 02:07
Deep error
> in
Error: unexpected 'in' in "in"
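The gist body isn't shown, but the console error itself has a simple cause: `in` is a reserved word in R (part of the `for (x in y)` syntax), so the parser rejects it on its own. A small illustration, assuming the intent was a membership test; the backtick-defined `` `in` `` function is purely hypothetical, since base R defines no such function.

```r
# parse() reproduces the console error for a bare `in`
msg <- tryCatch(parse(text = "in"), error = function(e) conditionMessage(e))
print(msg)

# Membership tests use the %in% operator instead
2 %in% 1:3   # TRUE

# Backticks allow a reserved word as a name (illustration only;
# this `in` is a hypothetical helper, not part of base R)
`in` <- function(x, table) x %in% table
`in`(5, 1:3)  # FALSE
```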
dgrapov / pca.R
Created March 24, 2018 03:09
basic principal components analysis and visualization in R
# Basic PCA example
# use www.createdatasol.com for
# an advanced user interface
#required packages for plotting
library(ggplot2)
library(ggrepel)
#load data
data<-read.csv('~/Sampledata.csv',
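The preview is cut off while loading the data. As a hedged sketch of a basic PCA (not the gist's own code), base R's `prcomp()` can be run on the built-in `iris` data in place of the unseen Sampledata.csv; `pca_sketch` is a hypothetical name, and autoscaling the variables is an assumption.

```r
# Hypothetical sketch: basic PCA with base R's prcomp() on iris
pca_sketch <- function(df) {
  # keep numeric columns only; prcomp() needs numeric input
  x <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  pca <- prcomp(x, center = TRUE, scale. = TRUE)  # autoscale each variable
  list(scores = pca$x,                 # sample coordinates on each PC
       loadings = pca$rotation,        # variable weights per PC
       var_explained = pca$sdev^2 / sum(pca$sdev^2))
}

res <- pca_sketch(iris)
round(res$var_explained, 3)
# Scores plot of the first two components, colored by species
plot(res$scores[, 1:2], col = iris$Species, pch = 19)
```

The gist loads ggplot2 and ggrepel; the same `scores` and `loadings` could instead be passed to `ggplot()` with `geom_text_repel()` for labeled plots.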
dgrapov / example.R
Created February 2, 2018 04:48
Example of a shiny app with data upload and different plot options
#initialize
library(shiny)
library(ggplot2)
library(purrr)
library(dplyr)
#example data
data(iris)
dgrapov / tanimoto.R
Created January 10, 2018 21:53
fast (?) implementations of Tanimoto distance calculations
#' @title fast_tanimoto
#' @param mat matrix or data frame of numeric values
#' @param output 'matrix' (default) or 'edge list' (non-redundant and undirected)
#' @param progress logical; if TRUE (default), show a progress bar
#' @import reshape2
fast_tanimoto<-function(mat,output='matrix',progress=TRUE){
mat[is.na(mat)]<-0
#scoring function
score<-function(x){sum(x==2)/sum(x>0)}
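The `score()` helper appears to act on the element-wise sum of two binary vectors: entries equal to 2 are shared features and entries greater than 0 form the union, so it returns the Tanimoto (Jaccard) similarity. A hedged, vectorized sketch of the same quantity for all row pairs at once uses a matrix product instead of looping; it assumes rows are the objects compared and any positive value counts as present. `tanimoto_sim` is a hypothetical name, not the gist's function.

```r
# Hypothetical sketch: all-pairs Tanimoto similarity for rows of a matrix
tanimoto_sim <- function(mat) {
  x <- as.matrix(mat)
  x[is.na(x)] <- 0              # missing treated as absent, as in the gist
  x <- (x > 0) * 1              # binarize
  inter <- tcrossprod(x)        # shared features for every pair of rows
  n <- rowSums(x)               # features present in each row
  union <- outer(n, n, "+") - inter
  sim <- ifelse(union > 0, inter / union, 0)  # avoid 0/0 for empty rows
  dimnames(sim) <- list(rownames(x), rownames(x))
  sim
}

m <- rbind(a = c(1, 1, 0, 0),
           b = c(1, 0, 1, 0),
           c = c(0, 0, 1, 1))
s <- tanimoto_sim(m)
d <- 1 - s                      # Tanimoto distance

# Cross-check against the gist's pairwise scoring function
score <- function(x) { sum(x == 2) / sum(x > 0) }
score(m["a", ] + m["b", ])      # same value as s["a", "b"]
```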
dgrapov / plotly_select_DT.R
Last active September 10, 2020 01:25
ggplot2 to plotly to shiny to box/lasso select to DT
#plotly box or lasso select linked to
# DT data table
# using Wage data
# the out group: is sex:Male, region:Middle Atlantic +
library(ggplot2)
library(plotly)
library(dplyr)
library(ISLR)
dgrapov / SOM example.R
Last active March 11, 2023 11:21
Self-organizing map (SOM) example in R
#SOM example using wines data set
library(kohonen)
data(wines)
set.seed(7)
#create SOM grid
sommap <- som(scale(wines), grid = somgrid(2, 2, "hexagonal"))
## use hierarchical clustering to cluster the codebook vectors
groups<-3
dgrapov / example.R
Last active February 21, 2023 15:27
Convert adjacency (or other) matrix to edge list
library(reshape2)
gen.mat.to.edge.list<-function(mat,symmetric=TRUE,diagonal=FALSE,text=FALSE){
#create edge list from matrix
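# if symmetric, duplicate (lower-triangle) edges are removed
mat<-as.matrix(mat)
id<-is.na(mat) # flag real missing values so they can be kept
mat[id]<-"nna" # placeholder for missing values, distinct from the "na" drop marker
if(symmetric){mat[lower.tri(mat)]<-"na"} # mark lower triangle so each undirected edge appears once
if(!diagonal){diag(mat)<-"na"} # mark self-edges for removal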
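A hedged base-R sketch of the same matrix-to-edge-list idea, without reshape2 or character placeholders: build a logical mask of the cells to keep, then read row and column indices with `which(arr.ind = TRUE)`. `mat_to_edges` is a hypothetical name; unlike the gist, missing values are simply kept as `NA` weights and the `text` option is omitted.

```r
# Hypothetical sketch: matrix -> edge list in base R
mat_to_edges <- function(mat, symmetric = TRUE, diagonal = FALSE) {
  mat <- as.matrix(mat)
  keep <- matrix(TRUE, nrow(mat), ncol(mat))
  # for a symmetric matrix keep only the upper triangle,
  # so each undirected edge is listed once
  if (symmetric) keep[lower.tri(keep)] <- FALSE
  if (!diagonal) diag(keep) <- FALSE     # drop self-edges
  idx <- which(keep, arr.ind = TRUE)     # columns "row" and "col"
  rn <- if (is.null(rownames(mat))) seq_len(nrow(mat)) else rownames(mat)
  cn <- if (is.null(colnames(mat))) seq_len(ncol(mat)) else colnames(mat)
  data.frame(source = rn[idx[, "row"]],
             target = cn[idx[, "col"]],
             value  = mat[idx],          # NA weights stay NA
             stringsAsFactors = FALSE)
}

labs <- c("a", "b", "c")
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3, dimnames = list(labs, labs))
edges <- mat_to_edges(m)
edges
```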
dgrapov / RECA_test.R
Created August 21, 2015 13:42
Testing RECA: Relevant Component Analysis for Supervised Distance Metric Learning
#R code, testing RECA with the iris data
library(RECA)
#test data
data(iris)
x<-iris[,-5]
y<-iris$Species
#similar groups (species) in each chunk (n=3)
chunksvec<-as.numeric(y)
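Relevant Component Analysis learns a linear transform from "chunklets" (groups of points known to be similar): it pools the within-chunklet covariance C and whitens the data by C^(-1/2). A minimal base-R sketch under that description, using species as chunklets as the gist does. `rca_sketch` is hypothetical and is not the RECA package's interface.

```r
# Hypothetical sketch of Relevant Component Analysis (RCA) in base R.
# chunks: chunklet label for each row of x.
rca_sketch <- function(x, chunks) {
  x <- as.matrix(x)
  # subtract each chunklet's mean from its members
  chunk_means <- apply(x, 2, function(col) ave(col, chunks))
  centered <- x - chunk_means
  # pooled within-chunklet covariance
  C <- crossprod(centered) / nrow(x)
  # whitening transform C^(-1/2) from the eigendecomposition of C
  e <- eigen(C, symmetric = TRUE)
  W <- e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*%
    t(e$vectors)
  list(W = W, newX = x %*% W)
}

x <- iris[, -5]
chunksvec <- as.numeric(iris$Species)
res <- rca_sketch(x, chunksvec)
head(res$newX)
```

Because W is symmetric with W %*% W equal to the inverse of C, Euclidean distances on `res$newX` equal Mahalanobis-type distances with the inverse within-chunklet covariance as the metric.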