Dmitry Grapov dgrapov

View gist:d15aedea295f32fa43d76b0a864c577b
DATA SCIENCE EXERCISE
This challenge uses the beer reviews data set, beer_reviews.csv, which can be downloaded from https://data.world/socialmediadata/beeradvocate. A free temporary account is enough to download the .csv.
Questions to answer using this data:
Which brewery produces the strongest beers by ABV%?
If you had to pick 3 beers to recommend using only this data, which would you pick?
Which of the factors (aroma, taste, appearance, palate) are most important in determining the overall quality of a beer?
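One possible way to approach the first question is sketched below in base R. This is only an illustration under stated assumptions: the column names `brewery_name`, `beer_name` and `beer_abv` are assumed rather than confirmed by the exercise text, and the toy data frame stands in for beer_reviews.csv. "Strongest" is interpreted here as the highest mean ABV across a brewery's distinct beers, which is one reading among several.

```r
# Toy stand-in for beer_reviews.csv (column names are assumptions)
reviews <- data.frame(
  brewery_name = c("A", "A", "A", "B", "B", "C"),
  beer_name    = c("a1", "a1", "a2", "b1", "b2", "c1"),
  beer_abv     = c(5.0, 5.0, 6.0, 9.5, 10.5, NA),
  stringsAsFactors = FALSE
)

# Keep one row per distinct beer so heavily reviewed beers are not over-weighted
beers <- unique(reviews[, c("brewery_name", "beer_name", "beer_abv")])

# Mean ABV per brewery; the formula interface drops rows with missing ABV
abv_by_brewery <- aggregate(beer_abv ~ brewery_name, data = beers, FUN = mean)

# Sort from strongest to weakest and take the top brewery
abv_by_brewery <- abv_by_brewery[order(-abv_by_brewery$beer_abv), ]
strongest <- abv_by_brewery$brewery_name[1]
```

With the real file, a minimum number of beers per brewery would probably be worth enforcing so that a brewery with a single very strong beer does not dominate the ranking.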
Additional math/coding question unrelated to the data:
View replace_in.R
> in
Error: unexpected 'in' in "in"
dgrapov / pca.R
Created Mar 24, 2018
basic principal components analysis and visualization in R
# Basic PCA example
# use www.createdatasol.com for
# an advanced user interface
#required packages for plotting
library(ggplot2)
library(ggrepel)
#load data
data<-read.csv('~/Sampledata.csv',
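The preview above is cut off mid-call, so the rest of the author's script is not shown here. As a rough, self-contained sketch of basic PCA in R, the example below uses base `prcomp` on the built-in `iris` data, which stands in for `Sampledata.csv` (its contents are unknown). Plotting is done with base graphics rather than ggplot2 and ggrepel, purely to keep the sketch dependency-free.

```r
# Minimal PCA sketch; iris is a stand-in for the gist's Sampledata.csv
data(iris)
x <- iris[, 1:4]  # numeric variables only

# Center and scale so each variable contributes equally
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first two components, with the grouping variable attached
scores <- as.data.frame(pca$x[, 1:2])
scores$group <- iris$Species

# Draw the scores plot only in an interactive session
if (interactive()) {
  plot(scores$PC1, scores$PC2, col = scores$group, pch = 19,
       xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
       ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
}
```

The `scores` data frame could equally be passed to ggplot2 for the kind of labelled plot the gist's package list suggests.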
dgrapov / example.R
Created Feb 2, 2018
Example of a shiny app with data upload and different plot options
#initialize
library(shiny)
library(ggplot2)
library(purrr)
library(dplyr)
#example data
data(iris)
dgrapov / tanimoto.R
Created Jan 10, 2018
fast (?) implementations of tanimoto distance calculations
#' @title fast_tanimoto
#' @param mat matrix or data frame of numeric values
#' @param output 'matrix' (default) or 'edge list' (non-redundant and undirected)
#' @param progress TRUE, show progress
#' @import reshape2
fast_tanimoto<-function(mat,output='matrix',progress=TRUE){
mat[is.na(mat)]<-0
#scoring function
score<-function(x){sum(x==2)/sum(x>0)}
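The preview stops before showing how `score` is applied. A plausible reading, offered here as an assumption, is that `x` is the element-wise sum of two binary (0/1) rows: positions equal to 2 are shared features and positions greater than 0 are features present in either row, which gives the Tanimoto similarity (distance would be one minus this value). The small matrix and the plain double loop below are illustrative only and make no attempt to be fast.

```r
# Scoring function as shown in the preview
score <- function(x) sum(x == 2) / sum(x > 0)

# Hypothetical binary fingerprints, one per row
mat <- rbind(
  a = c(1, 1, 0, 1, 0),
  b = c(1, 0, 0, 1, 1),
  c = c(0, 0, 1, 0, 0)
)

# Pairwise similarity matrix; the diagonal is 1 for any non-empty row
n <- nrow(mat)
sim <- matrix(1, n, n, dimnames = list(rownames(mat), rownames(mat)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    # Summing two 0/1 rows gives 2 where both are set, 1 where only one is
    sim[i, j] <- sim[j, i] <- score(mat[i, ] + mat[j, ])
  }
}
```

Rows `a` and `b` share two of the four features present in either, so their similarity is 0.5; row `c` shares nothing with the others and scores 0.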
dgrapov / plotly_select_DT.R
Last active Sep 10, 2020
ggplot2 to plotly to shiny to box/lasso select to DT
#plotly box or lasso select linked to
# DT data table
# using Wage data
# the out group: is sex:Male, region:Middle Atlantic +
library(ggplot2)
library(plotly)
library(dplyr)
library(ISLR)
dgrapov / SOM example.R
Last active Mar 5, 2021
Self-organizing map (SOM) example in R
#SOM example using wines data set
library(kohonen)
data(wines)
set.seed(7)
#create SOM grid
sommap <- som(scale(wines), grid = somgrid(2, 2, "hexagonal"))
## use hierarchical clustering to cluster the codebook vectors
groups<-3
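The clustering step announced in the last comment is not visible in the preview. A minimal sketch of what it might look like follows, using base `hclust` and `cutree`. To stay self-contained it clusters a random stand-in matrix with one row per map unit (four rows for a 2 x 2 grid); with the fitted map, the codebook would presumably come from the `sommap` object instead (in recent kohonen versions `getCodes(sommap)` is assumed to return it).

```r
# Stand-in for the SOM codebook: 4 map units (2 x 2 grid), 13 made-up variables
set.seed(7)
codes <- matrix(rnorm(4 * 13), nrow = 4)

groups <- 3

# Hierarchical clustering on Euclidean distances between codebook vectors
hc <- hclust(dist(codes))

# Cut the tree into the requested number of clusters, one label per map unit
unit_cluster <- cutree(hc, k = groups)
```

Each observation can then inherit the cluster of the map unit it was assigned to.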
dgrapov / example.R
Last active Sep 21, 2015
Convert adjacency (or other) matrix to edge list
library(reshape2)
gen.mat.to.edge.list<-function(mat,symmetric=TRUE,diagonal=FALSE,text=FALSE){
#create edge list from matrix
# if symmetric duplicates are removed
mat<-as.matrix(mat)
id<-is.na(mat) # used to allow missing
mat[id]<-"nna"
if(symmetric){mat[lower.tri(mat)]<-"na"} # use to allow missing values
if(!diagonal){diag(mat)<-"na"}
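The rest of the function is not shown in the preview. As a hedged illustration of the same idea (keep the upper triangle of a symmetric matrix, drop the diagonal, and emit one row per pair), here is a base R sketch that avoids reshape2. The example matrix and the column names `source`, `target` and `value` are made up for the illustration and are not taken from the gist.

```r
# Hypothetical symmetric adjacency matrix
ids <- c("a", "b", "c")
mat <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Upper triangle without the diagonal: each undirected pair appears once
keep <- upper.tri(mat, diag = FALSE)
idx <- which(keep, arr.ind = TRUE)

# One row per retained cell: row name, column name, cell value
edges <- data.frame(
  source = rownames(mat)[idx[, "row"]],
  target = colnames(mat)[idx[, "col"]],
  value  = mat[idx],
  stringsAsFactors = FALSE
)
```

Filtering `edges` on `value != 0` would leave only the pairs that are actually connected.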
dgrapov / RECA_test.R
Created Aug 21, 2015
Testing RECA: Relevant Component Analysis for Supervised Distance Metric Learning
#R code, testing RECA with the iris data
library(RECA)
#test data
data(iris)
x<-iris[,-5]
y<-iris$Species
#similar groups (species) in each chunk (n=3)
chunksvec<-as.numeric(y)
dgrapov / app
Last active Aug 29, 2015
ggvis linked brushing bug
library(shiny)
library(ggvis)
shinyApp(
ui =bootstrapPage(
actionButton("randomize", "Randomize"),
ggvisOutput("plot1"),
ggvisOutput("plot2"),
verbatimTextOutput("summary")
),