@FrankRuns
Last active June 12, 2022 10:56
Very quick ggplot2 scatterplot visualization for selection bias article.
# purpose: visualize linear trend for all data and subset of data
# libraries
library(dplyr)
library(ggplot2)
# read data
d <- read.csv("mycsvfile.csv")
# quick look
head(d)
# make it smaller: keep only the columns used below
d <- d %>% select(traffic_std, stars_std, is_top_ten)
# correlations: all restaurants first, then the top-ranked subset only
cor(d$traffic_std, d$stars_std)
d %>%
  filter(is_top_ten == 1) %>%
  summarize(cor_top_ten = cor(traffic_std, stars_std))
# visualize: all restaurants in blue, top-ranked subset overlaid in pink/red
top_ten <- d %>% filter(is_top_ten == 1)
ggplot(d, aes(x = traffic_std, y = stars_std)) +
  geom_point(color = "steelblue", alpha = 0.5) +
  geom_smooth(method = "lm", color = "blue") +
  # the layers below inherit the x/y mapping and only swap in the subset
  geom_point(data = top_ten, color = "pink", alpha = 0.5) +
  geom_smooth(data = top_ten, method = "lm", color = "red") +
  labs(x = "Traffic (standardized)",
       y = "Star Rating (standardized)",
       title = "The Top 10% (red) Ranked Restaurants...\nhave a Different Correlation than All (blue+red) Restaurants") +
  theme_minimal()
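The script reads "mycsvfile.csv", which is not included with the gist. To try it without that file, a stand-in data frame with the same three columns can be simulated. This is only a sketch in base R: the sample size, the 0.3 slope, and above all the rule used to flag `is_top_ten` (top 10% by the sum of the two standardized scores) are assumptions made for illustration, not the author's data or ranking method.

```r
# Hypothetical stand-in for mycsvfile.csv (simulated, NOT the author's data).
set.seed(42)
n <- 1000

# Assumed data-generating process: stars weakly increase with traffic.
traffic <- rnorm(n)
stars <- 0.3 * traffic + rnorm(n)

# Standardize both variables to match the *_std column names in the script.
sim <- data.frame(
  traffic_std = as.numeric(scale(traffic)),
  stars_std   = as.numeric(scale(stars))
)

# Assumed selection rule: flag the top 10% by combined score.
score <- sim$traffic_std + sim$stars_std
sim$is_top_ten <- as.integer(score >= quantile(score, 0.9))

# Correlation in the full sample versus the selected subset.
cor_all <- cor(sim$traffic_std, sim$stars_std)
top <- sim[sim$is_top_ten == 1, ]
cor_top <- cor(top$traffic_std, top$stars_std)
print(c(all = cor_all, top_ten = cor_top))
```

Assigning `d <- sim` in place of the `read.csv()` call lets the rest of the script run unchanged. Under this assumed rule the subset correlation comes out lower than the full-sample one, because selecting on a combined score trades one variable off against the other within the selected group.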