Hadley Wickham hadley

@hadley
hadley / shiny-oauth.r
Last active April 26, 2024 05:41
Sketch of shiny + oauth
library(shiny)
library(httr)
# OAuth setup --------------------------------------------------------
# Most OAuth applications require that you redirect to a fixed and known
# set of URLs. Many only allow you to redirect to a single URL: if this
# is the case for you, you'll need to create an app for testing with a localhost
# url, and an app for your deployed app.
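The comments above suggest registering one OAuth app for local testing and another for the deployed app. A minimal base R sketch of switching between the two redirect URLs might look like the following. The URLs, the port, and the names `redirect_urls` and `pick_redirect_url()` are hypothetical placeholders, not part of the original gist or of httr.

```r
# Hypothetical redirect URLs: one registered with the testing app,
# one registered with the deployed app. Replace with your own values.
redirect_urls <- c(
  local    = "http://127.0.0.1:8100/",
  deployed = "https://example.com/my-app/"
)

# Pick the redirect URL for the current context. Treating a
# non-interactive session as "deployed" is an assumption, not a rule.
pick_redirect_url <- function(deployed = !interactive()) {
  if (deployed) redirect_urls[["deployed"]] else redirect_urls[["local"]]
}

# Force the local URL, e.g. while testing on your own machine.
pick_redirect_url(deployed = FALSE)
```

Keeping both URLs in one named vector makes it harder for the testing and deployed configurations to drift apart.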
# What's the most natural way to express this code in base R?
library(dplyr, warn.conflicts = FALSE)
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp), n = n())
#> # A tibble: 3 x 3
#>     cyl  mean     n
#>   <dbl> <dbl> <int>
#> 1     4  105.    11
#> 2     6  183.     7
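One possible base R answer to the question above, sketched with `tapply()`. This is only one of several reasonable approaches (`aggregate()`, `by()`, or `split()` plus `lapply()` would also work), and the names `means`, `counts`, and `by_cyl` are chosen here for illustration.

```r
# Group-wise mean and count of disp by cyl, using only base R.
means  <- tapply(mtcars$disp, mtcars$cyl, mean)
counts <- tapply(mtcars$disp, mtcars$cyl, length)

# Assemble the per-group results into a data frame, one row per cyl value.
by_cyl <- data.frame(
  cyl  = as.numeric(names(means)),
  mean = as.vector(means),
  n    = as.vector(counts)
)
by_cyl
```

Unlike the dplyr pipeline, each summary needs its own pass over the groups, which is part of why the base R version reads less naturally.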
hadley / .gitignore
Last active February 25, 2024 02:10
Benchmark different ways of reading a file
.Rproj.user
.Rhistory
.RData
*.Rproj
*.html
data(diamonds, package = "ggplot2")
# Most straightforward
diamonds$ppc <- diamonds$price / diamonds$carat
# Avoid repeating diamonds
diamonds$ppc <- with(diamonds, price / carat)
# The inspiration for dplyr's mutate
diamonds <- transform(diamonds, ppc = price / carat)
hadley / ds-training.md
Created March 13, 2015 18:49
My advice on what you need to do to become a data scientist...

If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?

I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:

  • Statistical knowledge
  • Programming/hacking skills
  • Domain expertise

Statistical knowledge

hadley / reproducible.md
Created January 6, 2010 17:33
How to write a reproducible example

How to write a reproducible example.

You are most likely to get good help with your R problem if you provide a reproducible example. A reproducible example allows someone else to recreate your problem by just copying and pasting R code.

There are four things you need to include to make your example reproducible: required packages, data, code, and a description of your R environment.

  • Packages should be loaded at the top of the script, so it's easy to see which ones the example needs.

  • The easiest way to include data in an email is to use dput() to generate
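As a sketch of the `dput()` step mentioned above, assuming a small made-up data frame `df` in place of real data: `dput()` prints R code that rebuilds the object, so the reader can paste it straight into their own session. The round-trip check and the `sessionInfo()` call are illustrative additions covering the "data" and "R environment" items.

```r
# A small, made-up data frame standing in for "your data".
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# dput() prints R code that rebuilds the object; paste that text into your message.
dput(df)

# Round-trip check: capture the dput() text and evaluate it to rebuild the object.
code <- paste(capture.output(dput(df)), collapse = "\n")
recreated <- eval(parse(text = code))
identical(df, recreated)

# Describe the R environment (R version, platform, attached packages).
sessionInfo()
```

For large datasets, it is usually kinder to `dput()` a small subset, such as `head(df)`, that still reproduces the problem.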

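# Emit one randomly chosen message, but only with probability `prob`.
msg <- function(..., prob = 0.25) {
  # Most of the time, stay silent.
  if (runif(1) > prob) {
    return(invisible())
  }
  messages <- c(...)
  message(sample(messages, 1))
}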
encourage <- function() {
library(ggplot2)
x <- c("بقرة", "دجاج", "حصان")
df <- data.frame(x = x, y = 1:3)
labels_rtl <- function(x) paste0("\u202B", x)
ggplot(df, aes(x, y)) +
  geom_point() +
  scale_x_discrete(labels = labels_rtl) +
hadley / advise.md
Created February 13, 2015 21:32
Advice for teaching an R workshop

I think the two most important messages that people can get from a short course are:

a) the material is important and worthwhile to learn (even if it's challenging), and b) it's possible to learn it!

For those reasons, I usually start by diving as quickly as possible into visualisation. I think it's a bad idea to start by explicitly teaching programming concepts (like data structures), because the payoff isn't obvious. If you start with visualisation, the payoff is really obvious and people are more motivated to push past any initial teething problems. In stat405, I used to start with some very basic templates that got people up and running with scatterplots and histograms - they wouldn't necessarily understand the code, but they'd know which bits could be varied for different effects.

Apart from visualisation, I think the two most important topics to cover are tidy data (i.e. http://www.jstatsoft.org/v59/i10/ + tidyr) and data manipulation (dplyr). These are both important for when people go off and apply

library(tidyverse)
# https://twitter.com/buddyherms/status/1576966150680121344 --------------
# PROs: at, by, and regexp examples
# CONs: quite simple
vt_census <- tidycensus::get_decennial(
  geography = "block",
  state = "VT",