Gabriela de Queiroz gdequeiroz
@jhollist
jhollist / get_nars.R
Last active January 26, 2017 18:12
R script to scrape US EPA NARS data and metadata
# Script to scrape the NARS website
library(rvest)
library(dplyr)
nars <- read_html("https://www.epa.gov/national-aquatic-resource-surveys/data-national-aquatic-resource-surveys")
links <- nars %>%
  html_nodes(".file-link, a") %>%
  html_attr("href") %>%
  tibble::enframe(name = NULL) %>%  # replaces deprecated tbl_df(); one column named `value`
  filter(grepl("files", value)) %>%
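The rvest pipeline above collects the `href` of every element matching `.file-link, a` and keeps only those containing "files". A rough Python analogue, assuming only the standard library's `html.parser` and an inline sample page (the class `HrefCollector`, the function `file_links` and the sample markup are hypothetical; nothing is fetched from the EPA site), might look like this:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags and from any tag with class "file-link"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Valueless attributes come through as None, so fall back to "".
        classes = (attributes.get("class") or "").split()
        if tag == "a" or "file-link" in classes:
            href = attributes.get("href")
            if href is not None:  # like dropping NA hrefs in R
                self.hrefs.append(href)


def file_links(html_text):
    """Return hrefs containing "files", like filter(grepl("files", value))."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return [href for href in parser.hrefs if "files" in href]


# Hypothetical markup standing in for the NARS data page.
sample = """
<p><a href="/sites/files/example_data.csv">Data</a></p>
<p><a href="/national-aquatic-resource-surveys">Home</a></p>
<span class="file-link" href="/files/example_meta.txt">Metadata</span>
"""
print(file_links(sample))
```

Each start tag is visited once, so an `<a class="file-link">` is not counted twice. This matches how rvest returns each node only once for a union selector.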
@jhofman
jhofman / dplyr_filter_ungroup.R
Created January 20, 2016 16:45
careful when filtering with many groups in dplyr
library(dplyr)
# create a dummy data frame with 100,000 groups and 1,000,000 rows
# and partition by group_id
df <- data.frame(group_id = sample(1:1e5, 1e6, replace = TRUE),
                 val = sample(1:100, 1e6, replace = TRUE)) %>%
  group_by(group_id)
# filter rows with a value of 1 naively
system.time(df %>% filter(val == 1))
@hadley
hadley / ds-training.md
Created March 13, 2015 18:49
My advice on what you need to do to become a data scientist...

If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?

I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:

  • Statistical knowledge
  • Programming/hacking skills
  • Domain expertise

Statistical knowledge

@renkun-ken
renkun-ken / r-supportive-members
Last active August 29, 2015 14:06
Where do R's supportive members mainly come from?
library(pipeR) # https://github.com/renkun-ken/pipeR
library(rlist) # https://github.com/renkun-ken/rlist
library(rvest) # https://github.com/hadley/rvest
library(stringr) # https://github.com/hadley/stringr
# please ensure rvest is the latest dev version
Pipe("http://www.r-project.org/foundation/memberlist.html")$
html()$ # use xpath to scrape the name list
html_nodes(xpath = "//table[2]//td//text() | //table[3]//td//text()")$
html_text(trim = TRUE)$
library(httr)
library(XML)
library(selectr)
xpath <- function(x) structure(x, class = "xpath")
sel <- function(x) xpath(css_to_xpath(x, prefix = "//"))
url <- "http://www.boxofficemojo.com/movies/?id=ateam.htm"
html <- content(GET(url), "parsed")
@hadley
hadley / s3.r
Created May 7, 2013 13:16
Implementation of request signing for Amazon's S3 in R.
library(httr)
library(digest)
library(XML)
s3_request <- function(verb, bucket, path = "/", query = NULL,
                       content = NULL, date = NULL) {
  list(
    verb = verb,
    bucket = bucket,
    path = path,
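The preview of `s3.r` stops before the signing step, though it loads `digest`, presumably for an HMAC. The sketch below is an independent Python illustration of AWS Signature Version 2 request signing for S3. It is an assumption that the gist uses this scheme. The helper names are hypothetical, and `x-amz-*` header canonicalization is simplified (no merging of repeated headers). AWS now recommends Signature Version 4, and newer regions accept only that.

```python
import base64
import hashlib
import hmac


def string_to_sign(verb, date, resource, content_md5="", content_type="", amz_headers=None):
    """Build a Signature Version 2 string-to-sign for an S3 request."""
    # Simplified canonical x-amz-* headers: lowercase names, sorted, "name:value\n" each.
    canonical_amz = "".join(
        f"{name.lower()}:{value.strip()}\n"
        for name, value in sorted((amz_headers or {}).items(), key=lambda kv: kv[0].lower())
    )
    return f"{verb}\n{content_md5}\n{content_type}\n{date}\n{canonical_amz}{resource}"


def sign(secret_key, text):
    """Base64-encoded HMAC-SHA1 of the string-to-sign."""
    mac = hmac.new(secret_key.encode("utf-8"), text.encode("utf-8"), hashlib.sha1)
    return base64.b64encode(mac.digest()).decode("ascii")


def authorization_header(access_key, secret_key, verb, date, bucket, path="/"):
    """Value of the Authorization header for a simple path-style S3 request."""
    resource = f"/{bucket}{path}"
    signature = sign(secret_key, string_to_sign(verb, date, resource))
    return f"AWS {access_key}:{signature}"


# Hypothetical credentials; the Date header sent must match `date` exactly.
print(authorization_header("AKIDEXAMPLE", "not-a-real-secret", "GET",
                           "Tue, 27 Mar 2007 19:36:42 +0000", "examplebucket", "/key.txt"))
```

The signature covers the verb, optional MD5 and content type, the date and the bucket-qualified resource. Changing any of them invalidates the request.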
@mbostock
mbostock / .block
Last active December 11, 2024 17:30 — forked from mbostock/.block
Bar Chart
license: gpl-3.0
redirect: https://beta.observablehq.com/@mbostock/d3-bar-chart
@bwhite
bwhite / rank_metrics.py
Created September 15, 2012 03:23
Ranking Metrics
"""Information Retrieval metrics
Useful Resources:
http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
http://www.nii.ac.jp/TechReports/05-014E.pdf
http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
Learning to Rank for Information Retrieval (Tie-Yan Liu)
"""
import numpy as np
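The docstring lists sources on information-retrieval evaluation, but the preview ends before any metric is defined. As an illustration (not the gist's own code), here is a pure-Python sketch of discounted cumulative gain at rank k. It uses the common rel_i / log2(i + 1) discount and the normalized variant that divides by the DCG of the ideal ordering. The function names and sample scores are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k graded relevance scores.

    Position i (1-based) is discounted by log2(i + 1), so rank 1 is undiscounted.
    """
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal


ranking = [3, 2, 3, 0, 1, 2]  # hypothetical graded relevance, in ranked order
print(dcg_at_k(ranking, 6), ndcg_at_k(ranking, 6))
```

Some formulations use an exponential gain, (2^rel - 1), instead of rel. This puts more weight on highly relevant items, so check which variant a reference uses before comparing numbers.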
@jexchan
jexchan / multiple_ssh_setting.md
Created April 10, 2012 15:00
Multiple SSH keys for different GitHub accounts

Multiple SSH key settings for different GitHub accounts

Create different public keys

Create a different SSH key for each account, following the article Mac Set-Up Git

$ ssh-keygen -t rsa -C "your_email@youremail.com"