Paul-Antoine pachevalier

  • Paris
@pachevalier
pachevalier / gmm_vs_lm.R
Created July 30, 2013 10:48
This gist compares the results of the gmm() and lm() functions in R on simulated data.
library("gmm")
set.seed(1234567)
N <- 1000
dd <- data.frame(id = 1:N)
dd$u <- rnorm(N)
dd$x <- 1 + rnorm(N)
dd$y <- 1 + dd$x + dd$u
# OLS
m1 <- lm(y ~ x, data = dd)
# GMM with x as its own instrument and an identity weighting matrix
m2 <- gmm(y ~ x, x = ~ x, wmatrix = "ident", data = dd)
coefficients(m1)
coefficients(m2)
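Why the two fits should agree: when x is used as its own instrument, the model is exactly identified. The sample moment conditions Z'(y - Xb) = 0 give as many equations as there are parameters, so the weighting matrix does not matter and the solution is the OLS estimate. The base-R sketch below solves those moment conditions directly instead of calling gmm(). It assumes the same simulated design as above, and the names `X`, `Z`, `b_gmm` and `b_ols` are illustrative.

```r
# Sketch: solve the exactly identified moment conditions Z'(y - X b) = 0 directly.
set.seed(1234567)
N <- 1000
u <- rnorm(N)
x <- 1 + rnorm(N)
y <- 1 + x + u

X <- cbind("(Intercept)" = 1, x = x)  # regressors: intercept and x
Z <- X                                # instruments: x is its own instrument

# With as many moment conditions as parameters, the unique solution is
# b = (Z'X)^{-1} Z'y, whatever weighting matrix is used.
b_gmm <- drop(solve(crossprod(Z, X), crossprod(Z, y)))

b_ols <- coef(lm(y ~ x))
print(cbind(gmm = b_gmm, ols = b_ols))
```

Under over-identification (more instruments than parameters), the choice of `wmatrix` would start to matter.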
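# Chow test: are the SBR ~ Age coefficients the same for men and women?
# `ch` is assumed to be a data frame with columns SBR, Age and Sex.
mc <- lm(formula = SBR ~ Age, data = ch)                        # pooled
m1 <- lm(formula = SBR ~ Age, data = subset(ch, Sex == "M"))    # men
m2 <- lm(formula = SBR ~ Age, data = subset(ch, Sex == "F"))    # women
sc <- sum(mc$residuals^2)
s1 <- sum(m1$residuals^2)
s2 <- sum(m2$residuals^2)
k <- 2  # parameters per regression (intercept and slope)
# Test statistic: F = [(Sc - (S1 + S2)) / k] / [(S1 + S2) / (n - 2k)]
fstat <- ((sc - (s1 + s2)) / k) / ((s1 + s2) / (length(mc$residuals) - 2 * k))
fstat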
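The statistic above follows an F(k, n - 2k) distribution under the null hypothesis that both groups share the same intercept and slope. The sketch below adds the p-value. It also cross-checks the result against anova() on a fully interacted model, which reproduces the two separate fits. `ch` is not defined in the snippet, so a hypothetical simulated `ch` with the same column names stands in for it.

```r
# Hypothetical data standing in for `ch` (not defined in the original snippet).
set.seed(42)
n <- 200
ch <- data.frame(
  Age = runif(n, 20, 70),
  Sex = sample(c("M", "F"), n, replace = TRUE)
)
ch$SBR <- 100 + 0.5 * ch$Age + ifelse(ch$Sex == "M", 5, 0) + rnorm(n, sd = 8)

k <- 2  # parameters per regression (intercept and slope)
mc <- lm(SBR ~ Age, data = ch)                       # pooled model
m1 <- lm(SBR ~ Age, data = subset(ch, Sex == "M"))   # men only
m2 <- lm(SBR ~ Age, data = subset(ch, Sex == "F"))   # women only

sc <- sum(residuals(mc)^2)
s1 <- sum(residuals(m1)^2)
s2 <- sum(residuals(m2)^2)
df2 <- nobs(mc) - 2 * k

fstat <- ((sc - (s1 + s2)) / k) / ((s1 + s2) / df2)
pval <- pf(fstat, df1 = k, df2 = df2, lower.tail = FALSE)

# Cross-check: Age * Sex gives each group its own intercept and slope, so its
# residual sum of squares is s1 + s2 and its F test against mc is the Chow test.
full <- lm(SBR ~ Age * Sex, data = ch)
print(anova(mc, full))
```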
library("strucchange")
library("lubridate")
set.seed(1234567)
N <- 60
df <- data.frame(id = 1:N)
df$date <- seq(as.Date("2013-07-01"), by = "day", along.with = df$id)
# Days elapsed since 2013-07-01
df$date2 <- difftime(df$date, ymd("2013-07-01"), units = "days")
# Days relative to 2013-08-01 (negative before that date)
df$date3 <- difftime(df$date, ymd("2013-08-01"), units = "days")
# Number of days in July
difftime(ymd("2013-08-01"), ymd("2013-07-01"), units = "days")
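difftime() returns objects of class "difftime". Regression functions usually work more smoothly with plain numeric day counts. A minimal base-R sketch of the same offsets, assuming only numeric values are needed (the names `start`, `dates`, `days_since_start`, `days_from_aug1` and `n_july` are illustrative):

```r
# Base-R sketch: day offsets as plain numbers (no lubridate needed).
N <- 60
start <- as.Date("2013-07-01")
dates <- seq(start, by = "day", length.out = N)   # 2013-07-01 ... 2013-08-29

days_since_start <- as.numeric(difftime(dates, start, units = "days"))  # 0, 1, ..., 59
days_from_aug1 <- as.numeric(dates - as.Date("2013-08-01"))           # -31, ..., 28

n_july <- as.numeric(difftime(as.Date("2013-08-01"), start, units = "days"))
print(n_july)  # 31
```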
library("lubridate")
library("strucchange")
library("segmented")
set.seed(1234567)
N <- 60
df <- data.frame(id = 1:N)
df$date <- seq(as.Date("2013-07-01"), by = "day", along.with = df$id)
df$date2 <- difftime(df$date, ymd("2013-07-01"), units = "days")
# Hinge term: days elapsed after 2013-08-01, and 0 up to and including that date
df$date3 <- ifelse(df$date > as.Date("2013-08-01"),
                   difftime(df$date, ymd("2013-08-01"), units = "days"),
                   0)
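The hinge variable `date3` is what turns an ordinary linear trend into a piecewise-linear ("broken-stick") trend. In a regression on both a time counter and the hinge, the hinge coefficient measures the change in slope after the break. The sketch below assumes the breakpoint is known (2013-08-01) and uses a hypothetical simulated outcome `y`. The segmented package loaded above is meant for the case where the breakpoint itself has to be estimated.

```r
# Sketch: piecewise-linear trend with a known break at 2013-08-01.
set.seed(1234567)
N <- 60
df <- data.frame(date = seq(as.Date("2013-07-01"), by = "day", length.out = N))
df$t <- as.numeric(df$date - as.Date("2013-07-01"))               # days since start
df$hinge <- pmax(as.numeric(df$date - as.Date("2013-08-01")), 0)  # 0 until the break

# Hypothetical outcome: slope 0.5 before the break, 0.5 - 1 = -0.5 after it.
df$y <- 10 + 0.5 * df$t - 1 * df$hinge + rnorm(N, sd = 0.5)

fit <- lm(y ~ t + hinge, data = df)
print(coef(fit))  # "hinge" estimates the change in slope at the break
```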
pachevalier / fake_line_plot.R
Created October 29, 2013 16:09
Fake line plot
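library("ggplot2")
library("ggthemes")  # provides theme_tufte()
N <- 100
fk <- data.frame(id = 1:N)
fk$x <- 1 + rnorm(N)
fk$y1 <- 1 + fk$x + rnorm(N)
fk$y2 <- 2 + fk$x + rnorm(N)
# Two lines sharing the same x axis (`linewidth` requires ggplot2 >= 3.4.0)
ggplot(data = fk, aes(x = x, y = y1)) +
  geom_line(color = "#339966", linewidth = 2) +
  geom_line(aes(y = y2), color = "#990000", linewidth = 2) +
  theme_tufte()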
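Another option is to stack y1 and y2 into a single long data frame with a `series` column. That uses one geom_line() layer and lets ggplot2 draw a colour legend automatically. This is a sketch, not the gist's own approach. `long`, `series` and `p` are illustrative names, theme_minimal() stands in for theme_tufte() so that ggthemes is not needed, and `linewidth` assumes ggplot2 >= 3.4.0.

```r
library(ggplot2)

set.seed(1)
N <- 100
fk <- data.frame(id = 1:N, x = 1 + rnorm(N))
fk$y1 <- 1 + fk$x + rnorm(N)
fk$y2 <- 2 + fk$x + rnorm(N)

# Stack y1 and y2 into one column; "series" says which line each row belongs to.
long <- data.frame(
  x = rep(fk$x, times = 2),
  y = c(fk$y1, fk$y2),
  series = rep(c("y1", "y2"), each = N)
)

p <- ggplot(long, aes(x = x, y = y, colour = series)) +
  geom_line(linewidth = 2) +
  scale_colour_manual(values = c(y1 = "#339966", y2 = "#990000")) +
  theme_minimal()
```

Printing `p` draws the plot on the current graphics device.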
pachevalier / function_categorie.R
Created January 24, 2014 14:39
This function creates categories from a continuous variable.
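categorie <- function(x, breaks) {
  n_breaks <- length(breaks)
  temp <- rep(NA_integer_, length(x))          # one category per element of x
  temp[x <= breaks[1]] <- 1L                   # at or below the first break
  for (k in seq_len(n_breaks - 1)) {           # intervals (breaks[k], breaks[k + 1]]
    temp[x > breaks[k] & x <= breaks[k + 1]] <- k + 1L
  }
  temp[x > breaks[n_breaks]] <- n_breaks + 1L  # above the last break
  return(temp)
}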
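The same classification can be done without a loop. With `left.open = TRUE`, base R's findInterval() counts how many breaks lie strictly below each value. Intervals are therefore of the form (breaks[k], breaks[k + 1]], and adding 1 reproduces categorie()'s numbering. This is a sketch assuming `breaks` is sorted in increasing order. `categorie2` is a hypothetical name.

```r
# Vectorised sketch of categorie() using base R's findInterval().
categorie2 <- function(x, breaks) {
  stopifnot(!is.unsorted(breaks))  # findInterval() needs non-decreasing breaks
  findInterval(x, breaks, left.open = TRUE) + 1L
}

res <- categorie2(c(-1, 0, 0.5, 1, 2, NA), breaks = c(0, 1))
print(res)  # 1 1 2 2 3 NA
```

Missing values in `x` stay `NA`.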
pachevalier / codeiso.csv
Last active January 4, 2016 19:09
ISO country codes in CSV format. Imported from http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3
ISO country
ABW Aruba
AFG Afghanistan
AGO Angola
AIA Anguilla
ALA Åland Islands
ALB Albania
AND Andorra
ARE United Arab Emirates
ARG Argentina
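A typical use of this file is as a lookup table from ISO code to country name. The sketch below assumes the file is comma-separated with the header `ISO,country`. The preview does not show the delimiter, so that is an assumption. It inlines a few rows from the preview instead of reading `codeiso.csv`. `found` and the code "XXX" are illustrative.

```r
# Sketch: ISO code -> country name lookup (delimiter and header are assumptions).
iso <- read.csv(text = "ISO,country
ABW,Aruba
AFG,Afghanistan
AGO,Angola
ALB,Albania
ARE,United Arab Emirates", stringsAsFactors = FALSE)

codes <- c("ALB", "ARE", "XXX")              # "XXX" is not a valid code
found <- iso$country[match(codes, iso$ISO)]  # unknown codes give NA
print(found)
```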
pachevalier / code_ioc.csv
Created January 28, 2014 13:32
The International Olympic Committee uses its own country codes, which differ from the usual ISO codes. This correspondence file was scraped from the International Olympic Committee website.
code country
AFG Afghanistan
RSA South Africa
ALB Albania
ALG Algeria
GER Germany
AND Andorra
ANG Angola
ANT Antigua and Barbuda
AHO Netherlands Antilles
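A correspondence between IOC and ISO codes can be built by joining the two files on the country name. The sketch below uses only rows visible in the two previews, and the column names follow their headers. Angola shows why the crosswalk is needed (IOC "ANG" vs ISO "AGO"). A plain inner join silently drops countries whose names are spelled differently in the two sources. Adding `all.x = TRUE` keeps them with a missing ISO code so they can be fixed by hand.

```r
# Sketch: IOC -> ISO crosswalk by joining on the country name.
ioc <- data.frame(
  code = c("AFG", "ALB", "AND", "ANG"),
  country = c("Afghanistan", "Albania", "Andorra", "Angola"),
  stringsAsFactors = FALSE
)
iso <- data.frame(
  ISO = c("AFG", "AGO", "ALB", "AND"),
  country = c("Afghanistan", "Angola", "Albania", "Andorra"),
  stringsAsFactors = FALSE
)

xwalk <- merge(ioc, iso, by = "country")
print(xwalk[xwalk$code != xwalk$ISO, ])  # countries whose IOC and ISO codes differ
```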
pachevalier / arrange.R
Last active August 29, 2015 13:57
Problem with the output of arrange
library("dplyr")
set.seed(123)
N <- 100
df <- data.frame(id = 1:N, x = rnorm(N))
df$x[runif(N) < .1] <- NA  # about 10% missing values
tdf <- as_tibble(df)
table(is.na(tdf$x))
# Sort by decreasing x with dplyr and with base R
out <- arrange(tdf, desc(x))
out2 <- tdf[order(tdf$x, decreasing = TRUE), ]
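One place where arrange() and order() are easy to compare is the handling of missing values. dplyr documents that arrange() sorts `NA` to the end, even inside desc(). order() also puts `NA` last by default (`na.last = TRUE`). The sketch below checks both properties on the same simulated data. It only compares the values of `x`, not the order of rows within the `NA` group. `n_na`, `na_last` and `same_values` are illustrative names.

```r
library(dplyr)

set.seed(123)
N <- 100
tdf <- as_tibble(data.frame(id = 1:N, x = rnorm(N)))
tdf$x[runif(N) < .1] <- NA
n_na <- sum(is.na(tdf$x))

out <- arrange(tdf, desc(x))                      # dplyr
out2 <- tdf[order(tdf$x, decreasing = TRUE), ]   # base R, na.last = TRUE by default

# Both should place the missing values in the last n_na rows ...
na_last <- all(is.na(tail(out$x, n_na))) && all(is.na(tail(out2$x, n_na)))
# ... and sort the non-missing values identically.
same_values <- identical(out$x[!is.na(out$x)], out2$x[!is.na(out2$x)])
print(c(na_last = na_last, same_values = same_values))
```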
library("ggplot2")
set.seed(1234)
N <- 100
df <- data.frame(i = 1:N, x = rnorm(N))
df$y <- 1 + df$x + rnorm(N)
df$z <- (runif(N) < .3)
pdf("output/test.pdf")
ggplot(data = df, aes(x = x, y = y, shape = z)) +
geom_point(size = 3) +