
Daniel Marcelino dmarcelinobr

@dmarcelinobr
dmarcelinobr / UnTag.R
Created May 30, 2013 15:17
Remove HTML tags (anything between "<" and ">") from string variables, replacing each tag with a space.
UnTag <- function(x){ gsub("<[^>]*>", " ", x) }
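A quick usage sketch, assuming plain character vectors as input. Because every tag becomes a space, the result can contain runs of blanks; UnTagClean below is a hypothetical helper (not part of the gist) that also collapses whitespace and trims the ends. Note that the regex does not decode entities such as &amp; and will cut short a tag whose attribute value contains ">".

```r
# UnTag as defined in the gist: every "<...>" tag becomes a single space
UnTag <- function(x){ gsub("<[^>]*>", " ", x) }

# Hypothetical variant: also collapse runs of whitespace and trim both ends
UnTagClean <- function(x) trimws(gsub("\\s+", " ", UnTag(x)))

html <- c("<p>Hello <b>world</b></p>", "no tags here")
out <- UnTag(html)
clean <- UnTagClean(html)
print(out)    # " Hello  world  " and "no tags here"
print(clean)  # "Hello world" and "no tags here"
```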
if (!require('SciencePo')) { install.packages('SciencePo'); library(SciencePo) }
data(griliches76, package="SciencePo")
detail(griliches76)
summary(lm(lw~s+iq, data=griliches76))
Call:
lm(formula = lw ~ s + iq, data = griliches76)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2817 -0.2436  0.0009  0.2424  1.1050

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.15573    0.10810   38.44  < 2e-16
dmarcelinobr / myboot.R
Last active December 18, 2015 14:59
Annotated version of myboot
myboot<-function(data, stat, nreps, hist = TRUE) {
# 'data' is the original data set passed to myboot.
# 'stat' is the name (a character string) of a function that returns the statistic(s) of interest, such as regression coefficients; myboot estimates their standard errors.
# 'nreps' is the number of bootstrap replications.
# Compute the statistic on the original data:
estimates<-get(stat)(data)
# The length of "estimates" is the number of coefficients we will estimate standard errors for:
len<-length(estimates)
# Make a container matrix to store the bootstrap results (one row per replication):
container<-matrix(NA, ncol = len , nrow = nreps)
# Number of observations to resample:
nobs<-nrow(data)
mod1 <- function(d) coef(lm(lw~s+iq, data=d)) # OLS coefficients of lw on s and iq
mod1.sds <- myboot(griliches76, "mod1", 10000, hist=TRUE)
          (Intercept)        s       iq
estimates      4.1557 0.084696 0.003810
sds            0.1123 0.007351 0.001159
# rm(list = ls()) # clear objects
# graphics.off() # close graphics windows
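plot.new() # start a new plot frame
# Standard normal density on [-5, 5]
x = seq(-5,5, length=250)
y = dnorm(x)
plot(x,y, las=1, ylab='dnorm', type='n', yaxs='i', ylim=c(0, 0.5))
# Shade the upper 5% tail, to the right of qnorm(0.95) (about 1.645)
x2 = seq(qnorm(0.95), 5, length=50)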
y2 = dnorm(x2)
polygon(c(x2[1], x2, x2[length(x2)]), c(0, y2, 0), border=NA, col='grey')
lines(x, y)
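The grey region is the upper 5% tail of the standard normal. A small numerical check of that, as a sketch: the closed-form tail probability from pnorm() and a numerical integral of dnorm() should both come out at 0.05. (The plot stops the polygon at x = 5, which omits a negligible sliver of about 3e-7.)

```r
z <- qnorm(0.95)                                   # left edge of the shaded region
area_exact <- pnorm(z, lower.tail = FALSE)         # upper-tail probability, 0.05 by construction
area_num <- integrate(dnorm, lower = z, upper = Inf)$value  # numerical check of the same area
print(c(z = z, exact = area_exact, numeric = area_num))
```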
myboot<-function(data, stat, nreps, hist = TRUE) {
estimates<-get(stat)(data)
len<-length(estimates)
container<-matrix(NA, ncol = len , nrow = nreps)
nobs<-nrow(data)
for(i in 1:nreps) {
posdraws<-ceiling(runif(nobs)*nobs) # nobs row indices drawn uniformly from 1..nobs, with replacement
resample<-data[posdraws,]
container[i,]<-get(stat)(resample)
}
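The listing above stops inside the resampling loop. As a self-contained sketch of the same nonparametric bootstrap, here is a hypothetical boot_se() (not the gist's myboot): it takes the statistic as a function rather than a name, draws rows with sample.int() (same distribution as ceiling(runif(nobs)*nobs)), omits the hist option, and returns the estimates/sds layout seen in the mod1.sds output. It uses the built-in mtcars data instead of griliches76 so it runs without SciencePo; the recommended boot package's boot() function covers the same task.

```r
# Hypothetical helper: bootstrap standard errors of any statistic function
boot_se <- function(data, stat, nreps = 1000) {
  estimates <- stat(data)                      # statistic on the original sample
  container <- matrix(NA_real_, nrow = nreps, ncol = length(estimates))
  nobs <- nrow(data)
  for (i in seq_len(nreps)) {
    # Resample nobs rows with replacement
    posdraws <- sample.int(nobs, nobs, replace = TRUE)
    container[i, ] <- stat(data[posdraws, , drop = FALSE])
  }
  sds <- apply(container, 2, sd)               # bootstrap SE = SD across replications
  rbind(estimates, sds)                        # rows "estimates" and "sds"
}

set.seed(123)
ols <- function(d) coef(lm(mpg ~ wt + hp, data = d))
res <- boot_se(mtcars, ols, nreps = 500)
print(round(res, 4))

# Analytic OLS standard errors, for comparison with the bootstrap row
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(round(coef(summary(fit))[, "Std. Error"], 4))
```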
dmarcelinobr / looping_files.R
Created January 2, 2014 22:00
This piece loops over the text files in a folder, reads each one, stacks them into a single data frame, and writes the result back to one file.
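path <- "~/Documents/My Data/BRAZIL/Elections/"
# full.names = TRUE so read.table() finds the files regardless of the working directory;
# "\\.txt$" matches only names ending in .txt
file.names <- dir(path, pattern = "\\.txt$", full.names = TRUE)
# Read every file into a list, then stack them with a single rbind() call
files <- vector("list", length(file.names))
for (i in seq_along(file.names)) {
  files[[i]] <- read.table(file.names[i], header = TRUE, sep = ";", stringsAsFactors = FALSE)
}
out.file <- do.call(rbind, files)
write.table(out.file, file = "cand_Brazil.txt", sep = ";",
            row.names = FALSE, qmethod = "double", fileEncoding = "windows-1252")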
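To check the read-and-stack pattern without the original election files, here is a sketch on two small made-up ";"-separated files written to a temporary directory; the folder name merge_demo and the columns id and uf are invented for the demo. lapply() replaces the explicit loop but does the same thing.

```r
# Two small demo files in a temporary directory (made-up data)
tmp <- file.path(tempdir(), "merge_demo")
dir.create(tmp, showWarnings = FALSE)
write.table(data.frame(id = 1:2, uf = c("SP", "RJ")), file.path(tmp, "a.txt"),
            sep = ";", row.names = FALSE)
write.table(data.frame(id = 3L, uf = "MG"), file.path(tmp, "b.txt"),
            sep = ";", row.names = FALSE)

# Same pattern as above: list the files, read each one, stack once
file.names <- dir(tmp, pattern = "\\.txt$", full.names = TRUE)
merged <- do.call(rbind, lapply(file.names, read.table,
                                header = TRUE, sep = ";", stringsAsFactors = FALSE))
print(merged)
```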
dmarcelinobr / stata_import.r
Created January 6, 2014 00:28
Time to import a delimited text file (458 MB) with Stata's import delimited.
. timer on 1
. import delimited "/Users/dmarcelino/Documents/My Data/BRAZIL/Eleicoes_new/cand_Brazil.txt", delimiter(";") varnames(1) case(preserve)
(39 vars, 1248456 obs)
. timer off 1
. timer list 1
1: 134.37 / 1 = 134.3660
dmarcelinobr / R_import.r
Created January 6, 2014 00:30
Time to import the same delimited text file (458 MB) with R's read.csv2().
R> system.time(cand_Brazil<-read.csv2(file="cand_Brazil.txt", header=TRUE))
   user  system elapsed
 98.540   1.752 102.490
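A sketch of how the same kind of timing can be reproduced on synthetic data, and of one common speed-up: R's read.table() documentation recommends declaring colClasses (and nrows) for large files, so the reader does not have to guess column types. The file, its size, and the columns id, uf and votes are invented for illustration, and actual timings will vary by machine.

```r
# Synthetic ';'-separated file with decimal commas, as read.csv2() expects
set.seed(42)
f <- tempfile(fileext = ".txt")
n <- 20000
df <- data.frame(id = seq_len(n),
                 uf = sample(c("SP", "RJ", "MG"), n, replace = TRUE),
                 votes = round(runif(n, 0, 1000), 2))
write.csv2(df, f, row.names = FALSE)

# Default call: column types are guessed from the data
t_default <- system.time(a <- read.csv2(f))
# Declaring types and the row count up front skips the guessing step
t_typed <- system.time(b <- read.csv2(f, colClasses = c("integer", "character", "numeric"),
                                      nrows = n))
print(rbind(default = t_default[1:3], typed = t_typed[1:3]))
```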