---
title: "R Notebook & RStata as tools for Transparent Data Analysis if you insist on Stata"
output:
  html_notebook: default
  html_document: default
  word_document: default
---
# R Markdown Intro

This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute R code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter*.

Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I*.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* to preview the HTML file).
# RStudio + RStata + Stata

The above is the default paragraph that comes with RStudio.

We want to use the RStata package to boomerang Stata commands, wrapped in R strings, to a local copy of Stata.

Therefore, you'll need:

1. RStudio version 1.0 or higher, for the R Markdown notebooks
   http://rmarkdown.rstudio.com/r_notebooks.html
2. The 'RStata' package
   https://github.com/lbraglia/RStata
3. Stata installed locally
   http://www.stata.com/
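```{r}
library(RStata)
# Tell RStata which Stata version is installed locally
options("RStata.StataVersion" = 14)
# chooseStataBin()  # alternative: pick the Stata executable interactively
# Path to the Stata executable; the escaped inner quotes protect the spaces in the Windows path
stata_path = "\"C:\\Program Files (x86)\\Stata14\\StataSE-64\""
stata_path
```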
# R to call Stata

With the 'RStata' package, you can send Stata commands from within R.

https://github.com/lbraglia/RStata

https://github.com/EconometricsBySimulation/RStata/wiki/Dictionary:-Stata-to-R
## So What?

You can definitely use Stata by itself, but do you want to use the horrendous Stata script editor?

* no Stata syntax highlighting even though you paid for Stata
* forced to use bare-bones comments (`*` and `//`) instead of writer-friendly markdown narratives

These two Stata limitations hinder 'easy-to-use' transparency protocols.
## So... R Markdown!

* RStudio - the IDE
* R Markdown - the file format that integrates R code with markdown text
* R - the software that evaluates R code
* RStata - the R package that sends Stata commands from the R side to Stata and brings the results back
* Stata - the software that you or your boss paid for

With RStudio + R Markdown + R + RStata integrated so nicely, you can have the best of 3 worlds:

* Write narratives that are human-readable
* Manipulate data with human-readable R code
* 'paid-for-assurance' of Stata analysis commands

Note: R is free; people pay for Stata 'assurance' with the implicit belief that Stata routines are 'right'.
# Example

Let's see the `RStata::stata()` boomerang in action.

## Recall: Transparent Data Analysis Workflow, Steps 1-3 in R

See the corresponding slides: https://ucla.box.com/shared/static/13pqxwmxbdy31v3z4o9b15gzjgu8wu5z.pdf

* Stata cannot deal with '.' dots in variable names.
* So we convert to generic y, x1, x2 variable names.
* Side note: R can deal with dots.
* R excels at data wrangling steps.
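
When there are too many columns to rename by hand, one alternative is to replace the dots programmatically. The following is a minimal base R sketch, not something RStata provides: `undot_names` and `iris_undotted` are hypothetical names introduced here for illustration, and the sketch assumes that swapping each dot for an underscore is enough to produce valid Stata variable names for your data.

```r
# Hypothetical helper (not part of RStata): replace every '.' in the
# column names with '_' so Stata will accept them as variable names.
undot_names = function(df) {
  # fixed = TRUE treats "." as a literal dot, not a regex wildcard
  names(df) = gsub(".", "_", names(df), fixed = TRUE)
  df
}

iris_undotted = undot_names(iris)
names(iris_undotted)
```

The notebook itself keeps the explicit `rename()` to `y`, `x1`, `x2`, which makes the regression command shorter to read.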
```{r}
head(iris)
```

```{r}
suppressMessages(library(dplyr))
dat_r = iris %>%
  select(Sepal.Length, Sepal.Width, Petal.Length) %>%
  rename(y = Sepal.Length,
         x1 = Sepal.Width,
         x2 = Petal.Length
  )
head(dat_r)
```
## Step 4: Analysis "in" Stata

If you were to use Stata by itself to run a regression of y on x1 and x2, you would use the Stata command

`reg y x1 x2`

Below is the R string that contains that pure Stata command:
```{r}
r_string_stata_command = '
reg y x1 x2
'
```
Below is the R-side command. It uses the `RStata::stata()` function to send that Stata command string, together with the `dat_r` data, to the copy of Stata installed on your computer.
```{r}
RStata::stata(r_string_stata_command,
              data.in = dat_r,
              stata.path = stata_path
)
```
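
The same string-based pattern should extend to more than one Stata command. As a hedged sketch, the base R snippet below builds a multi-line command string with one Stata command per line; `stata_lines` and `multi_command` are illustrative names, and the extra `summarize` command is just an example. The call to Stata is left commented out because it needs a local Stata install.

```r
# One Stata command per element (illustrative choice of commands)
stata_lines = c("summarize y x1 x2",
                "reg y x1 x2")

# Collapse into a single string with one command per line
multi_command = paste(stata_lines, collapse = "\n")
cat(multi_command)

# Assumes dat_r and stata_path from the chunks above, plus a local Stata install:
# RStata::stata(multi_command, data.in = dat_r, stata.path = stata_path)
```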
# Conclusion

The RStata package tells R to send commands to Stata, temporarily captures the resulting Stata log, and brings it back into R.

Around those Stata results, we write markdown text, which is a joy for humans to write and, once rendered, a joy for humans to read.

RStudio compiles the .Rmd file and outputs rendered HTML, Word, or PDF files.

You can do each of these sub-processes yourself, but why not let tools do it for you automatically?

All the important content is in a single '.Rmd' source file that can be version controlled with your favorite flavor of git.
# Bonus: [R]est Assured

Let's compare the R results with the Stata results.

```{r}
# Same regression of y on x1 and x2, fitted in R
lm(y ~ x1 + x2, data = dat_r) %>% summary()
```
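
To compare numbers rather than eyeball two printed tables, the coefficient table can be pulled out of the `lm` summary as a matrix. This is a minimal sketch in base R only; it rebuilds `dat_r` from `iris` so that it runs on its own, and `fit_r` and `coef_table` are illustrative names not used elsewhere in the notebook.

```r
# Rebuild the analysis data without dplyr so the sketch is self-contained
dat_r = data.frame(y  = iris$Sepal.Length,
                   x1 = iris$Sepal.Width,
                   x2 = iris$Petal.Length)

fit_r = lm(y ~ x1 + x2, data = dat_r)

# Matrix with one row per term: estimate, standard error, t value, p-value
coef_table = coef(summary(fit_r))
round(coef_table, 4)
```

The estimates and standard errors in this matrix are the quantities to check against the `Coef.` and `Std. Err.` columns of the Stata `reg` output.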