@statsccpr
Last active November 29, 2016 04:02
rstata_boomerang_rnotebook
---
title: "R Notebook & RStata as tools for Transparent Data Analysis if you insist on Stata"
output:
  html_notebook: default
  html_document: default
  word_document: default
---
# R Markdown Intro
This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute R code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter*.
Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I*.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* to preview the HTML file).
# RStudio + RStata + Stata
The above is the default paragraph that comes with RStudio.
We want to use the RStata package to boomerang Stata commands, wrapped in R strings, to a local copy of Stata and back.
Therefore, you'll need:
1. RStudio version 1.0 or higher, for the markdown notebooks
http://rmarkdown.rstudio.com/r_notebooks.html
2. The 'RStata' package
https://github.com/lbraglia/RStata
3. Stata installed locally
http://www.stata.com/
```{r}
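library(RStata)
# Tell RStata which Stata version is installed locally
options("RStata.StataVersion" = 14)
# Alternative: locate the Stata binary interactively
# chooseStataBin()
# Path to the Stata executable; the escaped inner quotes protect the spaces in the path
stata_path = "\"C:\\Program Files (x86)\\Stata14\\StataSE-64\""
stata_path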
```
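If you would rather not pass `stata.path` on every call, RStata can also read the path from a session option. A minimal sketch, assuming the same Stata 14 location used above (that path is specific to one machine, so adjust it for yours):

```{r}
# Assumed install location; change this to match your own machine.
stata_path <- "\"C:\\Program Files (x86)\\Stata14\\StataSE-64\""

# Register the path and the version once per session.
options("RStata.StataPath" = stata_path,
        "RStata.StataVersion" = 14)

# Check what later stata() calls will pick up by default.
getOption("RStata.StataPath")
getOption("RStata.StataVersion")
```

With both options set, later `stata()` calls should not need an explicit `stata.path` argument.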
# R to call Stata
With the 'RStata' package, you can send Stata commands from within R.
https://github.com/lbraglia/RStata
https://github.com/EconometricsBySimulation/RStata/wiki/Dictionary:-Stata-to-R
## So What?
You can definitely use Stata by itself, but do you want to use the horrendous Stata script editor?
* no Stata syntax highlighting even though you paid for Stata
* forced to use bare-bones comments (`*` and `//`) instead of writer-friendly markdown narratives
The two Stata limitations above hinder 'easy-to-use' transparency protocols.
## So... Rmarkdown!
* RStudio - the IDE (development environment)
* Rmarkdown - the file format that integrates R code with markdown text
* R - the software that evaluates R code
* RStata - the R package that sends commands from the R side to Stata and brings the results back
* Stata - the software that you or your boss paid for
With RStudio + Rmarkdown + R + RStata integrated so nicely, you can have the best of 3 worlds
* Write narratives that are human-readable
* Manipulate data with human-readable R code
* 'paid-for-assurance' of Stata analysis commands
Note: R is free; people pay for Stata 'assurance' in the implicit belief that Stata routines are 'right'.
# Example
Let's see the `RStata::stata()` boomerang in action.
## Recall: Transparent Data Analysis Workflow, Steps 1-3 in R
See corresponding slides, https://ucla.box.com/shared/static/13pqxwmxbdy31v3z4o9b15gzjgu8wu5z.pdf
* Stata cannot deal with '.' dots in variable names.
* So we convert to generic y,x1,x2 variable names.
* Sidenote, R can deal with dots.
* R excels for data wrangling steps.
```{r}
head(iris)
```
```{r}
suppressMessages(library(dplyr))
dat_r = iris %>%
  select(Sepal.Length, Sepal.Width, Petal.Length) %>%
  rename(y = Sepal.Length,
         x1 = Sepal.Width,
         x2 = Petal.Length)
head(dat_r)
```
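The renaming above is done by hand for three columns. For wider data sets, one possible alternative (a sketch, not part of the original workflow) is to swap every dot for an underscore with base R. The name `dat_generic` is made up for this illustration:

```{r}
# Start from the raw data, whose column names contain dots.
dat_generic <- iris

# Replace each literal '.' with '_' (fixed = TRUE turns off regex matching,
# where '.' would otherwise mean "any character").
names(dat_generic) <- gsub(".", "_", names(dat_generic), fixed = TRUE)

names(dat_generic)
```

This keeps the original names recognizable, at the cost of longer variable names on the Stata side.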
## Step 4: Analysis "in" Stata
If you were to use Stata by itself to run a regression of y on x1 and x2, you would use the Stata command
`reg y x1 x2`
Below is the R string that contains that same Stata command
```{r}
r_string_stata_command = '
reg y x1 x2
'
```
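Because the Stata command is just an R string, it can also be assembled from R objects rather than typed out. A small sketch, where `outcome`, `predictors`, and `built_command` are illustrative names that do not appear elsewhere in this notebook:

```{r}
# Variable names as they will appear on the Stata side.
outcome <- "y"
predictors <- c("x1", "x2")

# Join the pieces with single spaces into one Stata command.
built_command <- paste("reg", outcome, paste(predictors, collapse = " "))
built_command
```

The result is the same `reg y x1 x2` text as the hand-written string, so it could be passed to `RStata::stata()` in the same way.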
Below, the `RStata::stata()` function sends that Stata command string from R to the copy of Stata installed on your computer
```{r}
RStata::stata(r_string_stata_command,
              data.in = dat_r,
              stata.path = stata_path)
```
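The boomerang can also carry data back: `stata()` has a `data.out` argument that returns Stata's data set to R as a data frame. The sketch below is a guess at how that might be used here, not something the original notebook runs. It assumes the RStata package is installed and that the `RStata.StataPath` and `RStata.StataVersion` options are already set, and it skips the call otherwise. The names `dat_for_stata`, `stata_cmd_predict`, and `dat_back` are made up for this example.

```{r}
# Rebuild the small analysis data set so this chunk stands on its own.
dat_for_stata <- data.frame(y  = iris$Sepal.Length,
                            x1 = iris$Sepal.Width,
                            x2 = iris$Petal.Length)

# Two Stata commands: fit the regression, then store fitted values in 'yhat'.
stata_cmd_predict <- "
reg y x1 x2
predict yhat
"

# Only call Stata when RStata is installed and configured (assumption: both
# options were set earlier in the session).
stata_ready <- requireNamespace("RStata", quietly = TRUE) &&
  !is.null(getOption("RStata.StataPath")) &&
  !is.null(getOption("RStata.StataVersion"))

if (stata_ready) {
  dat_back <- RStata::stata(stata_cmd_predict,
                            data.in = dat_for_stata,
                            data.out = TRUE)
  head(dat_back)
}
```

If Stata is available, `dat_back` should contain the original columns plus the `yhat` column created on the Stata side.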
# Conclusion
So the RStata package basically tells R to send commands to Stata, temporarily captures the Stata log output, and brings it back into R.
Around those Stata results, we write markdown text, which is a joy for humans to write and, when rendered, a joy for humans to read.
RStudio will compile the .Rmd file and output rendered html, word, or pdf files.
You can do each of the above sub-processes yourself, but why not let tools help you do it automatically?
All the important content is in one single '.Rmd' source file that can be version controlled with your favorite flavor of git.
# Bonus: [R]est Assured
Let's compare the R results with the Stata results
```{r}
lm(y ~ x1 + x2, data = dat_r) %>% summary()
```
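To line the two sets of results up, it can help to pull just the coefficient table out of the R fit and compare it by eye with the Stata log. A minimal sketch that rebuilds the data so it runs on its own; `dat_check`, `fit_r`, and `coef_table` are illustrative names:

```{r}
# Same three columns as dat_r, rebuilt here so the chunk is self-contained.
dat_check <- data.frame(y  = iris$Sepal.Length,
                        x1 = iris$Sepal.Width,
                        x2 = iris$Petal.Length)

# Ordinary least squares in R, matching Stata's 'reg y x1 x2'.
fit_r <- lm(y ~ x1 + x2, data = dat_check)

# Estimates, standard errors, t values and p values, rounded for reading.
coef_table <- round(coef(summary(fit_r)), 4)
coef_table
```

Note that Stata lists the intercept last, as `_cons`, while R lists it first, as `(Intercept)`.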