R Notebook & RStata as tools for Transparent Data Analysis if you insist on Stata
R Markdown Intro
This is an R Markdown Notebook. When you execute R code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
RStudio + RStata + Stata
The section above is the default text that RStudio inserts into every new R Notebook.
We want to use the RStata package to "boomerang" Stata commands, wrapped in R strings, to a local copy of Stata and bring the results back into R.
Therefore, you'll need:
- RStudio version 1.0 or higher, for R Markdown notebooks: http://rmarkdown.rstudio.com/r_notebooks.html
- The RStata package: https://github.com/lbraglia/RStata
- Stata installed locally: http://www.stata.com/
library(RStata)
options("RStata.StataVersion" = 14)
# chooseStataBin()
stata_path = "\"C:\\Program Files (x86)\\Stata14\\StataSE-64\""
stata_path
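RStata also reads its settings from R options, so the Stata location can be set once per session instead of being passed to every call. A minimal sketch, assuming (as the RStata README describes) that RStata::stata() falls back to getOption("RStata.StataPath") when stata.path is omitted; the path is the same Windows example used above and must be adapted to your own installation.

```r
# Set RStata's session-wide options once.
# The path is the Windows example from this notebook; adjust it to your machine.
# The escaped quotes stay in the string because the path contains spaces.
options(
  "RStata.StataPath"    = "\"C:\\Program Files (x86)\\Stata14\\StataSE-64\"",
  "RStata.StataVersion" = 14
)

# Check what RStata will pick up
getOption("RStata.StataPath")
getOption("RStata.StataVersion")
```

With these options set, a call such as RStata::stata(r_string_stata_command, data.in = dat_r) should not need the stata.path argument. chooseStataBin() is the interactive way to find the executable.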
R to call Stata
With the 'RStata' package, you can send Stata commands from within R.
You can definitely use Stata by itself, but do you want to use the horrendous Stata script editor?
- no Stata syntax highlighting even though you paid for Stata
- forced to use bare-bones * and // comments instead of writer-friendly markdown narratives
These two Stata limitations get in the way of easy-to-use transparency protocols. The toolchain in this notebook has five parts:
- RStudio: the IDE (integrated development environment)
- R Markdown: the file format that integrates R code with markdown text
- R: the software that evaluates R code
- RStata: the R package that sends commands from R to Stata and brings the results back to R
- Stata: the software that you or your boss paid for
With RStudio + R Markdown + R + RStata integrated so nicely, you get the best of three worlds:
- Write narratives that are human-readable
- Manipulate data with human-readable R code
- 'paid-for-assurance' of Stata analysis commands
Note: R is free; people pay for Stata's 'assurance', with the implicit belief that Stata routines are 'right'.
Let's see the R -> RStata::stata() -> Stata boomerang in action.
Recall: Transparent Data Analysis Workflow, Steps 1-3 in R
See corresponding slides, https://ucla.box.com/shared/static/13pqxwmxbdy31v3z4o9b15gzjgu8wu5z.pdf
Stata does not allow '.' (dots) in variable names.
- So we convert to the generic variable names y, x1, x2.
- Side note: R does allow dots in names (e.g., Sepal.Length).
R excels at data-wrangling steps.
suppressMessages(library(dplyr))
dat_r = iris %>%
  select(Sepal.Length, Sepal.Width, Petal.Length) %>%
  rename(y = Sepal.Length, x1 = Sepal.Width, x2 = Petal.Length)
head(dat_r)
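Renaming by hand works for three columns; for wider data a small helper can make every name Stata-friendly. The sketch below is a hypothetical helper (stata_safe_names is not part of RStata or dplyr), assuming Stata's naming rules: letters, digits, and underscores only, no leading digit, at most 32 characters. It uses base R only.

```r
# Hypothetical helper, not from RStata or dplyr:
# turn R column names into names Stata accepts.
stata_safe_names <- function(x) {
  x <- gsub("[^A-Za-z0-9_]", "_", x)                   # dots and other symbols -> "_"
  x <- ifelse(grepl("^[0-9]", x), paste0("_", x), x)   # names must not start with a digit
  substr(x, 1, 32)                                     # Stata's 32-character limit
}

dat_safe <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
names(dat_safe) <- stata_safe_names(names(dat_safe))
names(dat_safe)
```

The helper does not check for duplicate names created by the substitution or for Stata reserved words, so the generic y, x1, x2 renaming remains the safer choice when the column set is small.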
Step 4: Analysis "in" Stata
If you were to use Stata by itself to run a regression of y on x1 and x2, you would use the Stata command
reg y x1 x2
Below is the R string that contains the same pure Stata command:
r_string_stata_command = ' reg y x1 x2 '
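Because the Stata command is just an R string, it can also be assembled from R objects. A minimal sketch with hypothetical variable names (dep, indep, stata_cmds); joining several commands with newlines assumes RStata runs them as consecutive lines of a do-file, which is how it appears to work but should be checked against your RStata version.

```r
# Build the Stata command string programmatically (hypothetical names)
dep   <- "y"
indep <- c("x1", "x2")

r_string_stata_command2 <- paste("reg", dep, paste(indep, collapse = " "))
r_string_stata_command2

# Several Stata commands, one per line
stata_cmds <- paste(c("summarize y x1 x2", r_string_stata_command2), collapse = "\n")
cat(stata_cmds, "\n")
```

Generating the string in R keeps the variable list in one place, so a renaming step upstream cannot silently drift out of sync with the Stata call.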
Below is the R-side command that uses the RStata::stata() function to send the Stata command string to the copy of Stata installed on your computer. The data.in argument hands the R data frame dat_r to Stata as the dataset to analyze.
RStata::stata(r_string_stata_command, data.in = dat_r, stata.path = stata_path)
In short, the RStata package has R send commands to Stata, temporarily captures the resulting Stata log, and brings it back into R.
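To make the boomerang less magical, here is a minimal sketch of the same idea in base R. It is not RStata's source code: stata_boomerang_sketch is a hypothetical function, and the batch-mode flags ("/e" on Windows, "-b" elsewhere) and the log file landing in the working directory are assumptions about Stata's batch mode. It skips data transfer entirely and uses Stata's bundled auto dataset instead. By default (run = FALSE) it only writes the do-file and returns the command-line arguments, so it runs without Stata installed.

```r
# Hypothetical sketch of the boomerang, NOT RStata's implementation:
# 1) write the commands to a temporary do-file,
# 2) run Stata in batch mode on it,
# 3) read back the log file Stata writes.
stata_boomerang_sketch <- function(commands, stata_bin, run = FALSE) {
  work_dir <- tempfile("stata_")
  dir.create(work_dir)
  do_file <- file.path(work_dir, "job.do")
  writeLines(commands, do_file)

  # Assumed batch-mode flag per platform
  flag <- if (.Platform$OS.type == "windows") "/e" else "-b"
  args <- c(flag, "do", shQuote(do_file))
  if (!run) return(list(do_file = do_file, args = args))

  old_wd <- setwd(work_dir)        # assumed: Stata writes job.log to the working dir
  on.exit(setwd(old_wd))
  system2(stata_bin, args)
  readLines(file.path(work_dir, "job.log"))
}

job <- stata_boomerang_sketch(c("sysuse auto, clear", "reg price mpg weight"),
                              stata_bin = "stata")
readLines(job$do_file)
job$args
```

RStata additionally writes data.in to a temporary .dta file and loads it before your commands, which this sketch leaves out.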
With those Stata results, we write markdown text, which is a joy for humans to write and, when rendered, a joy for humans to read.
RStudio compiles (knits) the .Rmd file into rendered HTML, Word, or PDF output.
You can do each of the above sub-processes yourself, but why not let tools do them automatically?
All the important content lives in a single '.Rmd' source file that can be version-controlled with your favorite flavor of Git.
Bonus: [R]est Assured
Let's compare the R results with the Stata results:
lm(y ~ x1 + x2, data = dat_r) %>% summary()
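Stata's reg table lists the slopes first and the constant (_cons) last, while R's summary() lists (Intercept) first, which makes eyeballing the two outputs awkward. A minimal sketch (base R only, rebuilding the same y, x1, x2 data from iris) that reorders R's coefficient table into Stata's layout and, for extra reassurance, cross-checks the estimates against the textbook OLS formula b = (X'X)^(-1) X'y:

```r
# Same data as dat_r, built with base R so this block is self-contained
dat <- setNames(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")],
                c("y", "x1", "x2"))
fit <- lm(y ~ x1 + x2, data = dat)

# Coefficient table reordered like Stata's: slopes first, constant last
tab <- coef(summary(fit))
tab <- tab[c("x1", "x2", "(Intercept)"), ]
rownames(tab)[3] <- "_cons"
round(tab, 4)

# Cross-check: OLS normal equations solved directly
X <- model.matrix(fit)
b <- solve(crossprod(X), crossprod(X, dat$y))
all.equal(as.vector(b), unname(coef(fit)))
```

The Estimate, Std. Error, t value, and Pr(>|t|) columns correspond to Stata's Coef., Std. Err., t, and P>|t| columns for a default (non-robust) reg.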