rstata_boomerang_rnotebook
title: R Notebook & RStata as tools for Transparent Data Analysis if you insist on Stata
output: html_notebook (default), html_document (default), word_document (default)

R Markdown Intro

This is an R Markdown Notebook. When you execute R code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

RStudio + RStata + Stata

The text above is the default introduction that comes with every new RStudio notebook.

We want to use the RStata package to boomerang Stata commands, wrapped in R strings, to a local copy of Stata and back.

Therefore, you'll need:

  1. RStudio version 1.0 or higher, for the R Markdown notebooks: http://rmarkdown.rstudio.com/r_notebooks.html

  2. The 'RStata' package: https://github.com/lbraglia/RStata

  3. Stata installed locally http://www.stata.com/

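library(RStata)
# Tell RStata which Stata version is installed locally (14 here).
options("RStata.StataVersion" = 14)
# Alternatively, pick the Stata executable interactively:
# chooseStataBin()
# Path to the local Stata executable; the inner quotes protect the spaces in the path.
stata_path = "\"C:\\Program Files (x86)\\Stata14\\StataSE-64\""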

stata_path
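As an alternative to passing the path on every call, RStata reads the options `RStata.StataPath` and `RStata.StataVersion` as defaults for `stata()`. The lines below are a minimal sketch of that setup, assuming the same example path used in this notebook; your install location will likely differ.

```r
# Sketch: register the Stata location once per session so later
# RStata::stata() calls can omit stata.path and stata.version.
# The path is the example from this notebook; adjust it to your install.
stata_path <- "\"C:\\Program Files (x86)\\Stata14\\StataSE-64\""

options(
  "RStata.StataPath"    = stata_path,  # default for the stata.path argument
  "RStata.StataVersion" = 14           # default for the stata.version argument
)

# Confirm what RStata will pick up.
getOption("RStata.StataPath")
getOption("RStata.StataVersion")
```

Setting the options only stores values in the R session, so this part runs even on a machine without Stata.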

R to call Stata

With the 'RStata' package, you can send Stata commands from within R.

https://github.com/lbraglia/RStata

https://github.com/EconometricsBySimulation/RStata/wiki/Dictionary:-Stata-to-R

So What?

You can definitely use Stata by itself, but do you want to use the horrendous Stata script editor?

  • no Stata syntax highlighting even though you paid for Stata
  • forced to use bare-bones comments (* and //) instead of writer-friendly markdown narratives

These two Stata limitations hinder 'easy-to-use' transparency protocols.

So... Rmarkdown!

  • RStudio - the IDE you write and run everything in
  • Rmarkdown - the file format that integrates R code with markdown text
  • R - the software that evaluates R code
  • RStata - the R package that sends R-side commands to Stata and brings the Stata output back to R
  • Stata - the software that you or your boss paid for

With RStudio + Rmarkdown + R + RStata integrated so nicely, you can have the best of three worlds:

  • Write narratives that are human-readable
  • Manipulate data with human-readable R code
  • Run the analysis with the 'paid-for assurance' of Stata commands

Note: R is free; people pay for Stata 'assurance' with the implicit belief that Stata routines are 'right'.

Example

Let's see the RStata::stata() boomerang in action.

Recall: Transparent Data Analysis Workflow, Steps 1-3 in R

See corresponding slides, https://ucla.box.com/shared/static/13pqxwmxbdy31v3z4o9b15gzjgu8wu5z.pdf

  • Stata cannot deal with dots ('.') in variable names.

    • So we convert to the generic variable names y, x1, x2.
    • Side note: R can deal with dots.
  • R excels for data wrangling steps.
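If you would rather keep the original, meaningful names than switch to y, x1, x2, one option is to replace the dots programmatically. This is only a sketch of that alternative, not the approach used in this notebook; `dat_generic` is a name made up for the example.

```r
# Sketch: make every column name Stata-safe by replacing dots with underscores.
# fixed = TRUE treats "." as a literal dot rather than a regex wildcard.
dat_generic <- iris
names(dat_generic) <- gsub(".", "_", names(dat_generic), fixed = TRUE)

names(dat_generic)
```

Underscores are allowed in Stata variable names, so a data frame renamed this way can be passed to Stata as-is.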

head(iris)
suppressMessages(library(dplyr))

dat_r = iris %>% 
  select(Sepal.Length,Sepal.Width,Petal.Length) %>% 
  rename(y=Sepal.Length,
         x1=Sepal.Width,
         x2=Petal.Length
         )

head(dat_r)

Step 4: Analysis "in" Stata

If you were to use Stata by itself to run a regression of y on x1 and x2, you would use the Stata command:

reg y x1 x2

Below is the R string that contains the pure Stata command `reg y x1 x2`:

r_string_stata_command = '
reg y x1 x2	
'
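Because the Stata command is just an R string, it can also be assembled from R objects instead of typed by hand. The helper below is hypothetical (it is not part of RStata) and is only a sketch of the idea; `build_stata_reg` and `r_string_built` are names invented for this example.

```r
# Hypothetical helper (not part of RStata): build a Stata `reg` command
# string from an outcome name and a character vector of predictor names.
build_stata_reg <- function(outcome, predictors) {
  paste("reg", outcome, paste(predictors, collapse = " "))
}

r_string_built <- build_stata_reg("y", c("x1", "x2"))
r_string_built
```

The resulting string could be passed to `RStata::stata()` in the same way as a hand-written command.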

Below is the R-side command, which uses the RStata::stata() function to send the string stored in r_string_stata_command, along with the data frame dat_r, to your local copy of Stata:

RStata::stata(r_string_stata_command,
      data.in = dat_r,
      stata.path=stata_path
      )

Conclusion

So the RStata package tells R to send commands to Stata, temporarily captures the Stata log, and brings the result back into R.

Around those Stata results, we write markdown text, which is a joy for humans to write and, once rendered, a joy for humans to read.

RStudio compiles the .Rmd file into rendered HTML, Word, or PDF files.
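The round trip is not limited to the log: `stata()` also has a `data.out` argument that returns Stata's data set to R as a data frame. The sketch below assumes RStata is installed and that the options `RStata.StataPath` and `RStata.StataVersion` are already set; if they are not, the Stata call is skipped. `dat_small`, `stata_ready` and `dat_back` are names made up for the example.

```r
# Sketch: ask Stata to create a variable and return the modified data to R.
# Assumes RStata is installed and the options RStata.StataPath and
# RStata.StataVersion are set; otherwise the call is skipped.
dat_small <- data.frame(y = c(1, 2, 3), x1 = c(2, 4, 6))

stata_ready <- requireNamespace("RStata", quietly = TRUE) &&
  !is.null(getOption("RStata.StataPath")) &&
  !is.null(getOption("RStata.StataVersion"))

dat_back <- NULL
if (stata_ready) {
  dat_back <- RStata::stata(
    "gen x1_sq = x1^2",    # plain Stata command
    data.in  = dat_small,  # data frame sent to Stata
    data.out = TRUE        # return Stata's data set as a data frame
  )
}

dat_back
```

When Stata is available, `dat_back` should contain the original columns plus the `x1_sq` column generated on the Stata side.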

You can do each of the above sub-processes yourself, but why not let tools help you do it automatically?

All the important content is in one single '.Rmd' source file that can be version controlled with your favorite flavor of git.

Bonus: [R]est Assured

Let's compare the R results with the Stata results:

lm(data=dat_r,y~x1+x2) %>% summary
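To make the comparison easier to eyeball, the coefficient table can be pulled out of the R fit and rounded to the precision shown in the Stata log. This is a minimal sketch that rebuilds `dat_r` with base R so it runs on its own; `fit_r` is a name introduced for the example.

```r
# Sketch: pull the coefficient table out of the R fit so it can be checked
# against the numbers in the Stata log.
dat_r <- data.frame(
  y  = iris$Sepal.Length,
  x1 = iris$Sepal.Width,
  x2 = iris$Petal.Length
)

fit_r <- lm(y ~ x1 + x2, data = dat_r)

# Estimates, standard errors, t values and p values.
round(coef(summary(fit_r)), 4)
```

The rows of this table correspond to the `_cons`, `x1` and `x2` rows of Stata's `reg` output.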