A script to demonstrate using TensorFlow in Spark with Amazon EMR and sparklyr.
- Create an EMR cluster for sparklyr, connect to EMR and install required tools:
install.packages("tensorflow")
devtools::install_github("rstudio/tfdeploy")
- Connect to Spark using sparklyr, copy some data, and ship the mtcars TensorFlow model (a quick sanity check follows the snippet):
library(sparklyr)

sc <- spark_connect(
  master = "yarn-client",
  config = list(
    # virtualenv location the workers will use when running spark_apply()
    sparklyr.apply.env.WORKON_HOME = "/tmp/.virtualenvs",
    # ship the exported mtcars TensorFlow model to every worker
    sparklyr.shell.files = "tfestimators-mtcars.tar"
  )
)

mtcars_tbl <- sdf_copy_to(sc, mtcars)
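A quick sanity check, not part of the original steps, to confirm the data reached Spark before running anything against the model:

# optional: the copied mtcars table should hold 32 rows
sdf_nrow(mtcars_tbl)
mtcars_tbl %>% head()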
- Install TensorFlow on each worker node (one node in this example); alternatively, TensorFlow can be installed while the cluster is being created. An optional check follows the snippet below.
sdf_len(sc, 1, repartition = 1) %>% spark_apply(function(e) {
  tensorflow::install_tensorflow(extra_packages = c("protobuf==3.0.0b2"))
})
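An optional check, an assumption rather than part of the original walkthrough: confirm that a worker can load the freshly installed TensorFlow before moving on.

# assumed helper step: report the TensorFlow version visible to a worker
sdf_len(sc, 1, repartition = 1) %>% spark_apply(function(e) {
  data.frame(tf_version = as.character(tensorflow::tf_version()))
})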
- Perform a prediction in TensorFlow across the Spark cluster (a note on collecting the results locally follows the output):
mtcars_tbl %>% spark_apply(function(df) {
  # build one instance per row with the features the model expects
  instances <- unname(apply(df, 1, function(e)
    list(cyl = e[2], disp = e[3])
  ))
  # score the instances against the SavedModel shipped to the workers
  results <- tfdeploy::predict_savedmodel(
    instances,
    "tfestimators-mtcars.tar",
    signature_name = "predict"
  )
  unname(unlist(results))
})
# Source: table<sparklyr_tmp_7a8b27d1c8d5> [?? x 1]
# Database: spark_connection
     mpg
   <dbl>
 1  7.90
 2  7.90
 3  5.41
 4 12.1
 5 16.7
 6 10.7
 7 16.7
 8  7.05
 9  6.80
10  8.22
# ... with more rows
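To use these values outside Spark, the result of the spark_apply() call above can be collected into a local data frame. A minimal sketch, assuming the pipeline has been assigned to predictions_tbl (a hypothetical name):

library(dplyr)

predicted <- predictions_tbl %>% collect()  # local tibble of predicted mpg
spark_disconnect(sc)                        # close the connection when done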
Installing TensorFlow inside the spark_apply() session seems too heavy and slows down the prediction step; copying all environment dependencies to each worker ahead of time may be a better option, as in the sketch below.
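A hedged sketch of that idea, reusing the config keys already shown above: ship a pre-built Python environment to the workers at connection time so spark_apply() only has to point at it. The archive name tf-virtualenv.tar is hypothetical.

library(sparklyr)

sc <- spark_connect(
  master = "yarn-client",
  config = list(
    # reuse an environment prepared ahead of time instead of installing
    # TensorFlow inside spark_apply()
    sparklyr.apply.env.WORKON_HOME = "/tmp/.virtualenvs",
    sparklyr.shell.files = "tfestimators-mtcars.tar,tf-virtualenv.tar"
  )
)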