@javierluraschi
Last active January 20, 2020 16:53
Using TensorFlow in EMR with sparklyr

A script to demonstrate using TensorFlow in Spark with Amazon EMR and sparklyr.

  1. Create an EMR cluster for sparklyr, connect to EMR and install required tools:
install.packages("tensorflow")
devtools::install_github("rstudio/tfdeploy")
  1. Connect to Spark using sparklyr, copy some data and the mtcars TensorFlow model:
library(sparklyr)

sc <- spark_connect(
  master = "yarn-client",
  config = list(
    sparklyr.apply.env.WORKON_HOME = "/tmp/.virtualenvs",
    sparklyr.shell.files = "tfestimators-mtcars.tar"
  )
)

mtcars_tbl <- sdf_copy_to(sc, mtcars)
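
Since the `sparklyr.shell.*` settings are passed through to `spark-submit`, the archive named in `sparklyr.shell.files` presumably has to exist on the machine that opens the connection. A minimal pre-flight check, assuming the archive sits in the working directory; `model_archive_ready` is a hypothetical helper for illustration, not part of sparklyr:

```r
# Hypothetical helper (not part of sparklyr): returns TRUE when `path`
# points at an existing, non-empty regular file, FALSE otherwise.
model_archive_ready <- function(path = "tfestimators-mtcars.tar") {
  info <- file.info(path)
  # file.info() yields NA fields for a missing path, so test for NA first
  !is.na(info$size) && !info$isdir && info$size > 0
}
```

Calling `model_archive_ready()` before `spark_connect()` and stopping when it returns `FALSE` gives a clearer failure than an error raised later on the cluster.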
  1. Install TensorFlow on each worker node (1 node in this example); alternatively, TensorFlow can be installed while the cluster is being created:
sdf_len(sc, 1, repartition = 1) %>% spark_apply(function(e) {
  tensorflow::install_tensorflow(extra_packages = c("protobuf==3.0.0b2"))
})
  1. Perform a prediction in TensorFlow across the Spark cluster:
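mtcars_tbl %>% spark_apply(function(df) {
  # Build one named instance per row; columns 2 and 3 of mtcars are cyl and disp
  instances <- unname(apply(df, 1, function(e)
    list(cyl = e[2], disp = e[3])
  ))

  # Score the instances with the saved model listed in sparklyr.shell.files
  results <- tfdeploy::predict_savedmodel(
    instances,
    "tfestimators-mtcars.tar",
    signature_name = "predict"
  )

  # Flatten the predictions into a plain numeric vector
  unname(unlist(results))
})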
# Source:   table<sparklyr_tmp_7a8b27d1c8d5> [?? x 1]
# Database: spark_connection
     mpg
   <dbl>
 1  7.90
 2  7.90
 3  5.41
 4 12.1 
 5 16.7 
 6 10.7 
 7 16.7 
 8  7.05
 9  6.80
10  8.22
# ... with more rows
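
The instance list handed to `predict_savedmodel` can be inspected locally with base R, without Spark or TensorFlow. A small sketch using the built-in `mtcars` data; selecting columns by name rather than by position is a variation for illustration, not what the script above does:

```r
# Build one instance per row of the built-in mtcars data, mirroring the
# spark_apply closure but selecting columns by name instead of position.
instances <- unname(apply(mtcars, 1, function(e)
  list(cyl = e[["cyl"]], disp = e[["disp"]])
))

length(instances)    # one instance per row of mtcars
str(instances[[1]])  # a list with the two fields cyl and disp
```

Selecting by name keeps the instances correct even if the column order of the input data frame changes.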

harryprince commented Feb 12, 2019

Installing TensorFlow inside the spark_apply session seems too heavy and will slow down the prediction process.

Directly copying all environment dependencies to each worker may be a better option.
