Skip to content

Instantly share code, notes, and snippets.

@rgbkrk
Last active December 2, 2016 01:26
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rgbkrk/01a372c85682cb0d76192dbe6069baea to your computer and use it in GitHub Desktop.
Save rgbkrk/01a372c85682cb0d76192dbe6069baea to your computer and use it in GitHub Desktop.
sparklyr
> flights_tbl
Source: query [3.368e+05 x 19]
Database: spark connection master=yarn-client app=sparklyr local=FALSE
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <int> <dbl> <int> <int>
1 2013 1 1 517 515 2 830 819
2 2013 1 1 533 529 4 850 830
3 2013 1 1 542 540 2 923 850
4 2013 1 1 544 545 -1 1004 1022
5 2013 1 1 554 600 -6 812 837
6 2013 1 1 554 558 -4 740 728
7 2013 1 1 555 600 -5 913 854
8 2013 1 1 557 600 -3 709 723
9 2013 1 1 557 600 -3 838 846
10 2013 1 1 558 600 -2 753 745
# ... with 3.368e+05 more rows, and 11 more variables: arr_delay <dbl>,
# carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
# air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dbl>
> nycflights13::flights
# A tibble: 336,776 <U+00D7> 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <int> <dbl> <int> <int>
1 2013 1 1 517 515 2 830 819
2 2013 1 1 533 529 4 850 830
3 2013 1 1 542 540 2 923 850
4 2013 1 1 544 545 -1 1004 1022
5 2013 1 1 554 600 -6 812 837
6 2013 1 1 554 558 -4 740 728
7 2013 1 1 555 600 -5 913 854
8 2013 1 1 557 600 -3 709 723
9 2013 1 1 557 600 -3 838 846
10 2013 1 1 558 600 -2 753 745
# ... with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
# carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
# air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
> devtools::install_github("rstudio/sparklyr")
...
> require(sparklyr)
Loading required package: sparklyr
> sc <- spark_connect(master = "yarn-client", version = "1.6.1")
> library(dplyr)
> src_tbls(sc)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment