By @nassimhaddad, forked from piccolbo/dplyr-backends.md (last active August 4, 2016).
# dplyr backends: the ultimate collection

dplyr is a well-known R package for working with structured data, whether in memory, in a database or, more recently, on a cluster. The in-memory implementations generally have capabilities not found in the others, so the term "backend" is used with a bit of poetic license. Even the various database and cluster backends differ from one another in subtle ways. But it sure beats writing SQL directly! Here I list the backends along with the packages that implement them, where a separate package is needed. I've done my best to point to active projects, but I am not endorsing any of them. Do your own testing. Enjoy, and please contribute any corrections or additions in the comments.

| Backend | Package |
|---|---|
| data.frame | built-in |
| data.table | built-in |
| arrays | built-in |
| SQLite | built-in |
| PostgreSQL/Redshift | built-in |
| MySQL/MariaDB | built-in |
| BigQuery | bigrquery |
| MonetDB | MonetDB.R |
| Presto | RPresto |
| Spark | dplyr.spark.hive |
| Hive | dplyr.spark.hive |
| Impala | dplyrimpaladb |
| Vertica | vertica.dplyr |
| Teradata | teradata.dplyr |
| Calcite | dplyr-calcite |
| SQL Server | RSQLServer |
| Netezza | dplyrnz |
| multidplyr | multidplyr |
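To make the notion of a backend concrete, here is a minimal sketch (not part of the original list) that runs the same pipeline against two of the rows above: an in-memory data.frame and SQLite. It assumes a current installation in which the SQL backends have since moved out of dplyr into the separate dbplyr package, with DBI and RSQLite also installed; the `mtcars` dataset and the variable names are only illustrative.

```r
# Assumes dplyr, dbplyr, DBI and RSQLite are installed (dbplyr is loaded
# automatically when dplyr verbs are applied to a database table).
library(dplyr)
library(DBI)

# In-memory backend: the pipeline runs directly on a data.frame.
local_result <- mtcars %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

# SQLite backend: copy the same data into an in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
mtcars_db <- copy_to(con, mtcars, "mtcars")

# Identical verbs; nothing is executed yet, dplyr only builds a query.
db_query <- mtcars_db %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

show_query(db_query)              # print the SQL that dplyr generated
db_result <- collect(db_query)    # run the query and pull the rows into R

dbDisconnect(con)
```

Only the data source changes; the verbs stay the same. `show_query()` prints the translated SQL, and `collect()` executes it and returns an ordinary local tibble. Row order from a database is not guaranteed, so sort (for example with `arrange()`) before comparing results across backends.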