@piccolbo
Last active June 23, 2018 03:58
Dplyr backends: the ultimate collection

Dplyr is a well-known R package for working with structured data, whether in memory, in a database or, more recently, on a cluster. The in-memory implementations generally have capabilities that are not found in the others, so the notion of "backend" is used here with a bit of poetic license. Even the different database and cluster backends differ in subtle ways. But it sure is better than writing SQL directly! Here I provide a list of backends, with links to the packages that implement them where necessary. I've done my best to link to active projects, but I am not endorsing any of them. Do your own testing. Enjoy, and please contribute any corrections or additions in the comments.

| Backend | Package |
|---|---|
| data.frame | builtin |
| data.table | builtin |
| arrays | builtin |
| SQLite | builtin |
| PostgreSQL/Redshift | builtin |
| MySQL/MariaDB | builtin |
| Bigquery | bigrquery |
| MonetDB | MonetDB.R |
| Presto | RPresto |
| Spark | dplyr.spark.hive |
| Hive | dplyr.spark.hive |
| Impala | dplyrimpaladb |
| Vertica | vertica.dplyr |
| Teradata | teradata.dplyr |
| Calcite | dplyr-calcite |
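
To make the idea of a backend concrete, here is a minimal sketch that runs one pipeline against two of the backends listed above: an in-memory data.frame and SQLite. It assumes the DBI, RSQLite and dbplyr packages are installed (in recent dplyr releases the database translation lives in dbplyr rather than in dplyr itself). The function name `mean_mpg_by_cyl` and the use of the `mtcars` dataset are my own choices for illustration, not something the packages prescribe.

```r
library(dplyr)

# The pipeline is written once, as a function of the table it runs on.
# (mean_mpg_by_cyl is a hypothetical helper name used only for this sketch.)
mean_mpg_by_cyl <- function(data) {
  data %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
    arrange(cyl)
}

# In-memory backend: a plain data.frame, evaluated immediately in R.
local_result <- mean_mpg_by_cyl(mtcars)

# Database backend: an in-memory SQLite database (assumes RSQLite and dbplyr).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# tbl() returns a lazy reference; nothing is computed yet.
remote_query <- mean_mpg_by_cyl(tbl(con, "mtcars"))

# Print the SQL that dplyr generated instead of writing it by hand.
show_query(remote_query)

# collect() runs the query in the database and pulls the result into R.
remote_result <- collect(remote_query)

DBI::dbDisconnect(con)
```

Other backends in the table should differ mainly in the connection step, but, as noted above, they differ in subtle ways, so check which verbs and functions each one actually translates.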
@nassimhaddad
nassimhaddad commented Aug 4, 2016

@hrbrmstr
hrbrmstr commented Jan 9, 2017

@ianmcook
New dplyr backend for Apache Impala: implyr

@himanshusin
There are some packages that let you reference and manipulate data directly in Teradata.
Try https://github.com/hoxo-m/dplyr.teradata. It's still in beta, I guess.
It is a dplyr wrapper for Teradata and allows lazy execution.
However, I didn't find it as robust as dplyr's support for the built-in databases, and Teradata is not one of them.
