Dplyr backends: the ultimate collection

dplyr is a well-known R package for working with structured data, whether in memory, in a database or, more recently, on a cluster. The in-memory implementations generally have capabilities not found in the others, so the term "backend" is used with a bit of poetic license. Even the various database and cluster backends differ in subtle ways. But it sure beats writing SQL directly! Here I list the backends, along with the packages that implement them where one is needed. I've done my best to point to active projects, but I am not endorsing any of them, so do your own testing. Enjoy, and please contribute any corrections or additions in the comments.

Backend              Package
data.frame           builtin
data.table           builtin
arrays               builtin
SQLite               builtin
PostgreSQL/Redshift  builtin
MySQL/MariaDB        builtin
BigQuery             bigrquery
MonetDB              MonetDB.R
Presto               RPresto
Spark                dplyr.spark.hive
Hive                 dplyr.spark.hive
Impala               dplyrimpaladb
Vertica              vertica.dplyr
Teradata             teradata.dplyr
Calcite              dplyr-calcite
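As a minimal sketch of what using one of the database backends looks like, here is the SQLite one. It assumes a current setup in which the database translation has moved out of dplyr into the separate dbplyr package (so the "builtin" database entries above now also need dbplyr installed), together with DBI and RSQLite. The table name and the query itself are purely illustrative.

```r
# Minimal sketch: the SQLite backend via DBI + dbplyr
# (assumes DBI, RSQLite, dplyr and dbplyr are installed).
library(DBI)
library(dplyr)
library(dbplyr)

# In-memory SQLite database; nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Upload the built-in mtcars data frame; copy_to() returns a lazy
# table that refers to the copy inside the database.
mtcars_db <- copy_to(con, mtcars, "mtcars")

# Ordinary dplyr verbs; this only builds a query, nothing runs yet.
four_cyl <- mtcars_db %>%
  filter(cyl == 4) %>%
  summarise(n = n(), avg_mpg = mean(mpg, na.rm = TRUE))

# Print the SQL that dbplyr generated for the pipeline above.
show_query(four_cyl)

# Execute the query and pull the result back into R as a local tibble.
res <- collect(four_cyl)
print(res)

dbDisconnect(con)
```

show_query() is a convenient way to check how a particular backend translates a pipeline, which is usually where the subtle differences between backends show up.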

nassimhaddad commented Aug 4, 2016


hrbrmstr commented Jan 9, 2017


ianmcook commented Mar 31, 2017

New dplyr backend for Apache Impala: implyr


himanshusin commented May 5, 2017

There are some packages that let you reference and manipulate data directly in Teradata.
Try https://github.com/hoxo-m/dplyr.teradata. It's still in beta, I think.
It is a dplyr wrapper for Teradata and allows lazy execution.
However, I didn't find it as robust as dplyr's own support for its built-in databases, and Teradata is not one of them.
