dplyr is a well-known R package for working with structured data, whether in memory, in a database or, more recently, on a cluster. The in-memory implementations generally have capabilities that are not found in the others, so the term "backend" is used here with a bit of poetic license. Even the different database and cluster backends differ in subtle ways. But it sure is better than writing SQL directly! Here I provide a list of backends, with links to the packages that implement them where necessary. I've done my best to link to active projects, but I am not endorsing any of them. Do your own testing. Enjoy, and please contribute any corrections or additions in the comments.
Backend | Package |
---|---|
data.frame | builtin |
data.table | builtin |
arrays | builtin |
SQLite | builtin |
PostgreSQL/Redshift | builtin |
MySQL/MariaDB | builtin |
BigQuery | bigrquery |
MonetDB | MonetDB.R |
Presto | RPresto |
Spark | dplyr.spark.hive |
Hive | dplyr.spark.hive |
Impala | dplyrimpaladb |
Vertica | vertica.dplyr |
Teradata | teradata.dplyr |
Calcite | dplyr-calcite |
Also:
Netezza - https://github.com/philippechataignon/dplyrnz
Microsoft SQL Server - https://github.com/imanuelcostigan/RSQLServer
multidplyr - https://github.com/hadley/multidplyr
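
To make the idea of a backend concrete, here is a minimal sketch that runs the same dplyr pipeline against an in-memory data.frame and against SQLite. It assumes a recent dplyr with the dbplyr, DBI and RSQLite packages installed, which is not necessarily the setup that was current when this list was compiled; the table name `mtcars` in the database and the variable names are my own choices for illustration.

```r
library(dplyr)

# In-memory backend: a plain data.frame.
local_result <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

# Database backend: an in-memory SQLite database (assumes DBI, RSQLite
# and dbplyr are installed; the table name "mtcars" is arbitrary).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# Same verbs as above, but nothing is computed yet: dplyr only builds a query.
remote_query <- tbl(con, "mtcars") %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

# Print the SQL that dplyr generated instead of writing it by hand.
show_query(remote_query)

# Run the query in the database and bring the result back into R.
remote_result <- collect(remote_query)

DBI::dbDisconnect(con)
```

The pipeline is identical in both cases; only the data source changes. Other backends from the list would plug in at the connection step, though, as noted above, they can differ in which functions they are able to translate.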