key setting in data.table
---
title: "data.table keys"
author: "Steph Locke"
date: "4 April 2016"
output:
  md_document:
    variant: markdown_github
---
# Glossary
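- **KEY**: one or more columns that a data.table is sorted by and that are used for fast lookups and joins
- **COMPOSITE KEY**: a key made up of more than one column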
# Key setting
You can set keys on data.tables to facilitate joins, improve querying speed, and sort your data. You can set a key as you create a data.table, or set one afterwards with dedicated functions, chiefly `setkey()` and `set2key()`.
The iris dataset will be used throughout.
```{r}
library(data.table)
head(setDT(copy(iris)))
```
## Setting keys at creation
You can create keys as you create data.tables.
### `data.table()`
`data.table()` has a `key=` argument that sets a key as the data.table is created. The table is sorted by the key column(s), just as `setkey()` would sort it.
```{r}
irisDT<-data.table(iris, key="Sepal.Width")
head(irisDT)
```
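To confirm that the key was set, and to see what it buys you, the sketch below inspects the key with `key()` and `haskey()` and then looks rows up by key value with `.()`. The value `3.5` and the name `wide` are chosen only for illustration.

```{r}
library(data.table)

irisDT <- data.table(iris, key = "Sepal.Width")

# Inspect the key that was set at creation
key(irisDT)      # name(s) of the key column(s)
haskey(irisDT)   # TRUE when a key is present

# With a key in place, .() looks rows up by key value
wide <- irisDT[.(3.5)]
head(wide)
```

Because the table is sorted by `Sepal.Width`, the lookup can use a binary search instead of scanning every row.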
### `setDT()`
Alternatively, `setDT()`, which quickly converts a data.frame to a data.table by reference, also has a `key=` argument.
```{r}
irisDT<-setDT(copy(iris), key="Sepal.Width")
head(irisDT)
```
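Since `setDT()` works by reference, the result does not need to be assigned. A minimal sketch, using a hypothetical data.frame named `df`:

```{r}
library(data.table)

df <- copy(iris)                # plain data.frame copy, so the built-in iris is untouched
setDT(df, key = "Sepal.Width")  # no assignment: df itself is converted and keyed

is.data.table(df)
key(df)
```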
## `setkey()`
`setkey()` assigns a key and physically sorts the table by it. It works by reference, so the result does not need to be assigned.
```{r}
irisDT<-setDT(copy(iris))
setkey(irisDT,Sepal.Length)
head(irisDT)
```
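Keys also drive joins. The sketch below keys two tables on `Species` and joins them with the `X[Y]` syntax. The lookup table `speciesInfo` and its `Code` column are invented for this example and are not part of iris.

```{r}
library(data.table)

irisDT <- setDT(copy(iris))
setkey(irisDT, Species)

# Hypothetical lookup table, invented for this example
speciesInfo <- data.table(
  Species = factor(c("setosa", "versicolor", "virginica")),
  Code    = c("SE", "VE", "VI"),
  key     = "Species"
)

# X[Y] joins on the keys: each iris row picks up its Code
joined <- speciesInfo[irisDT]
head(joined)
```

Both tables are keyed on the same column, so no join columns have to be named in the call.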
It's possible to make a composite key:
```{r}
irisDT<-setDT(copy(iris))
setkey(irisDT, Sepal.Length, Sepal.Width)
head(irisDT)
```
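With a composite key, a lookup can supply one value per key column, in key order. A minimal sketch, with the values `5.1` and `3.5` picked only for illustration:

```{r}
library(data.table)

irisDT <- setDT(copy(iris))
setkey(irisDT, Sepal.Length, Sepal.Width)

key(irisDT)   # both columns, in the order given

# One value per key column, in key order
both <- irisDT[.(5.1, 3.5)]
both

# Supplying only the first value matches on the leading key column
lengthOnly <- irisDT[.(5.1)]
lengthOnly
```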
The `setkey()` function takes unquoted column names, but sometimes you may want to pass in column names dynamically, as a character vector. For this you can use the "v" variant `setkeyv()`:
```{r}
irisDT<-setDT(copy(iris))
keyCols<-c("Sepal.Width","Sepal.Length")
setkeyv(irisDT,keyCols)
head(irisDT)
```
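The "v" variant is most useful inside functions, where the key columns arrive as an argument. A sketch, assuming a hypothetical helper named `keyedCopy()`:

```{r}
library(data.table)

# Hypothetical helper: return a keyed data.table copy of any data.frame
keyedCopy <- function(df, cols) {
  dt <- as.data.table(df)   # as.data.table() copies, leaving df untouched
  setkeyv(dt, cols)         # cols is a character vector, so use the "v" variant
  dt
}

irisDT <- keyedCopy(iris, c("Species", "Petal.Length"))
key(irisDT)
```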
## `set2key()`
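`set2key()` assigns a secondary key (an index) and **does not** perform physical sorting on the table, so the row order is unchanged. In later data.table releases the function is named `setindex()`.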
```{r}
irisDT<-setDT(copy(iris))
set2key(irisDT,Sepal.Length)
head(irisDT)
```
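To check that a secondary key leaves the rows where they were, compare the column before and after. This sketch assumes an installed data.table release in which the function is available as `setindex()` and the secondary keys are listed by `indices()`; on a release that still ships `set2key()`, substitute that call.

```{r}
library(data.table)

irisDT <- setDT(copy(iris))
before <- irisDT$Sepal.Length

# Assumption: a release where set2key() goes by the name setindex()
setindex(irisDT, Sepal.Length)

identical(irisDT$Sepal.Length, before)  # row order is unchanged
key(irisDT)                             # no primary key has been set
indices(irisDT)                         # the secondary key is recorded here
```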
It's possible to make a composite key:
```{r}
irisDT<-setDT(copy(iris))
set2key(irisDT, Sepal.Length, Sepal.Width)
head(irisDT)
```
The `set2key()` function takes unquoted column names, but sometimes you may want to pass in column names dynamically, as a character vector. For this you can use the "v" variant `set2keyv()`:
```{r}
irisDT<-setDT(copy(iris))
keyCols<-c("Sepal.Width","Sepal.Length")
set2keyv(irisDT,keyCols)
head(irisDT)
```
-----
[Rmd file](https://gist.github.com/stephlocke/dcb5f0dad688b69afb712f7c55824eea)