Skip to content

Instantly share code, notes, and snippets.

@hannesdatta
Created September 27, 2020 11:53
Show Gist options
  • Save hannesdatta/18938cd1ccbd029c9af2c9f8b862170e to your computer and use it in GitHub Desktop.
Save hannesdatta/18938cd1ccbd029c9af2c9f8b862170e to your computer and use it in GitHub Desktop.
Fast conversion of `character` data columns to Date using data.table
# Quick conversion of `character` date columns to Date format using data.table
# fread(..., colClasses = c(date='Date')) is slow for large data sets, especially when
# the number of unique dates is small, but the number of cross-sectional units is large.
# The intuition of this algorithm is to only convert the UNIQUE dates to dates using as.Date,
# and then merging them back to the original data.table.
library(data.table)
data.table.date <- function(dt, datecol) {
classes=lapply(dt, class)
oldclass = classes[which(colnames(dt)==datecol)]
cat(paste0('Converting column `', datecol, '` (which is of type `', oldclass, '`) to date...\n'))
cat('collecting unique dates...\n')
uniqdates= data.table(date_str=unique(unlist(dt[, datecol, with=F])))
uniqdates[, date:=as.Date(date_str)]
setkey(uniqdates, date_str)
setkeyv(dt, datecol)
cat('merging converted dates back to original data table\n')
dt[uniqdates, date_new:=i.date]
dt[, (datecol):=NULL]
setnames(dt, 'date_new', datecol)
classes=lapply(dt, class)
newclass = classes[which(colnames(dt)==datecol)]
cat(paste0('Done. `', datecol, '` now is of type `', newclass, '`.\n'))
}
# Generate some data with a data character column
test_dt = data.table(date=as.character(seq(from=as.Date('2020-01-01'), to = as.Date('2020-04-01'), by = '1 day')))
test_dt[, random_number := runif(.N)]
lapply(test_dt, class)
# class is character
# convert
data.table.date(test_dt, 'date')
lapply(test_dt, class)
# class is date now!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment