@stephenturner
Created January 14, 2015 19:49
read.table vs fread
# Generate a dataset with 5,000,000 rows: an integer index plus random numbers from the normal,
# uniform, and Cauchy distributions. Write it out to a file (warning: ~330 MB).
n <- 5000000
d <- data.frame(a=1:n, b=rnorm(n), c=runif(n), d=rcauchy(n))
write.table(d, file="test.txt")
# Import the usual way with base R's read.table()
system.time(in1 <- read.table("test.txt"))
## Crikey!
#    user  system elapsed
# 132.164   0.786 132.917
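Part of read.table()'s cost is guessing column types and scanning for comments. Its help page (the "Memory usage" notes) suggests supplying `colClasses`, `nrows`, and `comment.char = ""` for large files. A minimal sketch of that, assuming a small demo size (`n <- 1000`) and a temporary file standing in for `test.txt`; the speedup on the full 5,000,000-row file is not measured here, and it is not claimed to match fread().

```r
# Stand-in for the original data: demo size and temp file are assumptions.
set.seed(1)
n <- 1000
d <- data.frame(a = 1:n, b = rnorm(n), c = runif(n), d = rcauchy(n))
f <- tempfile(fileext = ".txt")
write.table(d, file = f)  # row names are written, as in the original

# colClasses is per column in the file, so it includes the row-name column:
# character row names, then integer a and numeric b, c, d.
system.time(
  in1_fast <- read.table(f,
                         header = TRUE,
                         colClasses = c("character", "integer",
                                        "numeric", "numeric", "numeric"),
                         nrows = n,           # exact row count; a mild over-estimate also helps
                         comment.char = "")   # skip scanning for comment characters
)
str(in1_fast)
```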
# Import with fread() ("Fast and friendly file finagler") from the data.table package
library(data.table)
system.time(in2 <- fread("test.txt"))
# Wow.
# Read 5000000 rows and 5 (of 5) columns from 0.334 GB file in 00:00:03
#    user  system elapsed
#   2.049   0.094   2.141
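fread() reports 5 columns for a data frame with 4 because write.table() writes row names by default, so every data line carries one more field than the header. A small sketch that makes this visible with base R's count.fields(); the demo size and temporary files are stand-ins for `test.txt`, and `row.names = FALSE` is shown as one way to write the file without the extra column.

```r
# Stand-in for test.txt: same four columns, small demo size (assumed).
set.seed(1)
n <- 10
d <- data.frame(a = 1:n, b = rnorm(n), c = runif(n), d = rcauchy(n))

f <- tempfile(fileext = ".txt")
write.table(d, file = f)                      # default: row names written
fields <- count.fields(f)
fields[1]           # header line: 4 fields ("a" "b" "c" "d")
unique(fields[-1])  # data lines: 5 fields (row name + 4 values)

f2 <- tempfile(fileext = ".txt")
write.table(d, file = f2, row.names = FALSE)  # no row-name column
unique(count.fields(f2))  # 4 fields on every line
```

With row names written, read.table() detects the one-field-short header and turns the first column into row names, while fread() keeps it as an ordinary column, so `in1` and `in2` do not have the same shape.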