@vjo
Last active August 29, 2015 14:26
Issue Date in R

I'm parsing a CSV file, converting the date column to a date-time type (POSIXct), and adding two columns derived from it.

  data <- read.csv(csvFile, na.strings = "")

  ## Convert the date column to POSIXct (date-time)
  data[["date"]] <- as.POSIXct(data[["date"]])
  
  ## Add a time and day column
  data[["time"]] <- strftime(data[["date"]], format="%H:%M")
  data[["day"]] <- strftime(data[["date"]], format="%Y-%m-%d")
  
  print(head(data))

And this is what I get. Note that for a sample of ~2000 lines it's blazing fast and seems to work. For my complete file (~1M lines) it's quite slow and it doesn't work: the time of day is lost and every row shows 00:00.

> source('~/Development/git/BasisDataAnalysis/basisData.R')
> basisData("sample.csv")
                 date calories     gsr heart.rate skin.temp steps  time        day
1 2013-07-28 00:00:00    1.206 3.40587         57   93.2000     0 00:00 2013-07-28
2 2013-07-28 00:01:00    1.200 3.61320         53   93.2000     0 00:01 2013-07-28
3 2013-07-28 00:02:00    1.200 3.60855         55   93.2000     0 00:02 2013-07-28
4 2013-07-28 00:03:00    1.381 4.38401         57   93.2375     0 00:03 2013-07-28
5 2013-07-28 00:04:00    1.264 4.07134         55   93.2000     0 00:04 2013-07-28
6 2013-07-28 00:05:00    1.200 3.29479         55   93.0125     0 00:05 2013-07-28
> basisData("bodymetrics.csv")
        date calories     gsr heart.rate skin.temp steps  time        day
1 2013-07-28    1.206 3.40587         57   93.2000     0 00:00 2013-07-28
2 2013-07-28    1.200 3.61320         53   93.2000     0 00:00 2013-07-28
3 2013-07-28    1.200 3.60855         55   93.2000     0 00:00 2013-07-28
4 2013-07-28    1.381 4.38401         57   93.2375     0 00:00 2013-07-28
5 2013-07-28    1.264 4.07134         55   93.2000     0 00:00 2013-07-28
6 2013-07-28    1.200 3.29479         55   93.0125     0 00:00 2013-07-28
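
For anyone who wants to reproduce the working case without the CSV files, the steps above can be sketched on a small in-memory table. The values, the `csv_text` name, and the explicit `tz = "UTC"` are assumptions made for the sake of a reproducible example; the original code relies on the local time zone.

```r
# Tiny in-memory stand-in for sample.csv (hypothetical values).
csv_text <- "date,steps
2013-07-28 00:00:00,0
2013-07-28 00:01:00,3"

data <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

# Convert the date column to POSIXct; tz = "UTC" is an assumption that keeps
# the result independent of the machine's local time zone.
data[["date"]] <- as.POSIXct(data[["date"]], tz = "UTC")

# Derive the two text columns from the parsed date-time.
data[["time"]] <- format(data[["date"]], "%H:%M")
data[["day"]]  <- format(data[["date"]], "%Y-%m-%d")

print(data)
```

When every value carries a time of day, as here, the `time` column keeps the minutes.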

@alung suggested that I use dplyr:

library(dplyr)

data <- data %>% mutate(date = as.POSIXct(date), day = strftime(date, format="%Y-%m-%d"), time = strftime(date, format="%H:%M"))

But it still fails:

> basisData("sample.csv")
                 date calories     gsr heart.rate skin.temp steps        day  time
1 2013-07-28 00:00:00    1.206 3.40587         57   93.2000     0 2013-07-28 00:00
2 2013-07-28 00:01:00    1.200 3.61320         53   93.2000     0 2013-07-28 00:01
3 2013-07-28 00:02:00    1.200 3.60855         55   93.2000     0 2013-07-28 00:02
4 2013-07-28 00:03:00    1.381 4.38401         57   93.2375     0 2013-07-28 00:03
5 2013-07-28 00:04:00    1.264 4.07134         55   93.2000     0 2013-07-28 00:04
6 2013-07-28 00:05:00    1.200 3.29479         55   93.0125     0 2013-07-28 00:05
> basisData("bodymetrics.csv")
        date calories     gsr heart.rate skin.temp steps day  time
1 2013-07-28    1.206 3.40587         57   93.2000     0 BST 00:00
2 2013-07-28    1.200 3.61320         53   93.2000     0 BST 00:00
3 2013-07-28    1.200 3.60855         55   93.2000     0 BST 00:00
4 2013-07-28    1.381 4.38401         57   93.2375     0 BST 00:00
5 2013-07-28    1.264 4.07134         55   93.2000     0 BST 00:00
6 2013-07-28    1.200 3.29479         55   93.0125     0 BST 00:00

Update

This was not working:

data <- data %>% mutate(fulldate = as.POSIXct(date))

But this seems to work well:

data <- data %>% mutate(fulldate = as.POSIXct(date, format = "%Y-%m-%d %H:%M", tz = "UTC"))
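
One possible explanation, offered as a guess rather than a confirmed diagnosis of this file: when no format is given, `as.POSIXct()` on character input tries a list of formats and keeps the first one that parses every non-NA value. A single value that lacks a usable time of day (or, possibly, a local time skipped by a daylight-saving change) might therefore push the whole column onto the date-only format. The sketch below uses a hypothetical three-element vector to show the effect and how an explicit format can help locate the offending rows.

```r
# Hypothetical input: the second value has no time-of-day component.
x <- c("2013-07-28 00:01:00", "2013-07-28", "2013-07-28 00:03:00")

# No format given: the date-only format is the first that fits all values,
# so the time of day is dropped for every element.
guessed <- as.POSIXct(x, tz = "UTC")
print(format(guessed, "%H:%M"))

# Explicit format: each value is parsed on its own, and values that do not
# match become NA instead of changing how the others are read.
strict <- as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")
print(format(strict, "%H:%M"))

# Positions of the values that did not match the explicit format.
bad_rows <- which(is.na(strict))
print(bad_rows)
```

Running the same `which(is.na(...))` check on the full file would be one way to see whether a handful of rows are responsible.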