@john-sandall
Last active August 29, 2015 14:23
dplyr::mutate with lubridate::ymd_hms seems to crash R at random
# Posted on StackOverflow: http://stackoverflow.com/questions/31010713/combining-dplyrmutate-with-lubridateymd-hms-in-r-randomly-causes-segfault
#!/usr/bin/env R
# -*- encoding:utf-8 -*-
require(lubridate)
require(dplyr)
set.seed(42)
make_some_random_datetimes = function(n) ymd("2015-01-01") + seconds(runif(n, min=0, max=60*60*24*365))

d = data.frame(
  col1 = make_some_random_datetimes(5000),
  col2 = make_some_random_datetimes(5000)
)

do_it = function() {
  d %>% mutate(
    col1 = ymd_hms(col1),
    col2 = ymd_hms(col2)  # for some reason it only crashes when evaluating 2+ columns; removing this line makes it run fine
  )
  return(TRUE)
}

do_it()  # doesn't crash every time; it fails on roughly every nth call, where n is random with a mean of about 7.7

do_it_lots_of_times = function(n) for (i in 1:n) do_it()

do_it_lots_of_times(50)  # almost guaranteed to fail on my machine
# Running the above line in RStudio, I just get the message "R Session Aborted. R encountered a fatal error. The session was terminated."
# Running the above line in Terminal, I get the following:
# *** caught segfault ***
# address 0x0, cause 'unknown'
# Output of sessionInfo()
# R version 3.2.1 (2015-06-18)
# Platform: x86_64-apple-darwin14.3.0 (64-bit)
# Running under: OS X 10.10.3 (Yosemite)
#
# locale:
# [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] dplyr_0.4.2 lubridate_1.3.3
#
# loaded via a namespace (and not attached):
# [1] lazyeval_0.1.10 R6_2.0.1 assertthat_0.1 magrittr_1.5 plyr_1.8.3 parallel_3.2.1
# [7] DBI_0.3.1 tools_3.2.1 memoise_0.2.1 Rcpp_0.11.6 stringi_0.5-2 digest_0.6.8
# [13] stringr_1.0.0
@stephlocke

Try this version out in the interim

require(lubridate)
require(dplyr)
require(data.table)
set.seed(42)
make_some_random_datetimes = function(n) ymd("2015-01-01") + seconds(runif(n, min=0, max=60*60*24*365))

d = data.table(
  col1 = make_some_random_datetimes(5000),
  col2 = make_some_random_datetimes(5000)
)


do_it = function() {
  d %>% .[,`:=`(
    col1 = ymd_hms(col1),
    col2 = ymd_hms(col2)  # for some reason, it only crashes when evaluating 2+ cols, if we removed this line it'd be fine
  )]
  return(TRUE)
}

do_it()  # doesn't crash every time...it fails every nth time where n is randomly distributed with mean of roughly 7.7

do_it_lots_of_times = function(n) for (i in 1:n) do_it()

do_it_lots_of_times(50)
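
Another possible stopgap, offered only as a sketch: the columns in the example are already date-times, so the `ymd_hms()` call has nothing left to parse. The helper below, `parse_if_needed`, is a hypothetical name and not part of the original gist. It calls `ymd_hms()` only on columns that are not yet `POSIXct`. It assumes string input looks like `"2015-01-01 00:00:01"`, and it does not diagnose or fix the segfault itself.

```r
library(lubridate)

# Hypothetical helper (not from the original gist): parse only when needed.
parse_if_needed <- function(x) {
  if (inherits(x, "POSIXct")) {
    return(x)  # already a date-time, so skip ymd_hms() entirely
  }
  ymd_hms(x)   # character input such as "2015-01-01 00:00:01"
}

set.seed(42)
start <- as.POSIXct("2015-01-01", tz = "UTC")

# Small illustrative data frame: one column already parsed, one still text.
d <- data.frame(
  col1 = start + runif(5, min = 0, max = 60 * 60 * 24 * 365),
  col2 = format(start + 1:5, "%Y-%m-%d %H:%M:%S"),
  stringsAsFactors = FALSE
)

# Apply the helper to every column while keeping the data.frame structure.
d[] <- lapply(d, parse_if_needed)
str(d)
```

This uses base R assignment rather than `mutate`, purely to keep the sketch small; the same helper could be called wherever `ymd_hms` appears above.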
