@Tayflo
Last active February 20, 2021 20:16
Some thoughts about Year zero and dealing with BCE dates in the Lubridate R package (v1.7.9)

Using lubridate with BCE/CE (as of v1.7.9)

Related issue: tidyverse/lubridate#2

Summary: If you are wondering how to deal with BCE dates in R, especially with the lubridate package (v1.7.9), or how to implement BCE support within lubridate, here are some thoughts about the problems caused by a phantom year zero.

Explanations

If you are considering handling "before common era" dates in lubridate, be aware that year zero does not exist in the calendar historians use: 1 BCE (Year -1) is immediately followed by 1 CE (Year 1); see for example the Wikipedia chronology. Note that this convention comes from the Julian calendar, which can differ slightly from the Gregorian calendar we use nowadays; Wikipedia has more details. The missing year zero can cause a few problems.

Quick clarification for people unfamiliar with those notations:

  • "CE" stands for "Common Era", a "de-Christianised" form of the long-used and still common "AD", "Anno Domini" (so CE dates can be seen as "positive years").
  • "BCE" stands for "Before Common Era", equivalent to "BC", "Before Christ" ("negative years"). For example, Year 2021 (happy new year btw!) would be "2021 CE", and Socrates died in 399 BCE.

For instance, lubridate follows ISO 8601 (version 8601:2004, I presume; BCE dates could be handled with ISO 8601:2019, but the free-access part of the document is unclear about it), whose four-digit years start at 0000. In that numbering, 0000-01-01 is 1 January of 1 BCE (Year -1).

This notation is confusing because it suggests that "0000-01-01" is Year 0 and that "-001-01-01" is Year -1, when the latter is actually Year -2 (2 BCE). It can also cause problems when computing durations (see code below).

That aside, if encountered, "0 CE/AD" or "0 BCE/BC" should probably be parsed into Year -1.
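
The off-by-one described above can be written down as a pair of conversions. This is only a sketch in base R: `bce_to_astronomical()` and `astronomical_to_bce()` are hypothetical names, not lubridate functions, and they assume that the year printed by lubridate is the astronomical year (0000 = 1 BCE).

```r
# Hypothetical helpers (not part of lubridate), base R only.
# Astronomical numbering: year 0 is 1 BCE, year -1 is 2 BCE, and so on.

bce_to_astronomical <- function(bce_year) {
  stopifnot(all(bce_year >= 1))   # there is no "0 BCE"
  1L - as.integer(bce_year)
}

astronomical_to_bce <- function(astro_year) {
  stopifnot(all(astro_year <= 0)) # years >= 1 are CE and need no shift
  1L - as.integer(astro_year)
}

bce_to_astronomical(c(1, 2, 399))
#> [1]    0   -1 -398
astronomical_to_bce(c(0, -1, -398))
#> [1]   1   2 399
```

So Socrates' death in 399 BCE corresponds to the astronomical year -398.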

References: Wikipedia (ISO 8601, Year zero, 1 BC, Common Era...)

Some code to make my point

(Licensed under WTFPL: Do What The Fuck You Want to)

pacman::p_load(lubridate)
pacman::p_version(lubridate)
#> [1] '1.7.9'

a <- ymd("0001-01-01")
a
#> [1] "0001-01-01"
# Year 1, no problem

b <- ymd("0000-01-01") - years(1)
b
#> [1] "-001-01-01"
# Is it Year -1?
# No: although it prints as -001-01-01, it is Year -2 (2 BCE),
# since ymd("0000-01-01") is already Year -1 (1 BCE).

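# The problem appears when we compute the duration between the two
as.duration(a - b)
#> [1] "63158400s (~2 years)"
# Reading b as Year -1 (as printed), we would expect only one year between
# 1st January -1 and 1st January 1, since year zero doesn't exist.
# The 2 years are right only because b is really 1st January of Year -2.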

Let's illustrate with Augustus' dates:

  • birth: 23 September 63 BCE
  • death: 19 August 14 CE
  • age at death: 75

aug_birth <- ymd("0000-09-23") - years(63)
aug_death <- ymd("0014-08-19")
age <- aug_death - aug_birth
as.duration(age)
#> [1] "2426889600s (~76.9 years)"
# That's one year too many!

# ymd("0000-09-23") is already in 1 BCE, so the correct way to write it is:
aug_birth <- ymd("0000-09-23") - years(63 - 1)

So a correct helper function to parse a BCE date written as yyyy-mm-dd would be:

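parse_bce_ymd <- function(str) {
  # Split "yyyy-mm-dd" into the BCE year and the "-mm-dd" remainder
  regex <- "^(\\d{4})(-\\d{2}-\\d{2})$"
  match <- stringr::str_match(str, regex)
  # "0000" is already 1 BCE, so n BCE is (n - 1) years before it
  years_n <- as.integer(match[, 2]) - 1L # Beware the -1 here
  right_side <- match[, 3] # already starts with "-"
  date <- ymd(paste0("0000", right_side)) - years(years_n)
  return(date)
}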
# Test the function.
aug_birth <- parse_bce_ymd("0063-09-23")
aug_death <- ymd("0014-08-19")
age <- aug_death - aug_birth
as.duration(age)
#> [1] "2395353600s (~75.9 years)"
# Yay that's correct!
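
As a cross-check that does not depend on date arithmetic at all, the age can also be computed from the year numbers alone. This is a rough sketch in base R under stated assumptions: `to_astronomical()` and `age_in_whole_years()` are hypothetical helpers, years are signed historical years (negative for BCE, positive for CE, never 0), and each date is passed as a `c(year, month, day)` vector.

```r
# Hypothetical helpers, base R only (no lubridate needed).

# Shift BCE years by one so that consecutive years differ by exactly 1.
to_astronomical <- function(year) {
  stopifnot(all(year != 0))        # the historical calendar has no year 0
  ifelse(year < 0, year + 1, year)
}

# birth and death are c(year, month, day) vectors.
age_in_whole_years <- function(birth, death) {
  years <- to_astronomical(death[1]) - to_astronomical(birth[1])
  # Subtract one if the birthday had not yet been reached in the final year.
  before_birthday <- death[2] < birth[2] ||
    (death[2] == birth[2] && death[3] < birth[3])
  years - as.integer(before_birthday)
}

# Augustus: 23 September 63 BCE to 19 August 14 CE
age_in_whole_years(birth = c(-63, 9, 23), death = c(14, 8, 19))
#> [1] 75
```

This agrees with the age of 75 given above, and with the ~75.9 years returned once the BCE year is shifted by one.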

Still, lubridate prints the BCE date with a year number that is one less in absolute value than the "real" one (here -062 instead of -063), as if a year zero existed, which is misleading.

aug_birth
#> [1] "-062-09-23"
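
One possible workaround for the misleading printout is to format the date yourself with an explicit era label. The sketch below assumes that `lubridate::year()` returns the same astronomical year that is printed above; `era_year_label()` and `format_with_era()` are hypothetical helpers, not part of lubridate.

```r
library(lubridate)

# Hypothetical helper: turn an astronomical year (0 = 1 BCE) into a label.
era_year_label <- function(astro_year) {
  ifelse(astro_year >= 1,
         paste(astro_year, "CE"),
         paste(1 - astro_year, "BCE"))
}

# Hypothetical helper: render a Date as "day month year era".
format_with_era <- function(date) {
  paste(day(date), month.name[month(date)], era_year_label(year(date)))
}

era_year_label(c(14, 0, -62))
#> [1] "14 CE"  "1 BCE"  "63 BCE"
```

Applied to the `aug_birth` value above, `format_with_era(aug_birth)` is intended to give "23 September 63 BCE" rather than a date that looks like Year -62.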