@cfljam
Forked from mdavy86/eBridaIdentifiers.md
Last active August 29, 2015 14:16
R examples for Regex to Clean Vine Identifiers

Problem

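Vine identifiers exported from Excel sources are incorrectly formatted for eBrida. For example:

Incorrect Excel format conversion => Correct eBrida format
TO8.33.06.15F => T08.33-06-15f

In the Excel version the zero in the year has become the letter O, the hyphens have become dots, and the position letter is upper case.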

Convention

(Site initial)(Year)(.)(Block)(-)(Row)(-)(Bay)(Position within Bay)
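
The convention above can be written as a regular expression to check whether an ID is already in eBrida format. This is a minimal sketch: the field widths (one upper-case site letter, a two-digit year, a single lower-case position letter) are assumptions inferred from the single example T08.33-06-15f, and the names ebridaPattern and isEbridaID are hypothetical.

```r
## Assumed layout: site letter, two-digit year, ".", block, "-", row, "-",
## bay, then one lower-case position letter (inferred from T08.33-06-15f)
ebridaPattern <- "^[A-Z][0-9]{2}\\.[0-9]+-[0-9]+-[0-9]+[a-z]$"

## TRUE where an ID already follows the assumed eBrida layout
isEbridaID <- function(x) {
  grepl(ebridaPattern, x)
}

isEbridaID(c("T08.33-06-15f", "TO8.33.06.15F"))
```

The first ID matches and the second does not, because the letter O is not a digit and the fields are separated by dots rather than hyphens.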

Loading example IDs

## Example IDs as exported from Excel (letter O in place of the zero)
x <- c("TO8.33.02.01A",
       "TO8.33.02.01B",
       "TO8.33.02.02A",
       "TO8.33.02.02B")

Regular expression to fix

First approach

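formatIDs <- function(x) {
  ## Convert TO\\ds => T0\\ds (letter O typed in place of zero)
  x <- gsub("^(T)O(\\d)", "\\10\\2", x)
  ## Replace the 2nd and 3rd dots with -'s and lower-case the last field
  ## (\\L needs perl=TRUE)
  x <- gsub("(.+?\\.)(.+?)\\.(.+?)\\.(.+)", "\\1\\2-\\3-\\L\\4", x, perl=TRUE)
  return(x)
}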

cat("[ MARC IDs ]\n")
print(x)

cat("[ eBrida conversion ]\n")
print(formatIDs(x))

Second approach using strsplit()

formatIDs <- function(x) {
  ## Using strsplit: split one ID on "." into its four fields
  parts <- strsplit(x, "\\.")[[1]]
  ## Convert TO\\ds => T0\\ds
  st <- gsub("^(T)O(\\d)", "\\10\\2", parts[1])
  en <- tolower(parts[4])
  return(paste0(st, ".", parts[2], "-", parts[3], "-", en))
}

cat("[ MARC IDs ]\n")
print(x)

cat("[ eBrida conversion ]\n")
## formatIDs() handles one ID at a time, so apply it to each element of x
sapply(x, formatIDs)
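
Both approaches assume every ID has exactly four dot-separated fields. A possible variant, sketched here under the hypothetical name formatIDsVec, takes the whole vector at once and returns NA for any ID that does not split into four fields instead of silently producing a malformed result. The NA behaviour is an assumption about what is wanted, not part of the original approaches.

```r
formatIDsVec <- function(x) {
  ## Convert TO\\ds => T0\\ds, as in the approaches above
  x <- gsub("^(T)O(\\d)", "\\10\\2", x)
  ## One character vector of fields per ID
  parts <- strsplit(x, ".", fixed = TRUE)
  vapply(parts, function(p) {
    ## Assumption: anything other than four fields is flagged as NA
    if (length(p) != 4) return(NA_character_)
    paste0(p[1], ".", p[2], "-", p[3], "-", tolower(p[4]))
  }, character(1))
}

ids <- c("TO8.33.02.01A", "TO8.33.02.02B", "bad")
print(formatIDsVec(ids))
```

Using vapply() with character(1) guarantees a character vector of the same length as the input, which makes the result easy to bind back onto the original data.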