@garyfeng
Created May 30, 2015 04:00
R function to extract a vector of substrings from a list or vector of strings, based on a regex
# A function that takes a vector or a list of strings and a regex pattern,
# and returns a vector of the substrings that match the pattern. It has no
# parameter checking, error handling, etc.
# The key lesson learned is that regmatches() is badly designed for this use,
# as it silently drops any non-matched elements when given regexpr() output.
# As a result, the length of the returned vector may or may not be the same
# as the input. I had to use the good old substring() trick instead.
# The second lesson is that when the regex pattern contains several (), such
# as the example below, the order doesn't affect regexpr(pattern, x), but when
# doing gsub(pattern, "\\1", x), the \1 is NOT the first () that matched, but
# the first () in the pattern. As a result, I had to concatenate \1\2, etc.,
# which is really a hack. I don't know a better solution.
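# A small sketch of the two lessons above, using made-up inputs (the names
# x, m, dropped, aligned, first_only and both are illustrative only, and the
# simplified patterns are not the ones used in the function below).

```r
x <- c("this is control:URL_0", "id is bad", "@id=`link_123`+link_234")
m <- regexpr("control:[a-zA-Z0-9_]+", x)

# Lesson 1: regmatches() keeps only the elements that matched,
# so the result no longer lines up with the input.
dropped <- regmatches(x, m)
length(dropped) == length(x)   # FALSE: one match out of three inputs

# substring() returns one value per input; non-matches become "".
aligned <- substring(x, m, m + attr(m, "match.length") - 1)

# Lesson 2: with alternation, \\1 is the first group in the pattern,
# which is empty when a later alternative is the one that matched.
first_only <- gsub("(a)|(b)", "\\1", "b")      # group 1 did not participate
both       <- gsub("(a)|(b)", "\\1\\2", "b")   # concatenation recovers "b"
```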
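regexSubStrings <- function(x, pattern) {
  # Flatten a list of single strings into a character vector
  x <- unlist(x)
  # Position of the first match in each element (-1 when there is no match)
  y <- regexpr(pattern, x)
  # substring() yields "" for non-matches, so the output keeps the input length.
  # @@ can't use regmatches(x, y), because it returns nothing for non-matches
  str <- substring(x, y, y + attr(y, "match.length") - 1)
  # Keep only the captured groups; groups that did not match contribute ""
  str <- gsub(pattern, "\\1\\2\\3\\4\\5\\6", str)
  return(str)
}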
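# One possible alternative to concatenating \1\2..., offered only as a sketch:
# regexec() reports every capture group, so the first non-empty one can be
# picked per element. firstCapture is a hypothetical helper name, not part of
# the original gist, and it assumes that at most one alternative matches per
# string and that an empty capture counts as "no capture".

```r
firstCapture <- function(x, pattern) {
  x <- unlist(x)
  # One character vector per input: full match first, then each capture group;
  # inputs without a match give character(0)
  matches <- regmatches(x, regexec(pattern, x))
  vapply(matches, function(m) {
    groups <- m[-1]                                    # drop the full match
    groups <- groups[!is.na(groups) & nzchar(groups)]  # keep groups that captured text
    if (length(groups) > 0) groups[1] else ""
  }, character(1))
}

inputs <- list("this is control:URL_0", "id is bad", "@id=`link_123`+link_234")
pat <- "control:([a-zA-Z0-9_]+)|@id=`([a-zA-Z0-9_]+)|event:(taskStart)"
captured <- firstCapture(inputs, pat)
```

# Like the substring() approach, this returns one element per input, with ""
# standing in for non-matches.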
# In my data frame "trunk", trunk$extInfo is a vector of lists, with a single element in each list.
# unlist(lapply(test, regexSubStrings, pattern = pattern))
test <- list("this is control:URL_0", "id is bad", "@id=`link_123`+link_234")
pattern <- "control:([a-zA-Z0-9_]+)|@id=`([a-zA-Z0-9_]+)|event:(taskStart)"
recoveredEvents <- regexSubStrings(test, pattern)