garyfeng/87b3d390d2afb946df2d
Created May 30, 2015 04:00
R function to extract a vector of substrings from a list or vector of strings, based on a regex pattern (via regexpr())
```r
# regexSubStrings() takes a vector or a list of strings and a regex pattern,
# and returns a character vector of the substrings that match the pattern.
# There is no parameter checking, error handling, etc.
#
# The key lesson learned is that regmatches() is badly designed for this,
# as it silently drops any non-matched elements. As a result, the returned
# vector may be shorter than the input, so I had to fall back on the good
# old substring() trick.
#
# The second lesson is that when the pattern contains several capture
# groups (), as in the example below, their order doesn't affect
# regexpr(pattern, x). But in gsub(pattern, "\\1", x), \1 is NOT the first
# group that matched; it is the first () in the pattern. As a result, I had
# to concatenate \1\2, etc., which is really a hack. I don't know a better
# solution.
```
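A small base-R demonstration of the two behaviors described in these comments, using the same test strings and pattern as the usage example (redefined here so the snippet runs on its own). It only illustrates the claims; it is not part of the original gist.

```r
# Same inputs as the usage example in this gist.
x <- c("this is control:URL_0", "id is bad", "@id=`link_123`+link_234")
pattern <- "control:([a-zA-Z0-9_]+)|@id=`([a-zA-Z0-9_]+)|event:(taskStart)"

# First lesson: regmatches() on regexpr() match data returns only the
# matched substrings, so non-matching elements disappear from the result.
m <- regexpr(pattern, x)
hits <- regmatches(x, m)
length(x)     # 3
length(hits)  # 2: "id is bad" was dropped

# Second lesson: backreferences are numbered by position in the pattern.
# "@id=`link_123" matches the second alternative, so group 1 did not take
# part in the match and \1 is empty; the value is only reachable as \2.
sub(pattern, "\\1", "@id=`link_123")  # ""
sub(pattern, "\\2", "@id=`link_123")  # "link_123"
```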
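```r
regexSubStrings <- function(x, pattern) {
  # Flatten first: regexpr() coerces a list with as.character(), which
  # deparses nested lists, so positions must be computed on the same
  # strings that substring() cuts below.
  x <- as.character(unlist(x))
  m <- regexpr(pattern, x)
  # Can't use regmatches(x, m): it returns nothing for non-matches.
  # substring() keeps one element per input; for a non-match m is -1 and
  # match.length is -1, which yields "".
  out <- substring(x, m, m + attr(m, "match.length") - 1)
  # Keep only the captured text. Groups that did not take part in the
  # match expand to "", so concatenating them leaves the one that did.
  # Handles patterns with up to six capture groups.
  out <- sub(pattern, "\\1\\2\\3\\4\\5\\6", out)
  out
}
```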
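One possible answer to "I don't know a better solution", sketched as an alternative rather than the author's method: regexec() returns one match-data element per input string, and regmatches() on that result yields a list that keeps character(0) for non-matches, so the output length is preserved. Each capture group also comes back as its own element, so the first non-empty group can be picked without concatenating backreferences. regexFirstGroup() is a hypothetical name, and the sketch assumes every group of interest captures a non-empty string.

```r
# Hypothetical helper (not from the original gist): for each input string,
# return the first non-empty capture group of the first match, or "" when
# the string does not match. Assumes groups of interest never capture "".
regexFirstGroup <- function(x, pattern) {
  x <- as.character(unlist(x))
  # regexec() + regmatches() gives a list with one element per input:
  # c(full_match, group1, group2, ...) for a match, character(0) otherwise.
  # Groups that did not take part in the match come back as "".
  groups <- regmatches(x, regexec(pattern, x))
  vapply(groups, function(g) {
    caps <- g[-1]               # drop the full match, keep the groups
    caps <- caps[nzchar(caps)]  # keep groups that captured something
    if (length(caps) > 0) caps[[1]] else ""
  }, character(1))
}

x <- list("this is control:URL_0", "id is bad", "@id=`link_123`+link_234")
p <- "control:([a-zA-Z0-9_]+)|@id=`([a-zA-Z0-9_]+)|event:(taskStart)"
regexFirstGroup(x, p)  # "URL_0" "" "link_123"
```

This trades the fixed limit on the number of groups for a per-element vapply() loop; the output still has exactly one entry per input string.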
```r
# In my data frame "trunk", trunk$extInfo is a list of lists, each holding
# a single string. An element-wise alternative to the vectorized call:
# unlist(lapply(test, regexSubStrings, pattern = pattern))
test <- list("this is control:URL_0", "id is bad", "@id=`link_123`+link_234")
pattern <- "control:([a-zA-Z0-9_]+)|@id=`([a-zA-Z0-9_]+)|event:(taskStart)"
recoveredEvents <- regexSubStrings(test, pattern)
# recoveredEvents is c("URL_0", "", "link_123")
```
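Why the input should be flattened before regexpr() is called, and not only inside substring(), when it looks like trunk$extInfo (a list of single-element lists): regexpr() coerces non-character input with as.character(), which deparses each nested list, so the reported start positions refer to the deparsed text rather than to the strings. The `ext` list below is a hypothetical stand-in for that column.

```r
# Hypothetical stand-in for trunk$extInfo: a list of single-element lists.
ext <- list(list("this is control:URL_0"), list("@id=`link_123`+link_234"))
p <- "control:([a-zA-Z0-9_]+)|@id=`([a-zA-Z0-9_]+)|event:(taskStart)"

# regexpr() calls as.character() on the list, which turns each element into
# text such as list("this is control:URL_0"), so positions are shifted.
pos_nested <- as.integer(regexpr(p, ext))

# Flattening first gives positions within the strings themselves, which
# is what substring(unlist(ext), ...) expects.
pos_flat <- as.integer(regexpr(p, unlist(ext)))
pos_flat                         # 9 1
identical(pos_nested, pos_flat)  # FALSE
```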