@emhart
Last active September 11, 2015 11:52
Extract mentions from a tweet string using recursion
#' Extract mentions
#' @description extract all the mentions of other twitter users from a tweet
#' @param txt the text of your tweet
#' @export
mention_ext <- function(txt){
  pattern <- '@[a-zA-Z0-9_]{1,15}'
  # Base case: nothing to scan
  if(is.na(txt)) return(character(0))
  m <- regexpr(pattern, txt)
  # Base case: no mention left in the remaining text
  if(m == -1) return(character(0))
  # First mention in txt
  n <- regmatches(txt, m)
  # Text after the first mention ("" if the mention ends the string)
  rest <- unlist(regmatches(txt, m, invert=TRUE))[2]
  # Recurse on the remainder; results come back in tweet order
  c(n, Recall(rest))
}
### Example from https://twitter.com/jaimedash/status/639872345075650560
tweet <- "@emhrt_ @_inundata @MSFTResearch looks cool -> @authorea Lots of good tools these days!"
m <- mention_ext(tweet)
### No state is kept between calls, so repeated calls do not accumulate mentions
### Maybe I should have used that loop...
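mention_loop <- function(txt){
  pattern <- '@[a-zA-Z0-9_]{1,15}'
  # Start from an empty character vector so the return type is always character
  n <- character(0)
  # txt becomes NA once there is no mention left to split on
  while(!is.na(txt)){
    m <- regexpr(pattern, txt)
    # Append the first mention (character(0) if there is none)
    n <- c(n, regmatches(txt, m))
    # Keep only the text after that mention; NA when nothing matched
    txt <- unlist(regmatches(txt, m, invert=TRUE))[2]
  }
  return(n)
}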
ml <- mention_loop(tweet)
@ashander

The while way seems better! Though I guess this is a toy example; you could do
regmatches(tweet, gregexpr('@[a-zA-Z0-9_]{1,15}',tweet))

@hadley

hadley commented Sep 11, 2015

Or stringr::str_extract_all(tweet, '@[a-zA-Z0-9_]{1,15}') ;)
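For comparison, here is a minimal base-R sketch of the vectorised approach suggested in the comments above. The helper name `extract_mentions` is hypothetical and not part of the gist; the pattern is the one the gist uses. `gregexpr()` finds every match in each string, so neither recursion nor a loop is needed, and several tweets can be handled at once.

```r
# Hypothetical helper (not from the original gist): extract all mentions
# from one or more tweets in a single vectorised call.
extract_mentions <- function(tweets) {
  # Same pattern as the gist: "@" followed by 1 to 15 letters, digits or "_"
  pattern <- "@[a-zA-Z0-9_]{1,15}"
  # gregexpr() locates every match per string; regmatches() returns a list
  # with one character vector of matches per input element
  regmatches(tweets, gregexpr(pattern, tweets))
}

tweet <- "@emhrt_ @_inundata @MSFTResearch looks cool -> @authorea Lots of good tools these days!"

# Mentions of a single tweet, in the order they appear
extract_mentions(tweet)[[1]]

# Several tweets at once; a tweet without mentions yields character(0)
extract_mentions(c(tweet, "no mentions here"))
```

The result is a list rather than a plain vector, so index with `[[1]]` when working with a single tweet.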
