@codegordi
Last active December 17, 2015 23:08
Use multicore package (R on MacOS) to grep a (Very Large Data) file-as-dataframe.
### manage memory on a large data set using multicore library
library(multicore) # provides pvec(); on current R, library(parallel) ships with base R and offers the same function
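# read in a tab-delimited file from the working dir ("data.tsv" is a placeholder file name; getwd() alone is a directory, not a file)
df = read.table(file.path(getwd(), "data.tsv"), sep="\t", header=TRUE, stringsAsFactors=FALSE)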
d.lines = as.character(df$charvar) # $charvar is character-class variable you want to grep
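# returns a 0/1 vector the same length as x (1 = element matches pattern), so pvec can concatenate the per-chunk results
grep_wrap <- function (pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE) {
  ret = rep(0, length(x))
  # value is fixed at FALSE: grep must return indices so they can be used to subset ret
  ret[grep(pattern, x, ignore.case = ignore.case, perl = perl, value = FALSE, fixed = fixed, useBytes = useBytes, invert = invert)] = 1
  ret
}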
result = pvec(d.lines, grep_wrap, pattern="\\d{7,10}\\>") # example pattern matches 7-10 digits followed by an end-of-word boundary (\>); use $ instead to anchor at the end of the string
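
### sanity check on a toy vector (a sketch, not part of the original workflow)
# Assumptions: the base 'parallel' package is used here because it also provides pvec();
# 'toy', 'flag_matches', 'serial' and 'parallel_res' are hypothetical names for illustration only.
# The idea is to confirm that the chunked, parallel result matches a plain serial call
# before running the pattern over the full data set.
library(parallel)
toy <- c("order 1234567", "no digits here", "id 12345", "ref 0123456789 ok")
# 0/1 flag per element, same length as the input, so chunks concatenate cleanly
flag_matches <- function(x, pattern) as.integer(grepl(pattern, x))
serial <- flag_matches(toy, pattern = "\\d{7,10}\\>")
# forking is unavailable on Windows, so fall back to a single core there
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L
parallel_res <- pvec(toy, flag_matches, pattern = "\\d{7,10}\\>", mc.cores = n.cores)
stopifnot(identical(serial, parallel_res))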