Last active December 17, 2015 23:08
Gist codegordi/5686866
Use the multicore package (R on macOS) to grep a very large data file loaded as a data frame.
### grep a large data set in parallel using the multicore library's pvec()
library(parallel)  # provides pvec(); supersedes the multicore package, which is archived on CRAN
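# read in a tab-delimited file from the working directory
# ("data.tsv" is a placeholder name: getwd() alone returns the directory, not a file)
df = read.table(file.path(getwd(), "data.tsv"), sep = "\t", header = TRUE, stringsAsFactors = FALSE)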
d.lines = as.character(df$charvar)  # charvar: the character column you want to grep
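# return a 0/1 integer vector the same length as x (1 = element matches pattern);
# pvec() needs FUN to return one value per input element, so grepl() is used instead of grep(value = ...)
grep_wrap <- function(x, pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE) {
  hits = grepl(pattern, x, ignore.case = ignore.case, perl = perl, fixed = fixed, useBytes = useBytes)
  if (invert) hits = !hits
  as.integer(hits)
}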
result = pvec(d.lines, grep_wrap, pattern = "(^|[^0-9])[0-9]{7,10}$")  # example pattern: string ends in a run of exactly 7-10 digits
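To try the approach without the original data file, here is a minimal, self-contained sketch. It assumes the `parallel` package (bundled with R since 2.14.0, where `pvec()` now lives), replaces `df$charvar` with a handful of made-up strings, and uses a hypothetical helper `flag_matches()` that behaves like `grep_wrap()`. `pvec()` splits the vector into one chunk per core, calls the function on each chunk in a forked worker, and joins the pieces with `c()`, so the parallel result should be identical to a single serial call.

```r
# Sketch: flag strings that end in a run of exactly 7-10 digits, once in
# parallel with pvec() and once serially, and check that the results agree.
library(parallel)  # bundled with R >= 2.14.0; provides pvec()

# Hypothetical helper equivalent to grep_wrap():
# returns 1L for each element of x that matches pattern, else 0L
flag_matches <- function(x, pattern, ...) {
  as.integer(grepl(pattern, x, ...))
}

pattern <- "(^|[^0-9])[0-9]{7,10}$"

# Made-up strings standing in for df$charvar
samples <- c("order 1234567", "ref 12345678901", "no digits here",
             "call 5551234567", "1234567 at start", "x9876543210", "7654321")
print(flag_matches(samples, pattern))
# [1] 1 0 0 1 0 1 1

# Repeat the samples to get a vector large enough to split across workers
many <- rep(samples, 2000)

# Forking is not supported on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# pvec() cuts 'many' into one chunk per core, calls
# flag_matches(chunk, pattern = pattern) on each, and joins the results with c()
par_result <- pvec(many, flag_matches, pattern = pattern, mc.cores = cores)
ser_result <- flag_matches(many, pattern)

stopifnot(identical(par_result, ser_result))
cat("matches:", sum(par_result), "of", length(many), "\n")
```

Because `pvec()` relies on forking, the workers start from the parent's memory image rather than having the input serialized to them, but each chunk's result is still sent back and combined in the parent. Whether this beats one plain `grepl()` call depends on the pattern cost and the data size, so timing both with `system.time()` on a sample first is worthwhile.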