@timriffe
Created October 14, 2011 16:48
A function to search a local MIT classics database (as set up by the function SearchMITclassics.R) for words and phrases
SearchMITclassics <-
function(dirpath="E:\\DATA\\CLASSICS",wordorphrase,linespm=2){
  # each subdirectory of dirpath is one writer; let the user pick which to search
  writers <- list.files(dirpath)
  writers <- select.list(writers,title="Select Writers",multiple=TRUE,preselect=writers)
  output <- list()
  # offsets for grabbing nearby lines (none when linespm is 0)
  indicesalso <- if (linespm > 0) c(1:linespm,-(1:linespm)) else integer(0)
  for (i in seq_along(writers)){
    writeri <- list()
    # file.path() avoids hard-coding the Windows path separator
    worksi <- list.files(file.path(dirpath,writers[i]))
    # only offer a choice of works when a single writer was selected
    if (length(writers)==1 && length(worksi)>1){
      worksi <- select.list(worksi,
        title="Select Works",multiple=TRUE,preselect=worksi)
    }
    pathsi <- file.path(dirpath,writers[i],worksi)
    # strip file extensions to get the work titles
    worksi <- unlist(lapply(strsplit(worksi,split="\\."),function(x){x[1]}))
    for (j in seq_along(worksi)){
      Textij <- readLines(pathsi[j],warn=FALSE)
      midindices <- grep(wordorphrase,Textij,ignore.case = TRUE)
      if (length(midindices)>0){
        # keep only matches after the first "by" line and before the last "THE END" line
        gm <- grep("THE END",Textij,ignore.case = TRUE)
        maxg <- if (length(gm)>0) max(gm) else length(Textij)
        byg <- grep("by",Textij,ignore.case = TRUE)
        ming <- if (length(byg)>0) min(byg) else Inf
        midindices <- midindices[midindices < maxg & midindices > ming]
        if (length(midindices)>0){
          kk <- list()
          for (k in seq_along(midindices)){
            indicesk <- unique(sort(c(midindices[k],midindices[k]+indicesalso)))
            # drop offsets that fall outside the text
            indicesk <- indicesk[indicesk >= 1 & indicesk <= length(Textij)]
            kk[[k]] <- Textij[indicesk]
          }
          writeri[[worksi[j]]] <- kk
        }
      }
    }
    if (length(writeri)>0){
      output[[writers[i]]] <- writeri
    }
  }
  gc()
  # nested list: writer -> work -> one character vector per match
  return(output)
}
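
The core of the function is the step that returns each matching line together with a few lines on either side. Below is a minimal, non-interactive sketch of that step in isolation. The helper name `context_lines` and the sample text are hypothetical and not part of the gist. Unlike `SearchMITclassics`, the sketch does not restrict matches to the span between the "by" line and "THE END".

```r
# Hypothetical helper (not part of the gist): for every line of `lines`
# that matches `pattern`, return that line plus up to `n` lines before
# and after it, clamped to the bounds of the text.
context_lines <- function(lines, pattern, n = 2) {
  hits <- grep(pattern, lines, ignore.case = TRUE)
  lapply(hits, function(h) {
    idx <- seq(max(1, h - n), min(length(lines), h + n))
    lines[idx]
  })
}

# Toy text standing in for one file of the local database
txt <- c("THE REPUBLIC",
         "by Plato",
         "I went down yesterday to the Piraeus",
         "with Glaucon the son of Ariston,",
         "that I might offer up my prayers",
         "to the goddess;",
         "THE END")

# One list element per match, each a character vector of nearby lines
res <- context_lines(txt, "glaucon", n = 1)
```

Clamping the window with `max()` and `min()` keeps the index inside the text, so a match near the first or last line simply returns a shorter window.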