Last active March 23, 2022 02:29
Gist: benmarwick/9265414
An R function that converts a folder of text files into a single CSV file, with one column for the file names and one column for the text of each file.
# test it by creating some small text files to run the function on
txt <- c("here is", "some text", "to test", "this function with", "'including a leading quote", '"and another leading quote')
# make text files, one string per file
dir.create("testdir", showWarnings = FALSE)
for (i in seq_along(txt)) {
  writeLines(txt[i], file.path("testdir", paste0("outfile-", i, ".txt")))
}
# define or source txt2csv() first, then run it and look at the CSV file it produces
txt2csv("testdir", "theoutfile")
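The test strings include leading single and double quote marks, and real documents often contain commas. Written without quoting (`quote = FALSE`), such text can split into extra columns or make a CSV reader treat a stray quote as the start of a quoted field. Below is a minimal sketch of a quoted round trip, using a temporary file and made-up example rows (neither is part of the gist):

```r
# Hypothetical example rows; the text deliberately contains commas and quotes
df <- data.frame(
  filename = c("a.txt", "b.txt"),
  fulltext = c('"and another leading quote', "commas, inside, text"),
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".csv")

# quote = TRUE wraps each character field in double quotes, and
# qmethod = "double" escapes embedded quotes by doubling them ("")
write.table(df, file = out, sep = ",", row.names = FALSE,
            col.names = FALSE, quote = TRUE, qmethod = "double")

# Read it back: each file should still occupy exactly one row with two columns
back <- read.csv(out, header = FALSE, stringsAsFactors = FALSE,
                 col.names = c("filename", "fulltext"))
print(back)
```

`write.csv()` applies the same quoting by default, but it always writes a header row, so `write.table()` with `sep = ","` is used here to keep the header-less layout.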
#' Compiling several text files into a single CSV file
#'
#' Convert a folder of text files into a single CSV file
#' with one column for the file names and one column for the
#' text of each file. A function in R.
#'
#' The first time you use this function, run this line:
#' install.packages("devtools")
#' After that, you only need to load the function
#' from GitHub with these two lines:
#' library(devtools) # Windows users need Rtools installed, Mac users need Xcode installed
#' source_url("https://gist.github.com/benmarwick/9265414/raw/text2csv.R")
#'
#' Here's how to set the arguments to the function:
#'
#' mydir is the full path of the folder that contains your txt files,
#' for example "C:/Downloads/mytextfiles". Note that it must have
#' quote marks around it and use forward slashes, which are not the
#' default in Windows.
#'
#' mycsvfilename is the name that you want your CSV file to
#' have. It must have quote marks around it, but leave off
#' the .csv extension, which is added for you.
#'
#' A full example, assuming you've sourced the
#' function from GitHub already:
#'
#' txt2csv("C:/Downloads/mytextfiles", "mybigcsvfile")
#'
#' After a moment you'll get a message in the R console
#' saying 'Your CSV file is called mybigcsvfile.csv and
#' can be found in C:/Downloads/mytextfiles'
txt2csv <- function(mydir, mycsvfilename){
  # Get the full paths of all the txt files (and only txt files);
  # `pattern` is a regular expression, so match names ending in ".txt"
  myfiles <- list.files(mydir, full.names = TRUE, pattern = "\\.txt$")
  # Read the contents of the text files into R: this gives a list of
  # character vectors, one vector per file and one element per line
  mytxts <- lapply(myfiles, readLines, warn = FALSE)
  # Collapse the lines of each file into a single string, giving one
  # character vector where each item is the full text of one file
  # (unlist() would give one item per line, which makes data.frame()
  # fail whenever a file has more than one line)
  mytxts1lines <- vapply(mytxts, paste, character(1), collapse = " ")
  # make a data frame with the file names and texts
  mytxtsdf <- data.frame(filename = basename(myfiles), # just use filename as text identifier
                         fulltext = mytxts1lines,      # full text of each file in col 2
                         stringsAsFactors = FALSE)
  # Now write them all into a single CSV file, one txt file per row,
  # in the same folder as the txt files
  outfile <- file.path(mydir, paste0(mycsvfilename, ".csv"))
  # quote = TRUE and qmethod = "double" stop commas and quote marks
  # inside the text from breaking the CSV columns
  write.table(mytxtsdf, file = outfile, sep = ",", row.names = FALSE,
              col.names = FALSE, quote = TRUE, qmethod = "double")
  # now check your folder to see the CSV file
  message("Your CSV file is called ", basename(outfile),
          " and can be found in ", normalizePath(mydir, winslash = "/"))
}
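Two steps in the function above are easy to get wrong. `list.files()` treats `pattern` as a regular expression rather than a shell glob, and `readLines()` returns one element per *line*, so `unlist()` on its results yields one element per line rather than one per file. Below is a minimal sketch of reading a folder so that each file becomes exactly one row; the temporary folder and its file contents are made up for illustration:

```r
# Build a throwaway folder with one single-line file and one multi-line file
mydir <- file.path(tempdir(), "txt2csv-demo")
dir.create(mydir, showWarnings = FALSE)
writeLines("one line only", file.path(mydir, "a.txt"))
writeLines(c("first line", "second line", "third line"), file.path(mydir, "b.txt"))

# Regex (not glob): file names ending in ".txt"
myfiles <- list.files(mydir, pattern = "\\.txt$", full.names = TRUE)

# readLines() gives one character vector per file, one element per line
mytxts <- lapply(myfiles, readLines, warn = FALSE)
print(lengths(mytxts))   # number of lines in each file

# Collapse each file's lines into a single string: one string per file
fulltext <- vapply(mytxts, paste, character(1), collapse = " ")

mytxtsdf <- data.frame(filename = basename(myfiles), fulltext = fulltext,
                       stringsAsFactors = FALSE)
print(mytxtsdf)
```

Joining with a space keeps each file on one visual line; `collapse = "\n"` would preserve the original line breaks inside the quoted CSV field instead. If you prefer glob syntax, `utils::glob2rx("*.txt")` converts it to the equivalent regex `"^.*\\.txt$"`.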
I have the same problem as Terelet.
Error in data.frame(filename = basename(myfiles), fulltext = mytxts1lines) :
arguments imply differing number of rows: 4, 529
Do you have a solution for that?
Thanks for your work,
alessio
Thanks, but I also have the same problem as @alessiolevis :/
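The error means `data.frame()` received 4 file names but 529 lines of text: `readLines()` returns every line of every file, and `unlist()` strings them all together, so any file with more than one line adds extra elements. Below is a small diagnostic sketch using hypothetical data in place of real files; `lengths()` on the list returned by `lapply(myfiles, readLines)` shows which files have more than one line:

```r
# Hypothetical data: 2 file names, but 5 lines of text across those files
mytxts <- list(c("a1"), c("b1", "b2", "b3", "b4"))
filenames <- c("a.txt", "b.txt")

# Lines per file: any value above 1 is what makes unlist() too long
print(lengths(mytxts))

# Reproduces the reported error (here with 2 and 5 instead of 4 and 529)
res <- tryCatch(
  data.frame(filename = filenames, fulltext = unlist(mytxts)),
  error = function(e) conditionMessage(e)
)
print(res)

# Caution: when the total line count is an exact multiple of the file count,
# data.frame() recycles the file names silently instead of failing
recycled <- data.frame(filename = filenames, fulltext = c("a1", "b1", "b2", "b3"))
print(nrow(recycled))   # 4 rows, although there are only 2 files
```

The silent-recycling case is why the small test files pass: each of them holds exactly one line, so the counts happen to match.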
Update: this worked when running the modified script in the elarkin fork. Extremely useful. Thank you both!