@ProQuestionAsker
Created January 11, 2017 03:39
Parsing Transcript
# Loading Necessary Packages
# For Web Scraping Transcripts
library(rvest)
library(curl)
# For Data Frame Manipulation
library(dplyr)
library(tidyr)
library(stringr)
library(stringi)
# Import Transcript (with formatting)
RO <- readLines("RogueOneTranscript.txt")
# Convert to Data Frame (keep the text as character rather than factor)
RO <- data.frame(RO = RO, stringsAsFactors = FALSE)
# Remove empty rows
RO <- RO %>%
  filter(RO != "")
# Separate Character from Words at the first colon
RO_full <- RO %>%
  separate(col = RO, into = c("Character", "Words"), sep = ":",
           extra = "merge", fill = "right") %>%
  # Eliminate script notes (lines without a colon get NA in Words)
  filter(!is.na(Words)) %>%
  # Trim white space and convert Character to factor
  mutate(Character = as.factor(str_trim(Character)),
         Words = str_trim(Words))
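# Illustrative sketch (not part of the original script): a quick check of the
# parsing steps above on a few made-up lines, assuming the transcript follows
# a "NAME: dialogue" layout. `sample_lines` and `demo` are hypothetical names
# used only for this example; the speakers and text are invented.
library(dplyr)
library(tidyr)
library(stringr)
sample_lines <- c("SPEAKER A: Hello there.",
                  "",
                  "[A script note with no speaker]",
                  "SPEAKER B: One: two")
demo <- data.frame(RO = sample_lines, stringsAsFactors = FALSE) %>%
  # Drop empty rows
  filter(RO != "") %>%
  # Split at the first colon only; later colons stay in Words
  separate(col = RO, into = c("Character", "Words"), sep = ":",
           extra = "merge", fill = "right") %>%
  # Lines without a colon have NA in Words and are dropped
  filter(!is.na(Words)) %>%
  mutate(Character = as.factor(str_trim(Character)),
         Words = str_trim(Words))
# Only the two dialogue lines should remain, and the second colon in
# "One: two" should still be part of Words.
print(demo)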