@rer145
Created November 6, 2017 23:32
How to split a line of text into individual words
library(dplyr)
library(tidytext)
library(janeaustenr)
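# janeaustenr's austen_books() returns the text of all six Austen novels,
# one line of text per row ('text' column), with the novel in a 'book' column.
# Use dplyr's filter() to keep only the lines of 'Sense & Sensibility'
sns <- austen_books()
sns <- sns %>%
  filter(book == "Sense & Sensibility")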
head(sns)
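# The titles available to filter() on can be listed from the 'book' column.
# A minimal sketch, assuming janeaustenr is installed; 'book_titles' is a
# hypothetical name, and the call is namespaced so it also runs on its own.
book_titles <- unique(as.character(janeaustenr::austen_books()$book))
book_titles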
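# tidytext's unnest_tokens() splits text into words (tokens).
# Here we create a new data frame with one row per word: the new 'word' column is
# built from the 'text' column of sns (which is dropped), while 'book' is kept.
# By default the words are converted to lowercase and punctuation is stripped
words <- sns %>%
  unnest_tokens(word, text)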
head(words)
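# To see what unnest_tokens() does to a single line, it can help to run it on a
# tiny made-up data frame ('toy' and its sentence are hypothetical, for
# illustration only). Calls are namespaced so the snippet runs on its own.
toy <- data.frame(line = 1,
                  text = "Sense and Sensibility, by Jane Austen!",
                  stringsAsFactors = FALSE)
toy_words <- tidytext::unnest_tokens(toy, word, text)
# One row per word, lowercased, with the comma and "!" removed:
# "sense" "and" "sensibility" "by" "jane" "austen"
toy_words$word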
# Filter out common words (aka "stop words") such as "the" and "of",
# using the stop_words data frame that ships with tidytext
words <- words %>%
  filter(!(word %in% stop_words$word))
head(words)
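# A common alternative for the stop-word step is dplyr's anti_join(), which keeps
# the rows of the left data frame whose 'word' has no match in stop_words.
# stop_words lists some words once per lexicon, but a filtering join never
# duplicates rows, so the result should match the filter() approach above.
# A minimal sketch on a made-up sentence; 'sentence_df', 'sentence_words',
# 'by_filter' and 'by_anti' are hypothetical names.
sentence_df <- data.frame(text = "The family of Dashwood had long been settled in Sussex.",
                          stringsAsFactors = FALSE)
sentence_words <- tidytext::unnest_tokens(sentence_df, word, text)
# Approach used above: keep words that are not in stop_words$word
by_filter <- dplyr::filter(sentence_words, !(word %in% tidytext::stop_words$word))
# Alternative: anti-join against the stop_words data frame on the 'word' column
by_anti <- dplyr::anti_join(sentence_words, tidytext::stop_words, by = "word")
identical(by_filter$word, by_anti$word)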