Ron Richardson rer145

@rer145
rer145 / gutenberg_download.R
Created November 6, 2017 23:28
How to download the text of a novel from Project Gutenberg with R
library(gutenbergr)
library(stringr)
# search for the EXACT title of the novel
gutenberg_works(title=='Dracula')
# search for the EXACT author's name (Last Name, First Name)
gutenberg_works(author=='Stoker, Bram')
# search for the word 'Frankenstein' in the title column
rer145 / unnest_tokens.R
Created November 6, 2017 23:32
How to split a line of text into individual words
library(dplyr)
library(tidytext)
library(janeaustenr)
# Using dplyr and janeaustenr, get the contents of 'Sense & Sensibility'
sns<-austen_books()
sns<-sns%>%
  filter(book=='Sense & Sensibility')
head(sns)
rer145 / wordcloud.R
Created November 6, 2017 23:34
How to make a simple wordcloud with R
library(dplyr)
library(tidytext)
library(janeaustenr)
# Using dplyr and janeaustenr, get the contents of 'Sense & Sensibility'
sns<-austen_books()
sns<-sns%>%
  filter(book=='Sense & Sensibility')
head(sns)
rer145 / ggplot_bar.R
Created November 6, 2017 23:37
How to create a simple bar chart with ggplot2 in R
library(dplyr)
library(Lahman)
library(ggplot2)
# Get all the teams from 1980 and how many home runs they hit
df<-Teams %>%
  filter(yearID==1980) %>%
  select(name, HR) %>%
  arrange(HR)
rer145 / ggplot_interactive.R
Created November 6, 2017 23:42
How to create interactive plots with ggplot2 and R
library(Lahman)
library(dplyr)
library(ggplot2)
library(ggiraph)
# Get all teams from 1980 and the number of home runs hit
df<-Teams %>%
  filter(yearID == 1980) %>%
  select(name, HR) %>%
  arrange(HR)
rer145 / read_csv.R
Last active November 7, 2017 00:00
How to read CSV data with R
library(dplyr)
library(stringr)
deaths<-read.csv("z_KoreanConflict.csv", header=TRUE, stringsAsFactors=FALSE)
rer145 / ggplot_lubridate.R
Last active November 7, 2017 00:01
How to use lubridate to plot date/time data in R
library(dplyr)
library(ggplot2)
library(stringr)
library(lubridate)
# First grab the data and filter out the bad data in INCIDENT_DATE (see https://gist.github.com/rer145/68b75131d7e2d89adc53b1f8d75ab294)
deaths<-read.csv("KoreanConflict.csv", header=TRUE, stringsAsFactors=FALSE)
regEx = "^\\d{8}$"
rer145 / csv_regex_filter.R
Last active November 7, 2017 00:01
How to filter out bad CSV data with regular expressions in R
library(dplyr)
library(stringr)
deaths<-read.csv("z_KoreanConflict.csv", header=TRUE, stringsAsFactors=FALSE)
# We want to make sure all data in INCIDENT_DATE is correct
# A quick investigation shows the data in a YYYYMMDD format, but some fields are empty
# A simple regular expression can just check for 8 digits
regEx = "^\\d{8}$"
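For comparison, the same eight-digit check can be sketched in Python, the language of the file_prompts.py gist further down. This is only an illustration under stated assumptions: the `keep_valid_dates` helper and the sample rows are hypothetical and do not come from the original gist or its CSV file.

```python
import re

# Same pattern as the R gist: exactly eight digits and nothing else.
DATE_PATTERN = re.compile(r"^\d{8}$")


def keep_valid_dates(rows, field="INCIDENT_DATE"):
    """Return only the rows whose date field is exactly eight digits.

    `rows` is a list of dicts (hypothetical stand-ins for CSV records).
    Missing or empty values are treated as invalid.
    """
    return [row for row in rows if DATE_PATTERN.fullmatch(row.get(field) or "")]


# Hypothetical sample rows, not taken from the real data set.
sample = [
    {"INCIDENT_DATE": "19501127"},  # well-formed YYYYMMDD
    {"INCIDENT_DATE": ""},          # empty field
    {"INCIDENT_DATE": "1950-11"},   # wrong shape
]
valid = keep_valid_dates(sample)
```

Using `fullmatch` means a value with a trailing newline is rejected as well, which a plain `match` with `$` would let through.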
rer145 / sentiments.R
Created November 7, 2017 00:09
How to match word sentiments to words with R
library(dplyr)
library(tidytext)
library(gutenbergr)
library(ggplot2)
# Get our data, the text of Dracula (Project Gutenberg ID 345)
dracula<-gutenberg_download(345)
# Remove the gutenberg_id field since we don't need it
dracula$gutenberg_id<-NULL
rer145 / file_prompts.py
Created November 7, 2017 00:11
Input and output file prompts in python
import os.path

def input_file_prompt():
    """ () -> string
    Precondition: none
    This function returns the name of a file that the user wants to read data from. It also checks
    whether the file exists and, if it does not, informs the user and prompts them to try again.
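
The preview stops inside the docstring, so the body of the function is not shown. The prompt-and-retry behaviour it describes might look roughly like the sketch below. This is not the author's implementation: the prompt wording, the error message, and the `ask` parameter (added here so the loop can be driven without a console) are all assumptions.

```python
import os.path


def input_file_prompt(ask=input):
    """Keep asking for a filename until the user names a file that exists.

    `ask` is a hypothetical hook that defaults to the built-in input();
    it is not part of the original gist.
    """
    while True:
        filename = ask("Enter the name of the file to read: ").strip()
        if os.path.isfile(filename):
            return filename
        # Tell the user what went wrong, then loop and prompt again.
        print(f"'{filename}' was not found. Please try again.")
```

Calling `input_file_prompt()` with no arguments prompts on the console and returns only once an existing file has been named.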