@earino
earino / process_blog.R
Created March 20, 2021 17:47
a pipeline to process the corporate blog
library(tidyverse)
library(rvest)
library(tidytext)
library(topicmodels)
library(ggplot2)
library(dplyr)
library(tidyr)
CACHE_DIR <- "./cache3"
# Create the cache directory on first run
if (!dir.exists(CACHE_DIR)) dir.create(CACHE_DIR)
1. Explain in your own words what the unnest_tokens function does.
2. Explain in your own words what the gutenbergr package does.
3. Explain in your own words how sentiment lexicons work.
4. How does inner_join provide sentiment analysis functionality?
5. Explain in your own words what tf-idf does.
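The questions above touch on tokenization, lexicon joins, and tf-idf. A minimal sketch of how those three steps might fit together, assuming the dplyr and tidytext packages are installed; the two-document corpus and the four-word lexicon are invented for illustration and are not part of the original gist:

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus, invented for illustration only.
docs <- tibble(
  doc  = c("a", "b"),
  text = c("The good dog is happy", "The bad storm is sad")
)

# unnest_tokens() splits `text` into one lowercased word per row.
tokens <- docs %>% unnest_tokens(word, text)

# Hypothetical mini lexicon; a real analysis would more likely use
# a published lexicon such as get_sentiments("bing").
lexicon <- tibble(
  word      = c("good", "happy", "bad", "sad"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# inner_join() keeps only the tokens that appear in the lexicon and
# attaches the matching sentiment label to each of them.
sentiment_counts <- tokens %>%
  inner_join(lexicon, by = "word") %>%
  count(doc, sentiment)

# bind_tf_idf() weights each word by how specific it is to a document;
# words that occur in every document get a weight of zero.
word_weights <- tokens %>%
  count(doc, word) %>%
  bind_tf_idf(word, doc, n)
```

In this toy corpus, words shared by both documents (such as "the") carry no tf-idf weight, while words unique to one document do.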
library(twitteR)
library(tidyverse)
library(tidytext)
# Credentials are read from environment variables rather than hard-coded
setup_twitter_oauth(
  consumer_key = Sys.getenv("TWITTER_CONSUMER_KEY"),
  consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
  access_token = Sys.getenv("TWITTER_ACCESS_TOKEN"),
  access_secret = Sys.getenv("TWITTER_ACCESS_SECRET")
)
1. In your own words describe LDA
2. In your own words, describe the process of a full tidy text analysis
3. Do a short tidy text analysis in which you extract topics, and explain why they are good or bad.
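As a rough illustration of the topic-extraction step these questions refer to, the following sketch fits a two-topic LDA model, assuming the dplyr, tidytext, and topicmodels packages are installed. The four short documents, the choice of k = 2, and the seed are all assumptions made for the example; a corpus this small will not yield meaningful topics.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical toy corpus, invented for illustration only.
docs <- tibble(
  doc  = c("d1", "d2", "d3", "d4"),
  text = c(
    "cats purr cats nap",
    "dogs bark dogs fetch",
    "stocks rise markets rally",
    "markets fall stocks drop"
  )
)

# Tidy step: one row per (document, word) pair with its count.
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word)

# LDA() expects a document-term matrix, so cast the tidy counts.
dtm <- word_counts %>% cast_dtm(doc, word, n)

# Fit the model; k and the seed are arbitrary choices for this sketch.
lda_model <- LDA(dtm, k = 2, control = list(seed = 1234))

# Back to tidy form: one row per (topic, term) with its probability beta.
topics <- tidy(lda_model, matrix = "beta")

# The highest-probability terms per topic are what one would inspect
# when judging whether the topics are good or bad.
top_terms <- topics %>%
  group_by(topic) %>%
  slice_max(beta, n = 3, with_ties = FALSE) %>%
  ungroup()
```

Judging topic quality then comes down to reading `top_terms` and asking whether each topic's leading words form a coherent theme.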
library(tidytext)
library(tidyverse)
# poem from http://www.hungarianreference.com/Poems/Szabo-Lorinc-Szeretlek.aspx
hungarian_poem <- c("Szeretlek, szeretlek, szeretlek,",
"egész nap kutatlak, kereslek,",
"egész nap sírok a testedért,",
"szomorú kedves a kedvesért,",
"egész nap csókolom testedet,",
"csókolom minden percedet.",
# From the blog post on the Weinstein Effect
# https://www.gokhanciflikli.com/post/weinstein-effect/
library(GuardianR)
library(stringr)
library(tidyverse)
library(tidytext)
library(lubridate)
library(rvest)
library(ggplot2)
Dplyr Questions
1. Which of the following returns a subset of the columns of a data frame?
a) select
b) retrieve
c) get
d) all of the mentioned
2. Point out the correct statement:
a) The data frame is a key data structure in statistics and in R
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />