Created
May 25, 2020 07:44
---
class: bg-main1
# Let's get more data
.huge[
We'll use the `genius` package to get song lyric data from [Genius](https://genius.com/).
- `genius_album()` allows you to download the lyrics for an entire album in a tidy format.
]
---
class: bg-main1
# Getting more data
.huge[
- Input: Two arguments, `artist` and `album`. If the call fails, check that the artist and album names match those on [Genius](https://genius.com/).
- Output: A tidy data frame, one row per line of lyrics, with columns including:
  - `track_title`: track name
  - `track_n`: track number
  - `lyric`: one line of lyrics
]
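As a rough sketch of the shape to expect, here is a hand-made stand-in for an album data frame. It does not call Genius: the track names and lyrics are made up, and the column names are assumed from the code used on the later slides.

```{r toy-album-shape}
library(tibble)

# Toy stand-in for album lyrics: one row per line of lyrics.
# Track names and text are made up; column names are an assumption.
toy_album <- tribble(
  ~track_title, ~track_n, ~lyric,
  "Song A",     1L,       "hello hello world",
  "Song A",     1L,       "the world says hello",
  "Song B",     2L,       "goodbye to the world"
)
toy_album
```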
---
class: bg-main1
# [Greatest Australian Album of all time (as voted by triple J)](https://www.abc.net.au/triplej/hottest100/alltime/11/countdown/cd_1.htm)
```{r show-powderfinger-website, echo = FALSE, out.width = "90%"}
include_graphics("images/powderfinger.png")
```
---
class: bg-main1
# Greatest Australian Album of all time (as voted by triple J)
```{r powderfinger-album, cache=TRUE}
od_num_five <- genius_album(
  artist = "Powderfinger",
  album = "Odyssey Number Five"
)
od_num_five
```
---
class: bg-main1
# Save for later
```{r save-powderfinger}
powderfinger <- od_num_five %>%
  mutate(
    artist = "Powderfinger",
    album = "Odyssey Number Five"
  )
powderfinger
```
---
class: bg-main1
# What songs are in the album?
```{r distinct-songs}
powderfinger %>% distinct(track_title)
```
---
class: bg-main1
# How long are the lyrics in Powderfinger's songs?
```{r powderfinger-n-lines}
powderfinger %>%
  count(track_title) %>%
  arrange(-n)
```
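Because each row holds one line of lyrics, counting rows per track gives the number of lines in each song. A minimal sketch on made-up data (the toy tibble and its contents are assumptions for illustration; `sort = TRUE` is a shorthand for the `arrange(-n)` step):

```{r toy-lines-per-track}
library(dplyr)
library(tibble)

# Toy stand-in for album lyrics: one row per line; the text is made up.
toy_album <- tribble(
  ~track_title, ~track_n, ~lyric,
  "Song A",     1L,       "hello hello world",
  "Song A",     1L,       "the world says hello",
  "Song B",     2L,       "goodbye to the world"
)

# Rows per track = lines per track, sorted from longest to shortest.
lines_per_track <- toy_album %>%
  count(track_title, sort = TRUE)
lines_per_track
```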
---
class: bg-main1
# Tidy up the lyrics!
```{r unnest-tokens-powderfinger}
powderfinger_lyrics <- powderfinger %>%
  unnest_tokens(output = word,
                input = lyric)
powderfinger_lyrics
```
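To see what `unnest_tokens()` does to each row, it can help to run it on a tiny made-up example. In this sketch (toy data, not real lyrics) each line is split into one word per row, lowercased, with punctuation stripped, and the input column is dropped:

```{r toy-unnest-tokens}
library(dplyr)
library(tibble)
library(tidytext)

# Two made-up lines with mixed case and punctuation.
toy_album <- tribble(
  ~track_title, ~lyric,
  "Song A",     "Hello, hello world!",
  "Song B",     "Goodbye to the world"
)

# One row per word; other columns (track_title) are carried along.
toy_words <- toy_album %>%
  unnest_tokens(output = word, input = lyric)
toy_words
```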
---
class: bg-main1
# What are the most common words?
```{r common-words}
powderfinger_lyrics %>%
  count(word) %>%
  arrange(-n)
```
---
# Stop words
.huge[
- In computing, stop words are words that are filtered out before or after processing natural language data (text).
- They are usually the most common words in a language, but there is no single list of stop words used by all natural language processing tools.
]
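The idea of filtering can be sketched with a hand-picked stop list. Both the word vector and the stop list below are hypothetical and far shorter than any real lexicon:

```{r toy-stop-word-filter}
library(dplyr)
library(tibble)

# Made-up tokens, one word per row.
toy_words <- tibble(
  word = c("the", "world", "says", "hello", "to", "the", "world")
)

# Hypothetical, hand-picked stop list; real lexicons are much longer.
toy_stop_words <- c("the", "to", "a", "and")

# Keep only the words that are not in the stop list.
kept_words <- toy_words %>%
  filter(!word %in% toy_stop_words)
kept_words
```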
---
class: bg-main1
# English stop words
```{r eng-stopwords}
get_stopwords()
```
---
class: bg-main1
# Spanish stop words
```{r spanish-stopwords}
get_stopwords(language = "es")
```
---
class: bg-main1
# Various lexicons
.huge[
See `?get_stopwords` for more info.
]
```{r other-lexicons}
get_stopwords(source = "smart")
```
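One common way to apply a stop word lexicon to tokenized text is an anti-join on the `word` column. The sketch below uses made-up tokens and the default lexicon; passing a different `source` to `get_stopwords()` would change which words are dropped.

```{r toy-anti-join-stopwords}
library(dplyr)
library(tibble)
library(tidytext)

# Made-up tokens, one word per row.
toy_words <- tibble(
  word = c("the", "world", "says", "hello", "to", "the", "world")
)

# Drop every row whose word appears in the stop word lexicon,
# then count what is left.
toy_counts <- toy_words %>%
  anti_join(get_stopwords(), by = "word") %>%
  count(word, sort = TRUE)
toy_counts
```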
---
class: bg-main1
# What are the most common words?
```{r repeat}
powderfinger_lyrics
```