@njtierney
Created May 25, 2020 07:44
---
class: bg-main1
# Let's get more data
.huge[
We'll use the `genius` package to get song lyric data from [Genius](https://genius.com/).
- `genius_album()` allows you to download the lyrics for an entire album in a
tidy format.
]
---
class: bg-main1
# Getting more data
.huge[
- Input: two arguments, `artist` and `album`. If you run into errors, check that the artist and album names match how they are written on [Genius](https://genius.com/).
- Output: a tidy data frame with one row per line of lyrics, including these columns:
  - `track_title`: track name
  - `track_n`: track number
  - `lyric`: the line of lyrics
]
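To make the shape of that output concrete without calling Genius, here is a minimal sketch that builds a hypothetical stand-in by hand. The placeholder text is made up, and the columns are only the ones used later in these slides.

```{r toy-album-shape}
library(tibble)

# Hypothetical stand-in for a genius_album() result (made-up placeholder text):
# one row per line of lyrics, tagged with its track number and title.
toy_album <- tribble(
  ~track_n, ~track_title, ~lyric,
  1,        "Song A",     "placeholder line one",
  1,        "Song A",     "placeholder line two",
  2,        "Song B",     "another placeholder line"
)
toy_album
```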
---
class: bg-main1
# [Greatest Australian Album of all time (as voted by triple J)](https://www.abc.net.au/triplej/hottest100/alltime/11/countdown/cd_1.htm)
```{r show-powderfinger-website, echo = FALSE, out.width = "90%"}
include_graphics("images/powderfinger.png")
```
---
class: bg-main1
# Greatest Australian Album of all time (as voted by triple J)
```{r powderfinger-album, cache=TRUE}
od_num_five <- genius_album(
  artist = "Powderfinger",
  album = "Odyssey Number Five"
)
od_num_five
```
---
class: bg-main1
# Save for later
```{r save-powderfinger}
powderfinger <- od_num_five %>%
  mutate(
    artist = "Powderfinger",
    album = "Odyssey Number Five"
  )
powderfinger
```
---
class: bg-main1
# What songs are in the album?
```{r distinct-songs}
powderfinger %>% distinct(track_title)
```
---
class: bg-main1
# How many lines of lyrics are in each Powderfinger song?
```{r powderfinger-n-lines}
powderfinger %>%
  count(track_title) %>%
  arrange(-n)
```
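`count()` can also do the sorting itself. Here is a small sketch on hypothetical toy data (not the Powderfinger lyrics) showing that `count(..., sort = TRUE)` orders the rows the same way as `count()` followed by `arrange(-n)`.

```{r count-sort-sketch}
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for the lyrics: one row per line.
toy_lines <- tibble(
  track_title = c("Song A", "Song A", "Song A", "Song B"),
  lyric = c("line 1", "line 2", "line 3", "line 1")
)

# count() then arrange(-n), as on the slide ...
by_arrange <- toy_lines %>%
  count(track_title) %>%
  arrange(-n)

# ... matches count()'s own `sort` argument (largest n first).
by_sort <- toy_lines %>%
  count(track_title, sort = TRUE)

by_sort
```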
---
class: bg-main1
# Tidy up the lyrics!
```{r unnest-tokens-powderfinger}
powderfinger_lyrics <- powderfinger %>%
  unnest_tokens(output = word,
                input = lyric)
powderfinger_lyrics
```
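To see what `unnest_tokens()` does to each line, here is a sketch on a hypothetical two-line tibble. With the default `token = "words"`, it splits text into one word per row, keeps the other columns, lower-cases the words, and drops punctuation.

```{r unnest-tokens-sketch}
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical two-line "song" (placeholder text).
toy <- tibble(
  track_title = "Song A",
  lyric = c("Hello, World!", "a simple test")
)

# One row per word; `track_title` is carried along, `lyric` is dropped.
toy_words <- toy %>%
  unnest_tokens(output = word, input = lyric)

toy_words
```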
---
class: bg-main1
# What are the most common words?
```{r common-words}
powderfinger_lyrics %>%
  count(word) %>%
  arrange(-n)
```
---
# Stop words
.huge[
- In computing, stop words are words that are filtered out before or after natural language data (text) is processed.
- They are usually the most common words in a language, but there is no single list of stop words used by all natural language processing tools.
]
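Removing stop words is usually done with an anti-join: keep every word that does *not* appear in the stop-word list. A minimal sketch, assuming a tiny hand-made list and toy data (both hypothetical):

```{r anti-join-sketch}
library(dplyr)
library(tibble)

# Hypothetical tokenised text: one word per row.
words <- tibble(word = c("the", "sun", "and", "the", "rain"))

# A tiny hand-made stop-word list (illustrative only; real lists are longer).
my_stopwords <- tibble(word = c("the", "and"))

# anti_join() keeps the rows of `words` that have no match in `my_stopwords`.
content_words <- words %>%
  anti_join(my_stopwords, by = "word")

content_words
```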
---
class: bg-main1
# English stop words
```{r eng-stopwords}
get_stopwords()
```
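`get_stopwords()` returns a tibble, so it can be joined directly against tokenised text. A quick sketch of its structure with the defaults (`language = "en"`, `source = "snowball"`):

```{r stopwords-structure}
library(tidytext)

# Defaults: English words from the Snowball lexicon.
en_stop <- get_stopwords()

# One stop word per row, plus a column naming the lexicon it came from.
names(en_stop)
head(en_stop$word)
"the" %in% en_stop$word
```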
---
class: bg-main1
# Spanish stop words
```{r spanish-stopwords}
get_stopwords(language = "es")
```
---
class: bg-main1
# Various lexicons
.huge[
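`get_stopwords()` can use different lexicons via its `source` argument (the default is `"snowball"`). See `?get_stopwords` for more info.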
]
```{r other-lexicons}
get_stopwords(source = "smart")
```
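Different lexicons can give quite different lists. A sketch comparing the default Snowball list with the SMART list; exact sizes depend on the installed `stopwords` package, so none are hard-coded here.

```{r compare-lexicons}
library(tidytext)

snowball <- get_stopwords(source = "snowball")
smart <- get_stopwords(source = "smart")

# Lexicon sizes differ, so the choice changes what gets removed.
c(snowball = nrow(snowball), smart = nrow(smart))

# Some words that SMART treats as stop words but Snowball does not.
head(setdiff(smart$word, snowball$word))
```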
---
class: bg-main1
# What are the most common words?
```{r repeat}
powderfinger_lyrics
```
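A sketch of the usual next step, on hypothetical placeholder text rather than the real album: tokenise, drop the English Snowball stop words with `anti_join()`, then count what is left. The same pipeline should apply to `powderfinger_lyrics`, but that is not tested here.

```{r remove-stopwords-sketch}
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical placeholder text (not real song lyrics).
toy <- tibble(lyric = c("the rain falls on the town",
                        "and the rain keeps falling"))

top_words <- toy %>%
  unnest_tokens(output = word, input = lyric) %>%
  anti_join(get_stopwords(), by = "word") %>%  # drop Snowball English stop words
  count(word, sort = TRUE)                     # most frequent words first

top_words
```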