cavedave/genreloud.rmd

## genreloud.rmd
---
title: "Loudness by Genre"
output: html_notebook
---

An analysis of music by genre to see whether loudness varies.

It is believed that online streaming platforms have reduced loudness. But does this have the same effect across all genres of music?


The dataset covers 26 genres, for a total of 232,725 tracks.


If we are right, loudness in music production is bad for hearing, but music streaming services like Spotify have reduced it.

Hip-hop, Pop, Rock and Electronic are now the most popular genres in the top 40:

https://www.economist.com/graphic-detail/2018/02/02/popular-music-is-more-collaborative-than-ever

The data is available on Kaggle:

https://www.kaggle.com/zaheenhamidani/ultimate-spotify-tracks-db/download

Spotify explains its API at
https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/

This is digital loudness, which is different from acoustic loudness. This comment explains the distinction well: https://www.reddit.com/r/dataisbeautiful/comments/cl5m3a/music_has_gotten_louder_oc/evube0a/?context=3

Digital loudness measures something like the difference between the average loudness and the peak loudness of the song. Songs with bigger differences jump out more on the radio, and this led to the loudness wars.


First, load some data. This is a version of the full dataset with most of the columns removed. How it is constructed is shown further below.

```{r}
library(stringr)
library(tidyverse)
library(ggplot2)
library(cowplot)
library(readr)

df2 <- read.csv("smalldata.csv", encoding = "UTF-8")

head(df2)
```


The raincloud graphs are built on `geom_flat_violin`, which is sourced from this gist:


```{r}
source("https://gist.githubusercontent.com/benmarwick/2a1bb0133ff568cbe28d/raw/fb53bd97121f7f9ce947837ef1a4c65a73bffb3f/geom_flat_violin.R")
```
Raincloud plots are introduced at https://micahallen.org/2018/03/15/introducing-raincloud-plots/
and described in the paper 'Allen M, Poggiali D, Whitaker K et al. Raincloud plots: a multi-platform tool for robust data visualization' https://wellcomeopenresearch.org/articles/4-63/v1


```{r}

# Raincloud plot: half violin (density), jittered points and a narrow boxplot
p3 <- ggplot(df2, aes(x = genre, y = loudness, fill = genre)) +
  geom_flat_violin(position = position_nudge(x = .2, y = 0), adjust = 2) +
  geom_point(position = position_jitter(width = .15), size = .25) +
  # "guides" is not a geom_boxplot() parameter, so it has been dropped here
  geom_boxplot(width = .1, outlier.shape = NA, alpha = 0.5) +
  #  geom_boxplot(aes(x = as.numeric(genre), y = loudness),outlier.shape = NA, alpha = 0.3, width = .1, colour = "BLACK") +#+0.25
  #  geom_errorbar(data = summary_loud, aes(x = genre, y = Mean, ymin = Mean-ci, ymax = Mean+ci), position = position_nudge(.25), colour = "BLACK", width = 0.1, size = 0.8)+
  ylab('Loudness (dB)') + xlab('Genre') +
  coord_flip() +
  theme_cowplot() +
  guides(fill = "none") +  # hide the legend; fill = FALSE is deprecated
  ggtitle('Loudness by Genre') +
  theme(plot.title = element_text(hjust = 0.5))

# Save the plot explicitly rather than relying on the last plot drawn
ggsave("GenreR.png", plot = p3, height = 20, width = 20)

p3

```

```{r}
# Stack the per-genre summaries into one row per genre
summary_loud <- as.data.frame(do.call(rbind, tapply(df2$loudness, df2$genre, summary)))
```


```{r}

summary_loud<-as.data.frame(summary(df2))

```


```{r}
summary_loud
```
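
The per-genre statistics can also be collected into a tidy data frame with dplyr. This is a minimal sketch rather than the method used in this notebook; the data frame `toy`, its genre labels and its loudness values are made up for illustration.

```r
library(dplyr)

# Toy data standing in for df2; genre names and values are invented
toy <- data.frame(
  genre    = c("Pop", "Pop", "Rock", "Rock", "Classical", "Classical"),
  loudness = c(-6.0, -8.0, -5.0, -7.0, -20.0, -24.0)
)

# One row per genre with the track count, median and mean loudness
genre_stats <- toy %>%
  group_by(genre) %>%
  summarise(
    tracks          = n(),
    median_loudness = median(loudness),
    mean_loudness   = mean(loudness),
    .groups = "drop"
  )

genre_stats
```

A data frame in this shape is easier to join back onto the plot data than the list that `tapply()` returns.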

```{r}
df <- read.csv("spotify.csv", encoding = "UTF-8")

head(df)
```


```{r}
names(df)[1] <- "genre"
head(df)
```


```{r}

#library(lavaan)

# Drop the columns that are not needed for the loudness analysis
df <- dplyr::select(df, -c(
  'artist_name', 'key', 'mode', 'time_signature', 'track_name', 'track_id',
  'popularity', 'danceability', 'acousticness', 'duration_ms', 'energy',
  'instrumentalness', 'liveness', 'speechiness', 'tempo', 'valence'
))

```


```{r}
summary(df)
```


This gets the summary statistics for each genre: median, mean loudness and so on.

```{r}
#summary<-
tapply(df$loudness, df$genre, summary)
```

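The output lists Children's Music twice, because the genre appears with both a straight and a curly apostrophe:

    $`Children's Music`
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    -36.721 -14.094 -11.286 -11.642  -8.614   0.948

    $`Children’s Music`

In pandas the fix would be `data['genre'] = data['genre'].str.replace('’','\'')`. In R, replace the curly apostrophe in the genre column only, with `df$genre <- str_replace_all(df$genre, "’", "'")`, and then rerun `summary(df)`. Assigning the result to `df` itself would overwrite the data frame with a character vector.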
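
The apostrophe clean-up can be tried out on a small example first. This is a sketch under stated assumptions: the data frame `toy` and its loudness values are invented, and only the two spellings of the genre come from the output above.

```r
library(stringr)

# Toy data frame; the two spellings mirror the duplicated genre
toy <- data.frame(
  genre    = c("Children's Music", "Children\u2019s Music", "Pop"),
  loudness = c(-11.3, -12.1, -6.5),
  stringsAsFactors = FALSE
)

# Replace the curly apostrophe (U+2019) in the genre column only,
# assigning back to the column rather than to the whole data frame
toy$genre <- str_replace_all(toy$genre, "\u2019", "'")

unique(toy$genre)
```

After the replacement the two spellings collapse into one genre, so a per-genre summary would report it once.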


```{r}
head(df)
```


Filter out uncommon genres, or ones I think are repetitive.

```{r}

df2 <-filter(df, grepl('Dance|Rock|Pop|Classical', genre))
head(df2)

```
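
Note that `grepl()` matches the pattern anywhere in the genre name, so it keeps any genre that merely contains one of the words. The sketch below contrasts this with exact matching; the genre labels are hypothetical and are not taken from the dataset.

```r
# Hypothetical genre labels, for illustration only
genres <- c("Pop", "Indie Pop", "Rock", "Hard Rock", "Classical", "Jazz")

# grepl() matches the pattern anywhere in the string
substring_hits <- genres[grepl("Dance|Rock|Pop|Classical", genres)]

# %in% keeps only exact matches
exact_hits <- genres[genres %in% c("Dance", "Rock", "Pop", "Classical")]

substring_hits
exact_hits
```

Which behaviour is wanted depends on whether sub-genres should be grouped with their parent genre.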

```{r}
# Write CSV in R
write.csv(df2, file = "smalldata.csv", row.names=FALSE)
```

```{r}
# Tracks above 1 dB are the loudest in the dataset, not the quietest
loud <- filter(df, loudness > 1)
loud
```
The quietest songs are Brian Eno, "Neroli" and Shakuhachi Sakano, "Call to Wake".

The loudest are Justice, "We Are Your Friends - Justice Vs Simian" and The Stooges, "Shake Appeal - Iggy Pop Mix".
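
One way to pull out the extremes directly is with dplyr's `slice_min()` and `slice_max()`. This is a minimal sketch on invented data: the track names and loudness values are placeholders, and it assumes the track name column is still present, which is not the case for `df` after the column drop earlier in this notebook.

```r
library(dplyr)

# Toy data; track names and loudness values are placeholders
toy <- data.frame(
  track_name = c("a", "b", "c", "d", "e"),
  loudness   = c(-40, -12, -6, 0.5, 1.2)
)

# The two quietest tracks (lowest loudness first)
quietest <- slice_min(toy, loudness, n = 2)

# The two loudest tracks (highest loudness first)
loudest <- slice_max(toy, loudness, n = 2)

quietest
loudest
```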