@shaunagm
Last active August 29, 2015 14:11
{
"metadata": {
"name": "Lennon or McCartney_"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Let's practice data analysis in Python!"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Someone made [a compilation](https://www.youtube.com/watch?v=kSwUzM_nTGM) of musicians and other celebrities answering the question, \"Lennon or McCartney?\" \n\nI wasn't sure how I'd answer it myself. It's not really fair to compare their post-Beatle careers, McCartney having had 30+ extra years in which to write and produce. When you look at their character, John is certainly the more divisive figure. On the one hand, he was admittedly abusive to his first wife and son, and was violent towards others as well. On the other hand, towards the end of his life he became a force for peace, and his [Rolling Stone cover with Yoko Ono](http://www.famouspictures.org/wp-content/uploads/2013/05/Lennon-Ono-Rolling-Stones-Cover.jpg) remains one of the best artistic and cultural statements I've seen. Undoubtedly McCartney is kinder than Lennon. But the question is not \"Who is the better person?\"\n\nThat leaves the (Beatles) music! Just thinking of random songs, I'm not sure who I like more or even who wrote the songs I like best. So I grabbed a list of all their songs from Wikipedia, covered the credits column, and rated them:\n\n* -1 actively dislike \n* 0 don't know this song \n* 2 meh \n* 3 like \n* 4 love \n* 5 favorite\n\nLet's get to analyzing! We start by importing the file and making sure it looks like we expect:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "import pandas\nimport numpy as np\n\ndf = pandas.read_csv(\"./beatles.csv\")\nprint(df[:5])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " Title Album Ratings Writer\n0 \"12-Bar Original\" Anthology 2 0 Other\n1 \"Across the Universe\" Let It Be 3 Lennon\n2 \"Act Naturally\" Help! 2 NonBeatle\n3 \"Ain't She Sweet\" Anthology 1 2 NonBeatle\n4 \"All I've Got to Do\" With the Beatles 0 Lennon\n"
}
],
"prompt_number": 69
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Let's start by comparing song counts. Who wrote the most songs?"
},
{
"cell_type": "code",
"collapsed": false,
"input": "df.groupby(['Writer'])['Ratings'].count()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 70,
"text": "Writer\n NonBeatle 76\nHarrison 27\nLennon 57\nLennon and McCartney 16\nLennon with McCartney 27\nMcCartney 64\nMcCartney with Lennon 27\nOther 15\nName: Ratings, dtype: int64"
}
],
"prompt_number": 70
},
{
"cell_type": "markdown",
"metadata": {},
"source": "We can see a few things here. One, Lennon and McCartney wrote far more songs on their own (57 + 64 = 121) than they did together (16 + 27 + 27 = 70): roughly 1.7 times as many. McCartney is slightly more prolific than Lennon, but not by much. The single most common writing category, though, is songs by non-Beatles (\"NonBeatle\", 76 songs).\n\nNext, I'm curious to see who wrote the most songs that I didn't even recognize."
},
{
"cell_type": "code",
"collapsed": false,
"input": "mysterySongs = df[df['Ratings'] == 0]\nmysterySongs.groupby(['Writer'])['Ratings'].count() ",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 71,
"text": "Writer\n NonBeatle 64\nHarrison 11\nLennon 21\nLennon and McCartney 7\nLennon with McCartney 7\nMcCartney 24\nMcCartney with Lennon 3\nOther 9\nName: Ratings, dtype: int64"
}
],
"prompt_number": 71
},
{
"cell_type": "markdown",
"metadata": {},
"source": "A clearer way to look at this might be to ask who had the highest percentage of songs I didn't recognize."
},
{
"cell_type": "code",
"collapsed": false,
"input": "mysterySongs.groupby(['Writer'])['Ratings'].count() / df.groupby(['Writer'])['Ratings'].count()",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 72,
"text": "Writer\n NonBeatle 0.842105\nHarrison 0.407407\nLennon 0.368421\nLennon and McCartney 0.437500\nLennon with McCartney 0.259259\nMcCartney 0.375000\nMcCartney with Lennon 0.111111\nOther 0.600000\nName: Ratings, dtype: float64"
}
],
"prompt_number": 72
},
{
"cell_type": "markdown",
"metadata": {},
"source": "I'm not surprised that I'm much less likely to know Beatles songs written by non-Beatles (84% unknown). I am surprised to realize I don't know 40% of George Harrison's songs. Sorry, George! The percentage of unknown songs credited to \"Lennon and McCartney\" (44%) is surprisingly high as well, although that may be skewed by the small number of songs credited to them both equally (only 16).\n\nOkay, let's move on to the real question: whose songs do I like more?"
},
{
"cell_type": "code",
"collapsed": false,
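"input": "# Exclude songs I didn't recognize (rated 0); .copy() lets us add columns later without a SettingWithCopyWarning\nsongs = df[df['Ratings'] != 0].copy()\n\nsongs.groupby(['Writer'])['Ratings'].agg(['mean', 'std'])",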
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>mean</th>\n <th>std</th>\n </tr>\n <tr>\n <th>Writer</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th> NonBeatle</th>\n <td> 2.833333</td>\n <td> 0.717741</td>\n </tr>\n <tr>\n <th>Harrison</th>\n <td> 3.250000</td>\n <td> 0.856349</td>\n </tr>\n <tr>\n <th>Lennon</th>\n <td> 3.055556</td>\n <td> 0.629941</td>\n </tr>\n <tr>\n <th>Lennon and McCartney</th>\n <td> 3.222222</td>\n <td> 0.666667</td>\n </tr>\n <tr>\n <th>Lennon with McCartney</th>\n <td> 3.150000</td>\n <td> 1.136708</td>\n </tr>\n <tr>\n <th>McCartney</th>\n <td> 3.625000</td>\n <td> 0.774183</td>\n </tr>\n <tr>\n <th>McCartney with Lennon</th>\n <td> 3.000000</td>\n <td> 0.589768</td>\n </tr>\n <tr>\n <th>Other</th>\n <td> 2.666667</td>\n <td> 0.516398</td>\n </tr>\n </tbody>\n</table>\n</div>",
"output_type": "pyout",
"prompt_number": 73,
"text": " mean std\nWriter \n NonBeatle 2.833333 0.717741\nHarrison 3.250000 0.856349\nLennon 3.055556 0.629941\nLennon and McCartney 3.222222 0.666667\nLennon with McCartney 3.150000 1.136708\nMcCartney 3.625000 0.774183\nMcCartney with Lennon 3.000000 0.589768\nOther 2.666667 0.516398"
}
],
"prompt_number": 73
},
{
"cell_type": "markdown",
"metadata": {},
"source": "If we look only at the means, my favorite writer is McCartney alone, and my least favorite writer (other than non-Beatles and weird combos) is McCartney with Lennon. That doesn't make much intuitive sense, but the standard deviations (roughly 0.5 to 1.1 points) are large compared to the gaps between most of the means, so I wouldn't read too much into the smaller differences. Another way to view this is to compare all songs Lennon has a credit on with all songs McCartney has a credit on (co-written songs count for both)."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Flag songs that carry a Lennon and/or McCartney credit (co-writes get both flags)\nsongs['IsLennon'] = np.where(songs['Writer'].str.contains(\"Lennon\"), 1, 0)\nsongs['IsMcCartney'] = np.where(songs['Writer'].str.contains(\"McCartney\"), 1, 0)\n\nprint(songs[:5])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " Title Album Ratings Writer IsLennon \\\n1 \"Across the Universe\" Let It Be 3 Lennon 1 \n2 \"Act Naturally\" Help! 2 NonBeatle 0 \n3 \"Ain't She Sweet\" Anthology 1 2 NonBeatle 0 \n5 \"All My Loving\" With the Beatles 3 McCartney 0 \n6 \"All Things Must Pass\" Anthology 3 4 Harrison 0 \n\n IsMcCartney \n1 0 \n2 0 \n3 0 \n5 1 \n6 0 \n"
}
],
"prompt_number": 74
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now let's compare!"
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Ratings for every song with a Lennon (or McCartney) credit\nlennon = songs[songs['IsLennon'] == 1]['Ratings']\nmccartney = songs[songs['IsMcCartney'] == 1]['Ratings']\n\nprint(\"Lennon:\")\nprint(lennon.mean(), lennon.std())\n\nprint(\"McCartney:\")\nprint(mccartney.mean(), mccartney.std())",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Lennon:\n3.07865168539 0.757158550424\nMcCartney:\n3.32258064516 0.849056897804\n"
}
],
"prompt_number": 75
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Yup, still like McCartney more. I have a few more questions. I'm curious who wrote my favorite and least favorite songs. Let's take a look:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "bestSongs = df[df['Ratings'] == 5]\nprint(bestSongs)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " Title Album Ratings Writer\n63 \"Eleanor Rigby\" Revolver 5 McCartney\n72 \"For No One\" Revolver 5 McCartney\n97 \"Here Comes the Sun\" Abbey Road 5 Harrison\n98 \"Here, There and Everywhere\" Revolver 5 McCartney\n228 \"I Will\" The Beatles 5 McCartney\n"
}
],
"prompt_number": 76
},
{
"cell_type": "markdown",
"metadata": {},
"source": "I've always liked the melodic, melancholic stuff the best: four of these five are McCartney songs, and three are from Revolver. (Also, I've always liked Revolver! Maybe I'll tack on a by-album analysis at the end.) Now, how about my least favorite songs? Turns out there's a bunch I labelled \"meh\", so here are counts of songs rated below 3 (\"meh\" or worse) by writer:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "worstSongs = songs[songs['Ratings'] < 3].groupby(['Writer'])['Ratings'].count()\nprint(worstSongs)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Writer\n NonBeatle 4\nHarrison 3\nLennon 6\nLennon and McCartney 1\nLennon with McCartney 2\nMcCartney 3\nMcCartney with Lennon 4\nOther 2\nName: Ratings, dtype: int64\n"
}
],
"prompt_number": 77
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Guess I do not < 3 Lennon by himself. (Get it? :P)\n\nI distinctly remember creating the rating -1 for the only Beatles song I actually hate. Who wrote that one?"
},
{
"cell_type": "code",
"collapsed": false,
"input": "worstSongs = df[df['Ratings'] == -1]\nprint(worstSongs)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " Title Album Ratings Writer\n217 \"Run for Your Life\" Rubber Soul -1 Lennon with McCartney\n"
}
],
"prompt_number": 78
},
{
"cell_type": "markdown",
"metadata": {},
"source": "From Wikipedia: \"The song's lyrics establish a threatening tone towards the singer's unnamed girlfriend (referred to throughout the song as \"little girl\"), claiming \"I'd rather see you dead, little girl, than to be with another man.\" ... Lennon designated this song as his \"least favourite Beatles song\" in a 1973 interview and later said it was the song he most regretted writing.\"\n\nIn conclusion, I appear to like McCartney slightly more than Lennon. \n\nOne more analysis! What are my favorite albums (in order)? I'm leaving the unknown songs in here because I think they're a sign of me not liking the album as much (each unknown song counts as a 0 and pulls the album's average down).\n\nLet's also select only the studio albums."
},
{
"cell_type": "code",
"collapsed": false,
"input": "studioAlbums = ['Please Please Me', 'With the Beatles', 'A Hard Day\\'s Night', 'Beatles for Sale', 'Help!', 'Rubber Soul', 'Revolver', 'Sgt. Pepper\\'s Lonely Hearts Club Band', 'Magical Mystery Tour', 'The Beatles', 'Yellow Submarine', 'Let It Be', 'Abbey Road']\n# Unknown songs (rated 0) are deliberately kept in\nalbums = df[df['Album'].isin(studioAlbums)]\nalbums.groupby(['Album'])['Ratings'].agg(['mean', 'std'])",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>mean</th>\n <th>std</th>\n </tr>\n <tr>\n <th>Album</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>A Hard Day's Night</th>\n <td> 2.615385</td>\n <td> 1.556624</td>\n </tr>\n <tr>\n <th>Abbey Road</th>\n <td> 3.352941</td>\n <td> 0.785905</td>\n </tr>\n <tr>\n <th>Beatles for Sale</th>\n <td> 1.428571</td>\n <td> 1.741542</td>\n </tr>\n <tr>\n <th>Help!</th>\n <td> 3.214286</td>\n <td> 0.892582</td>\n </tr>\n <tr>\n <th>Let It Be</th>\n <td> 2.083333</td>\n <td> 1.880925</td>\n </tr>\n <tr>\n <th>Magical Mystery Tour</th>\n <td> 2.818182</td>\n <td> 1.078720</td>\n </tr>\n <tr>\n <th>Please Please Me</th>\n <td> 1.285714</td>\n <td> 1.589803</td>\n </tr>\n <tr>\n <th>Revolver</th>\n <td> 3.428571</td>\n <td> 1.342460</td>\n </tr>\n <tr>\n <th>Rubber Soul</th>\n <td> 2.785714</td>\n <td> 1.251373</td>\n </tr>\n <tr>\n <th>Sgt. Pepper's Lonely Hearts Club Band</th>\n <td> 2.615385</td>\n <td> 1.260850</td>\n </tr>\n <tr>\n <th>The Beatles</th>\n <td> 2.066667</td>\n <td> 1.638614</td>\n </tr>\n <tr>\n <th>With the Beatles</th>\n <td> 1.500000</td>\n <td> 1.605280</td>\n </tr>\n <tr>\n <th>Yellow Submarine</th>\n <td> 2.000000</td>\n <td> 1.414214</td>\n </tr>\n </tbody>\n</table>\n</div>",
"output_type": "pyout",
"prompt_number": 79,
"text": " mean std\nAlbum \nA Hard Day's Night 2.615385 1.556624\nAbbey Road 3.352941 0.785905\nBeatles for Sale 1.428571 1.741542\nHelp! 3.214286 0.892582\nLet It Be 2.083333 1.880925\nMagical Mystery Tour 2.818182 1.078720\nPlease Please Me 1.285714 1.589803\nRevolver 3.428571 1.342460\nRubber Soul 2.785714 1.251373\nSgt. Pepper's Lonely Hearts Club Band 2.615385 1.260850\nThe Beatles 2.066667 1.638614\nWith the Beatles 1.500000 1.605280\nYellow Submarine 2.000000 1.414214"
}
],
"prompt_number": 79
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Looks like my favorite is in fact Revolver (3.43), followed closely by Abbey Road (3.35). I'm not surprised to see the early albums (Please Please Me, With the Beatles, Beatles for Sale) farther down."
}
],
"metadata": {}
}
]
}
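The notebook uses standard deviations to judge whether the gaps between writers matter. When comparing two averages, though, the more relevant spread is the standard error of each mean: the standard deviation divided by the square root of the number of songs.

Below is a minimal sketch of a Welch's t-test on the solo Lennon and solo McCartney rows. It uses only the summary statistics printed by the groupby output above. Solo rows are used because co-written songs appear in both the "all Lennon credits" and "all McCartney credits" groups, so those two samples overlap. Some details are assumptions, not part of the original analysis:

* The song counts are assumed to be total songs minus unknown songs (57 - 21 and 64 - 24).
* The helper name `welch_t` is hypothetical.

Treat this as a rough check. The ratings are a small, discrete, self-assigned scale, so the assumptions behind a t-test only loosely apply.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Hypothetical helper: Welch's t statistic for (mean2 - mean1) and its approximate degrees of freedom."""
    se1_sq = sd1 ** 2 / n1  # squared standard error of the first mean
    se2_sq = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean2 - mean1) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, dof


# Summary statistics copied from the groupby output for songs I recognized.
# Assumption: n = all songs by that writer minus the ones rated 0 (unknown).
lennon = {"mean": 3.055556, "sd": 0.629941, "n": 57 - 21}
mccartney = {"mean": 3.625000, "sd": 0.774183, "n": 64 - 24}

t, dof = welch_t(lennon["mean"], lennon["sd"], lennon["n"],
                 mccartney["mean"], mccartney["sd"], mccartney["n"])
print("t = {:.2f}, df = {:.1f}".format(t, dof))
```

As a rule of thumb, a |t| well above 2 at this many degrees of freedom suggests the gap is bigger than sampling noise alone would usually produce. If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` computes the same statistic along with a p-value.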