@nealcaren
Last active January 30, 2016 19:38
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Beauty and the Beast Speech Analysis\n",
"\n",
"In their recent paper, \"A quantitative analysis of gendered compliments in Disney Princess films,\" Carmen Fought and Karen Eisenhauer found that female characters in the classic Disney princess movies had more dialogue than female characters in more recent Disney films. I found the script to Beauty and the Beast online, so I quickly replicated part of their analysis for that movie in Python.\n",
"\n",
"As a bonus, I include an analysis of Toy Story at the end. That script was in a different format, and I used [pandas](http://pandas.pydata.org) more heavily. That film has 91% male speakers.\n",
"\n",
"\n",
"**Neal Caren** - University of North Carolina, Chapel Hill\n",
"[mail](mailto:neal.caren@unc.edu),\n",
"[web](http://nealcaren.web.unc.edu),\n",
"[twitter](http://twitter.com/HaphazardSoc),\n",
"[scholar](http://scholar.google.com/citations?user=cy0u16kAAAAJ&hl=en)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from __future__ import division\n",
"\n",
"import re\n",
"from collections import defaultdict\n",
"\n",
"import requests\n",
"import pandas as pd\n",
"import matplotlib\n",
"\n",
"%matplotlib inline\n",
"matplotlib.style.use('ggplot')\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Load the script which comes as a text file\n",
"\n",
"script_url = 'http://www.fpx.de/fp/Disney/Scripts/BeautyAndTheBeast.txt'\n",
"script = requests.get(script_url).text"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[u'<pre>',\n",
" u'Beauty and the Beast',\n",
" u'The Complete Script',\n",
" u'',\n",
" u'Compiled by Ben Scripps <34rqnpq@cmuvm.csv.cmich.edu>',\n",
" u'',\n",
" u'NARRATOR: Once upon a time, in a faraway land, a young prince lived in a',\n",
" u' shining castle. Although he had everything his heart desired,',\n",
" u' the prince was spoiled, selfish, and unkind. But then, one',\n",
" u\" winter's night, an old beggar woman came to the castle and\",\n",
" u' offered him a single rose in return for shelter from the bitter',\n",
" u' cold. Repulsed by her haggard appearance, the prince sneered at',\n",
" u' the gift and turned the old woman away, but she warned him not',\n",
" u' to be deceived by appearances, for beauty is found within. ',\n",
" u\" And when he dismissed her again, the old woman's ugliness\",\n",
" u' melted away to reveal a beautiful enchantress. The prince',\n",
" u' tried to apologize, but it was too late, for she had seen that',\n",
" u' there was no love in his heart, and as punishment, she',\n",
" u' transformed him into a hideous beast, and placed a',\n",
" u' powerful spell on the castle, and all who lived there.']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Let's look at the beginning of the script\n",
"\n",
"script.splitlines()[:20]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[u'MAURICE: How did you find me?',\n",
" u'BELLE: Oh, your hands are like ice. We have to get you out of here.',\n",
" u'MAURICE: Belle, I want you to leave this place.',\n",
" u\"BELLE: Who's done this to you?\",\n",
" u'MAURICE: No time to explain. You must go...now!',\n",
" u\"BELLE: I won't leave you!\",\n",
" u\"(Suddenly, BEAST grabs BELLE's shoulder and whips her around. She drops the\",\n",
" u'torch she was carrying into a puddle and the room is dark except for one beam',\n",
" u'of',\n",
" u'light from a skylight.)',\n",
" u'BEAST: What are you doing here?',\n",
" u'MAURICE: Run, Belle!',\n",
" u\"BELLE: Who's there? Who are you?\",\n",
" u'BEAST: The master of this castle.',\n",
" u\"BELLE: I've come for my father. Please let him out! Can't you see he's\",\n",
" u' sick?',\n",
" u\"BEAST: Then he shouldn't have trespassed here.\",\n",
" u\"BELLE: But he could die. Please, I'll do anything!\",\n",
" u\"BEAST: There's nothing you can do. He's my prisoner.\",\n",
" u'BELLE: Oh, there must be some way I can...wait! Take me, instead!']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Let's look at a random place\n",
"\n",
"script.splitlines()[500:520]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The script seems fairly easy to parse, since each new speaking line\n",
"# contains ': ' and starts with three characters that are already upper case\n",
"\n",
"def remove_spaces(line):\n",
"    # collapse the runs of spaces left over from the script's layout\n",
"    return re.sub(' +', ' ', line)\n",
"\n",
"def remove_paren(line):\n",
"    # remove parenthesized stage directions, which are not spoken\n",
"    return re.sub(r'\\([^)]*\\)', '', line)\n",
"\n",
"\n",
"lines = []\n",
"line = ''\n",
"for row in script.splitlines():\n",
"    if ': ' in row and row[:3].upper() == row[:3]:\n",
"        # a new speaker starts here, so store the line collected so far\n",
"        line = remove_spaces(line)\n",
"        line = remove_paren(line)\n",
"        lines.append(line)\n",
"        line = row\n",
"    elif ' ' in row:\n",
"        # continuation row: append it to the line being collected\n",
"        line = line + ' ' + row.lstrip()\n",
"# don't forget the last line, cleaned the same way as the others\n",
"lines.append(remove_paren(remove_spaces(line)))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['',\n",
" u\"NARRATOR: Once upon a time, in a faraway land, a young prince lived in a shining castle. Although he had everything his heart desired, the prince was spoiled, selfish, and unkind. But then, one winter's night, an old beggar woman came to the castle and offered him a single rose in return for shelter from the bitter cold. Repulsed by her haggard appearance, the prince sneered at the gift and turned the old woman away, but she warned him not to be deceived by appearances, for beauty is found within. And when he dismissed her again, the old woman's ugliness melted away to reveal a beautiful enchantress. The prince tried to apologize, but it was too late, for she had seen that there was no love in his heart, and as punishment, she transformed him into a hideous beast, and placed a powerful spell on the castle, and all who lived there. Ashamed of his monstrous form, the beast concealed himself inside his castle, with a magic mirror as his only window to the outside world. The rose she had offered was truly an enchanted rose, which would bloom until his twenty-first year. If he could learn to love another, and earn her love in return by the time the last petal fell, then the spell would be broken. If not, he would be doomed to remain a beast for all time. As the years passed, he fell into despair, and lost all hope, for who could ever learn to love a beast?\",\n",
" u\"BELLE: Little town, it's a quiet village Every day, like the one before Little town, full of little people Waking up to say...\",\n",
" u'TOWNSFOLK 1: Bonjour!',\n",
" u'TOWNSFOLK 2: Bonjour!',\n",
" u'TOWNSFOLK 3: Bonjour! ',\n",
" u'TOWNSFOLK 4: Bonjour!',\n",
" u'TOWNSFOLK 5: Bonjour!',\n",
" u\"BELLE: There goes the baker with his tray like always The same old bread and rolls to sell Ev'ry morning just the same Since the morning that we came To this poor provincial town...\",\n",
" u'BAKER: Good morning, Belle!',\n",
" u'BELLE: Morning monsieur!',\n",
" u'BAKER: Where are you off to?',\n",
" u'BELLE: The bookshop! I just finished the most wonderful story, about a beanstalk and an ogre and...',\n",
" u\"BAKER: That's nice...Marie, the baguettes! Hurry up!!\",\n",
" u\"TOWNSFOLK: Look there she goes, that girl is strange no question Dazed and distracted, can't you tell?\"]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lines[:15]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[u'LUMIERE: En garde, you overgrown pocket watch! ',\n",
" u'CHIP: Are they gonna live happily ever after, mama?',\n",
" u'MRS. POTTS: Of course, my dear. Of course.',\n",
" u'CHIP: Do I still have to sleep in the cupboard?',\n",
" u'CHORUS: Certain as the sun Rising in the east Tale as old as time, song as old as rhyme Beauty and the beast! Tale as old as time, song as old as rhyme Beauty and the beast!']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# How does the end look?\n",
"\n",
"lines[-5:]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"690\n",
"689\n"
]
}
],
"source": [
"# remove blank lines, if any\n",
"\n",
"print len(lines)\n",
"lines = [l for l in lines if len(l) > 0]\n",
"print len(lines)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# now figure out the roles and how many times they appear\n",
"\n",
"roles = defaultdict(int)\n",
"\n",
"for line in lines:\n",
"    # take advantage of the fact that the speaker is always listed before the :\n",
"    speaker = line.split(':')[0]\n",
"    roles[speaker] = roles[speaker] + 1"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"59"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(roles)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"defaultdict(int,\n",
" {u' to think about': 1,\n",
" u'ALL': 14,\n",
" u'BAKER': 3,\n",
" u'BARBER': 1,\n",
" u'BEAST': 79,\n",
" u'BELLE': 137,\n",
" u'BIMBETTE 1': 1,\n",
" u'BIMBETTE 2': 1,\n",
" u'BIMBETTE 3': 1,\n",
" u'BIMBETTES': 2,\n",
" u'BOOKSELLER': 6,\n",
" u'BOTH': 3,\n",
" u'BYSTANDERS': 1,\n",
" u'CHIP': 24,\n",
" u'CHORUS': 1,\n",
" u'COGSWORTH': 60,\n",
" u'CRONY 1': 2,\n",
" u'CRONY 2': 1,\n",
" u'CRONY 3': 1,\n",
" u\"D'ARQUE\": 5,\n",
" u'DRIVER': 2,\n",
" u'GASTON': 66,\n",
" u'GROUP 1': 1,\n",
" u'GROUP 2': 1,\n",
" u'LEFOU': 35,\n",
" u'LUMIERE': 67,\n",
" u'MAN 1': 3,\n",
" u'MAN 2': 2,\n",
" u'MAN 3': 2,\n",
" u'MAN 4': 3,\n",
" u'MAN 5': 2,\n",
" u'MAN 6': 1,\n",
" u'MAURICE': 66,\n",
" u'MEN': 2,\n",
" u'MERCHANT': 2,\n",
" u'MOB': 7,\n",
" u'MRS. POTTS': 43,\n",
" u'MUGS': 1,\n",
" u'NARRATOR': 1,\n",
" u'OBJECTS': 1,\n",
" u'OLD CRONIES': 4,\n",
" u'OLD MAN': 1,\n",
" u'PIERRE': 1,\n",
" u'PRINCE': 2,\n",
" u'STOVE': 1,\n",
" u'TOWNSFOLK': 2,\n",
" u'TOWNSFOLK 1': 1,\n",
" u'TOWNSFOLK 2': 1,\n",
" u'TOWNSFOLK 3': 1,\n",
" u'TOWNSFOLK 4': 1,\n",
" u'TOWNSFOLK 5': 1,\n",
" u'WARDROBE': 6,\n",
" u'WOMAN 1': 4,\n",
" u'WOMAN 2': 2,\n",
" u'WOMAN 3': 3,\n",
" u'WOMAN 4': 3,\n",
" u'WOMAN 5': 1,\n",
" u'WOMEN': 1,\n",
" u'WRESTLER': 1})"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# take a look at how many lines each role has\n",
"roles"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Looks like there is one bum line (' to think about'),\n",
"# but I'll ignore that for now.\n",
"\n",
"# Quickly eyeball which roles are female and which are possibly mixed groups.\n",
"\n",
"females = ['WOMAN 1',\n",
"           'WOMAN 2',\n",
"           'WOMAN 3',\n",
"           'WOMAN 4',\n",
"           'WOMAN 5',\n",
"           'OLD CRONIES',\n",
"           'MRS. POTTS',\n",
"           'BELLE',\n",
"           'BIMBETTE 1',\n",
"           'BIMBETTE 2',\n",
"           'BIMBETTE 3']\n",
"\n",
"groups = ['MOB',\n",
"          'ALL',\n",
"          'BOTH']"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Male': 491, 'Female': 198}\n",
"0.712626995646\n"
]
}
],
"source": [
"# Mark each line of dialogue by sex and count them\n",
"\n",
"sex_lines = {'Male': 0,\n",
"             'Female': 0}\n",
"\n",
"for line in lines:\n",
"    # Extract speaker\n",
"    speaker = line.split(':')[0]\n",
"\n",
"    if speaker in females:\n",
"        sex_lines['Female'] += 1\n",
"\n",
"    elif speaker not in groups:\n",
"        # mixed groups are skipped; every other speaker is counted as male\n",
"        sex_lines['Male'] += 1\n",
"\n",
"print sex_lines\n",
"print sex_lines['Male']/(sex_lines['Male'] + sex_lines['Female'])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x106af7490>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEACAYAAAC57G0KAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFUlJREFUeJzt3W9sU+ehx/GfYyOYwYtxibU4gSI1y6J6UdrErCSUJYyp\nlVDUONB6f7hVsxE2rRNaPU3rWMeyCTYkVjCkTOxFW22dVG1BWrx22ouK1VkrSLukgLY6YrrZ1aom\nLEtiEzehS0MS3xcMF5oEOxBjePh+3iS2j895DjJfDo+PfSzJZDIpAIBx8nI9AABAdhB4ADAUgQcA\nQxF4ADAUgQcAQxF4ADCULZOFvvnNb8put8tischqtWrv3r0aGxvTwYMHNTQ0JLfbrWAwKLvdLklq\nb29XJBKR1WpVU1OTKioqsroTQLZFo1F5vd5cDwOYl4wCb7FY1NLSomXLlqXuC4fDKi8vV0NDg8Lh\nsNrb27V161b19fWps7NToVBIsVhMu3fvVmtrqywWS9Z2Asg2Ao9bUUZTNMlkUh/9PFR3d7dqa2sl\nSXV1derq6krdX1NTI6vVKrfbrcLCQvX29i7wsAEA6WR8BL9nzx7l5eXp85//vDZu3KhEIiGn0ylJ\ncjqdSiQSkqR4PK7S0tLUc10ul+LxeBaGDgC4mowCv3v3bi1fvlzvvfee9uzZI4/HM2OZ+U7BRKNR\nRaPR1O1AIDCv5wM3Eq9P3Mza2tpSv3u93tR0YkaBX758uSTp4x//uNasWaPe3l45nU6NjIykfubn\n50u6eMQ+PDycem4sFpPL5ZqxzssHccnZs2fnuVvAjeFwODQ6OprrYQAzeDyeOQ9A0s7Bf/DBBxof\nH5ckjY+P669//atWrVqlqqoqdXR0SJI6Ojrk8/kkST6fTydOnNDk5KQGBwc1MDCgkpKSBdoVAECm\n0h7BJxIJ/exnP5PFYtHU1JTWr1+viooK3XXXXQqFQopEIiooKFAwGJQkFRcXq7q6WsFgUDabTc3N\nzZxBAwA5YLmZvi6YKZqF09+/WGfPWnM9DGPceWee3O6xXA8DmGG290QvyWgOHrees2et8vuduR6G\nMV5+eVRud65HAcwPgQewYJYtW8aUbJYkk0mNjc3vf5EEHsCCsVgsnG2UJQ6HY97P4cvGAMBQBB4A\nDEXgAcBQBB4ADEXgASADfX19Ki4u1vT0dK6HkjHOogGQVdn+0J3HM6Wiog/SLnffffdpaGhIb731\nVur7tSTpgQceUE9Pj958800VFRVddR232imgBB5AVmX7Q3fh8IjSdFnSxTivXLlSv//979XU1CRJ\nOnPmjMbHx2+5cGeKKRoAt40tW7bo6NGjqdtHjx7VI488krr9pz/9SQ8++KDKysr0mc98RgcOHJhz\nXaOjo/rOd76jyspK+Xw+7du3b8aFkXKNwAO4bVRWVmpsbEy9vb2anp7WSy+9pM2bN6fCvHTpUrW2\nturMmTN64YUX9Otf/1qvvPLKrOt64okntGjRIp04cUKvvPKKXnvtNb344os3cnfSIvAAbiuXjuJf\ne+01ffKTn9QnPvGJ1GNr167Vpz71KUlSWVmZHnroIXV2ds5Yx9DQkCKRiH70ox9pyZIlcrlc2r59\nu8Lh8A3bj0wwBw/gtrJlyxZt3rxZ7777rh5++OErHjt58qT27t2rv//977pw4YImJiZUX18/Yx39\n/f26cOGCKisrJX143ep0b9LeaAQewG2lqKhIK1euVCQS0f79+yV9eHbMjh079NWvflUvvviiFi1a\npJaWFp07d27GOjwejxYvXqy33377pn6DlikaALedAwcOqK2tTR/72MckKTUHf/78eeXn52vRokU6\nderUjCmXS8u53W7V1taqpaVFY2NjSiaTeuedd/TGG2/c2B1JgyN4AFnl8UwpHB7J6vozcfmR9qpV\nq2Z97Kc//al+/OMf6wc/+IHWrl2rhx56SIlE
YtZ1HDp0SD/5yU9UV1en999/X6tWrdLjjz9+Pbuy\n4Liik6G6uuxc8GMBvfzyqCor+RrcdLg4efbM9Wd7tSs6MUUDAIYi8ABgKAIPAIYi8ABgKAIPAIYi\n8ABgKAIPAIYi8ABgKAIPADfAgQMHtGPHjhu6Tb6qAEBW9b/fr7Pns/cpdc9Sj4rs6b/F8b777tPw\n8LBsNpuSyaQsFotef/11ud3urI3to270F5MReABZdfb8WfnD/qytP+wPZxR4i8WiF154QevWrcva\nWG42TNEAuG3M9tVbb731lhoaGnT33XfrgQceuOICHw8//LD27dunhoYGlZaW6itf+YrOnTunHTt2\nqKysTPX19erv708t/8Mf/lBr1qxRWVmZNm3apL/85S9zjuVq210oBB7AbWtgYECPPfaYgsGgenp6\ntGvXLm3fvl3xeDy1zEsvvaTDhw/r5MmT+uc//6mGhgZ98YtfVE9Pj+66664rrtt677336tixY+rp\n6ZHf79fXv/51TUxMzNjuv/71r7TbXQgEHsBtY9u2bfJ6vfJ6vWpubtbvfvc7bdy4UXV1dZKk9evX\nq6KiQq+++mrqOV/4whe0cuVKLVu2TBs2bNCdd96pdevWKS8vT/X19Xr77bdTyzY2Nio/P195eXn6\n2te+pomJCf3jH/+YMY729va0210IzMEDuG08//zzV8zBf//739cf/vAHHTt2TNLFKZzJyUndf//9\nqWUKCgpSvy9ZsmTG7fPnz6du/+IXv9BvfvMbDQ4OSpLGxsZmPSrv6+ubdbsL/f4AgQdw2/joHLzH\n49GWLVu0b9++6173m2++qSNHjujo0aMqLS2VJHm93lnn/Rdyu1fDFA2A29bmzZt17Ngx/fnPf9b0\n9LTGx8fV2dmpgYGBea/r/PnzstlsWr58uSYmJhQKhTQ2Npb17V4NR/AAssqz1KOwP5x+wetYfyZm\nOwfd4/Ho+eef1549e/T444/LZrPpnnvu0d69e+d8zlzq6upUV1en9evXa+nSpdq+ffucV1tKt92F\nkvEl+6anp7Vz5065XC49+eSTGhsb08GDBzU0NCS3261gMCi73S7p4hsIkUhEVqtVTU1NqqioyGgw\nXLJv4XDJvoXFJfsywyX7sierl+z74x//qKKiDz9MEA6HVV5erkOHDsnr9aq9vV3SxTcPOjs7FQqF\ntHPnTj377LOzzkEBALIro8DHYjGdOnVKGzduTN3X3d2t2tpaSRf/a9LV1ZW6v6amRlarVW63W4WF\nhert7c3C0AEAV5NR4H/1q1/p0UcfvWI+KpFIyOm8OAXgdDqVSCQkSfF4XCtWrEgt53K5FvzkfQBA\nemnfZD158qTy8/O1evVqRaPROZeb75foRKPRK9YXCATkcDjmtQ7MzWrN9QjMYrHk8frMgJUXXtZY\nrdY5X4NtbW2p3y99kEvKIPBnzpxRd3e3Tp06pYmJCf3nP//RM888I6fTqZGRkdTP/Px8SReP2IeH\nh1PPj8VicrlcM9Z7+SAu4c2ZhTM1Zc/1EIySTE7z+swA/whmz9TU1KyvQYfDoUAgMOtz0k7RfPnL\nX9aRI0d0+PBhPfHEE/r0pz+tHTt2qKqqSh0dHZKkjo4O+Xw+SZLP59OJEyc0OTmpwcFBDQwMqKSk\n5Dp2CwBwLa75PHi/369QKKRIJKKCggIFg0FJUnFxsaqrqxUMBmWz2dTc3HzDvwMZQG4kk0mO4rPk\nWs5GzPg8+BuB8+AXDufBLyzOg8fNakHOgwcA3FoIPAAYisADgKEIPAAYisADgKEIPAAYisADgKEI\nPAAYisADgKEIPAAYisADgKEIPAAYisADgKEIPAAYisADgKEIPAAYisADgKEIPAAYisADgKEIPAAY\nisADgKEIPAAYisADgKEIPAAYisADgKEIPAAYisADgKEIPAAYisADgKEIPAAYisADgKEIPAAYisAD\ngKEIPAAY
isADgKFs6Ra4cOGCWlpaNDk5qampKa1du1aPPPKIxsbGdPDgQQ0NDcntdisYDMput0uS\n2tvbFYlEZLVa1dTUpIqKiqzvCADgSmkDv2jRIrW0tGjx4sWanp7Wrl27dO+99+qNN95QeXm5Ghoa\nFA6H1d7erq1bt6qvr0+dnZ0KhUKKxWLavXu3WltbZbFYbsT+AAD+K6MpmsWLF0u6eDQ/NTUlSeru\n7lZtba0kqa6uTl1dXan7a2pqZLVa5Xa7VVhYqN7e3myMHQBwFWmP4CVpenpa3/ve9/Tvf/9bDz74\noEpKSpRIJOR0OiVJTqdTiURCkhSPx1VaWpp6rsvlUjwez8LQAQBXk1Hg8/LytG/fPr3//vt6+umn\n9e67785YhikYALi5ZBT4S+x2u+6++26dPn1aTqdTIyMjqZ/5+fmSLh6xDw8Pp54Ti8XkcrlmrCsa\njSoajaZuBwIBORyOa90PfITVmusRmMViyeP1iZtWW1tb6nev1yuv1yspg8C/9957stlsstvtmpiY\n0N/+9jc1NDSoqqpKHR0d8vv96ujokM/nkyT5fD61traqvr5e8XhcAwMDKikpmbHeywdxyejo6HXt\nJD40NWXP9RCMkkxO8/rETcnhcCgQCMz6WNrAj4yM6Oc//7mmp6eVTCZVU1OjyspKlZaWKhQKKRKJ\nqKCgQMFgUJJUXFys6upqBYNB2Ww2NTc3M30DADlgSSaTyVwP4pKzZ8/megjG6Oqyy+935noYxnj5\n5VFVVnIEj5uPx+OZ8zE+yQoAhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLw\nAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAo\nAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8AhiLwAGAoAg8A\nhiLwAGAoW7oFYrGYDh8+rEQiIYvFoo0bN2rTpk0aGxvTwYMHNTQ0JLfbrWAwKLvdLklqb29XJBKR\n1WpVU1OTKioqsr4jAIArpQ281WrVY489ptWrV2t8fFxPPvmkKioqFIlEVF5eroaGBoXDYbW3t2vr\n1q3q6+tTZ2enQqGQYrGYdu/erdbWVlkslhuxPwCA/0o7ReN0OrV69WpJ0pIlS1RUVKRYLKbu7m7V\n1tZKkurq6tTV1SVJ6u7uVk1NjaxWq9xutwoLC9Xb25u9PQAAzGpec/CDg4N65513VFpaqkQiIafT\nKeniPwKJREKSFI/HtWLFitRzXC6X4vH4Ag4ZAJCJtFM0l4yPj+vAgQNqamrSkiVLZjw+3ymYaDSq\naDSauh0IBORwOOa1DszNas31CMxiseTx+sRNq62tLfW71+uV1+uVlGHgp6amtH//fn32s5/VmjVr\nJF08ah8ZGUn9zM/Pl3TxiH14eDj13FgsJpfLNWOdlw/iktHR0XnuFuYyNWXP9RCMknfH/+rV/3sn\n18MwgmepR0X2olwPwxgOh0OBQGDWxzIK/JEjR1RcXKxNmzal7quqqlJHR4f8fr86Ojrk8/kkST6f\nT62traqvr1c8HtfAwIBKSkoWYDeA3IlN9unRsD/XwzBC2B8m8DdI2sCfOXNGr7/+ulatWqXvfve7\nslgs+tKXviS/369QKKRIJKKCggIFg0FJUnFxsaqrqxUMBmWz2dTc3MwZNACQA2kDX1ZWpt/+9rez\nPrZr165Z729sbFRjY+P1jQwAcF34JCsAGIrAA4ChCDwAGIrAA4ChCDwAGIrAA4ChCDwAGIrAA4Ch\nCDwAGIrAA4ChCDwAGIrAA4ChCDwAGIrAA4ChCDwAGIrAA4ChCDwAGIrAA4ChCDwAGIrAA4ChCDwA\nGIrAA4ChCDwAGIrAA4ChCDwAGIrAA4ChCDwAGIrAA4ChCDwAGIrAA4ChCD
wAGIrAA4ChCDwAGIrA\nA4ChCDwAGIrAA4ChbOkWOHLkiE6ePKn8/Hw9/fTTkqSxsTEdPHhQQ0NDcrvdCgaDstvtkqT29nZF\nIhFZrVY1NTWpoqIiu3sAAJhV2iP4DRs26KmnnrrivnA4rPLych06dEher1ft7e2SpL6+PnV2dioU\nCmnnzp169tlnlUwmszNyAMBVpQ18WVmZli5desV93d3dqq2tlSTV1dWpq6srdX9NTY2sVqvcbrcK\nCwvV29ubhWEDANK5pjn4RCIhp9MpSXI6nUokEpKkeDyuFStWpJZzuVyKx+MLMEwAwHylnYPPhMVi\nmfdzotGootFo6nYgEJDD4ViI4UCS1ZrrEQCzs1qt/F1fYG1tbanfvV6vvF6vpGsMvNPp1MjISOpn\nfn6+pItH7MPDw6nlYrGYXC7XrOu4fBCXjI6OXstwMIupKXuuhwDMampqir/rC8jhcCgQCMz6WEZT\nNMlk8oo3S6uqqtTR0SFJ6ujokM/nkyT5fD6dOHFCk5OTGhwc1MDAgEpKSq5z+ACAa5H2CP7QoUPq\n6enR6OiovvGNbygQCMjv9ysUCikSiaigoEDBYFCSVFxcrOrqagWDQdlsNjU3N1/T9A0A4PqlDfy3\nvvWtWe/ftWvXrPc3NjaqsbHx+kYFALhufJIVAAxF4AHAUAQeAAxF4AHAUAQeAAxF4AHAUAQeAAxF\n4AHAUAQeAAxF4AHAUAQeAAxF4AHAUAQeAAxF4AHAUAQeAAxF4AHAUAQeAAxF4AHAUAQeAAxF4AHA\nUAQeAAxF4AHAUAQeAAxF4AHAUAQeAAxF4AHAUAQeAAxF4AHAUAQeAAxF4AHAUAQeAAxF4AHAUAQe\nAAxF4AHAUAQeAAxly9aKT58+rV/+8pdKJpPasGGD/H5/tjYFAJhFVo7gp6en9dxzz+mpp57S/v37\ndfz4cfX392djUwCAOWQl8L29vSosLFRBQYFsNpvWrVunrq6ubGwKADCHrAQ+Ho/rjjvuSN12uVyK\nx+PZ2BQAYA5Zm4NPJxqNKhqNpm4HAgF5PJ5cDcc4DQ1SMpnrUZikXv9zP3+guDm1tbWlfvd6vfJ6\nvZKyFHiXy6Xh4eHU7Xg8LpfLdcUylw8CuNm1tbUpEAjkehjArOZ6bWZliqakpEQDAwMaGhrS5OSk\njh8/Lp/Pl41NAQDmkJUj+Ly8PG3btk179uxRMpnU5z73ORUXF2djUwCAOViSSWZqgXSi0ShTirjl\nEHgAMBRfVQAAhiLwAGAoAg8AhiLwAGAoAg8AhsrZVxUAN6v+/n51dXWlvj/J5XLJ5/PxWQ7ccjhN\nErhMOBzW8ePHtW7dutTXa8Tj8dR9XNcAtxKO4IHLRCIR7d+/XzbblX816uvr9e1vf5vA45bCHDxw\nGYvFonPnzs24/9y5c7JYLDkYEXDtmKIBLnP69Gk999xzKiwsTF3TYHh4WAMDA9q2bZvuueeeHI8Q\nyByBBz5ienpavb29V7zJWlJSorw8/sOLWwuBBwBDcUgCAIYi8ABgKAIPAIYi8ABgqP8HgDNIVh0n\np0MAAAAASUVORK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x103fa8e10>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Quick graphical representation \n",
"\n",
"df = pd.DataFrame([sex_lines.values()],columns=sex_lines.keys())\n",
"df.plot(kind='bar')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Male': 5359, 'Female': 2111}\n",
"0.717402945114\n"
]
}
],
"source": [
"# Maybe men and women talk for different lengths? This counts words instead of lines.\n",
"\n",
"sex_words = {'Male': 0,\n",
"             'Female': 0}\n",
"\n",
"for line in lines:\n",
"    speaker = line.split(':')[0]\n",
"    # take everything after the first colon, so colons inside the dialogue are kept\n",
"    dialogue = line.split(':', 1)[1]\n",
"    # tokenize on whitespace; split() without arguments skips empty strings\n",
"    word_count = len(dialogue.split())\n",
"\n",
"    if speaker in females:\n",
"        sex_words['Female'] += word_count\n",
"    elif speaker not in groups:\n",
"        sex_words['Male'] += word_count\n",
"\n",
"print sex_words\n",
"print sex_words['Male']/(sex_words['Male'] + sex_words['Female'])"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x103fa8d50>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEACAYAAAC08h1NAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGfxJREFUeJzt3Wtsm+Xdx/Gv61R0Lllcu/GWAwWtXlfhVUlaZ7TpSNIh\ngYT60IyDd3oQYSlbYaqEp2ksY1s30a1SoXXDSMqkwBhIaArS4gHaCwQ4CyIZSyjRwFUneR0VaZol\nsWuTtOmag58XXe+nXVLiQhKnuX6fN/Z95T78r8r55erl+2BLp9NpRETEGEuyXYCIiMwvBb+IiGEU\n/CIihlHwi4gYRsEvImIYBb+IiGFyMlnp9OnTPPnkk3zwwQfYbDbuv/9+CgoKOHDgAIODg3g8HoLB\nIA6HA4DW1lYikQh2u53a2lpKSkoAOHr0KE1NTYyNjVFWVkZtbe2cdUxkPkSjUXw+X7bLELksGY34\nf/vb31JWVkYoFOLRRx+lqKiIcDjMunXraGhowOfz0draCkBvby+dnZ2EQiHq6+tpbm7m/KUCzc3N\n7Nixg4aGBk6cOEFPT8/c9UxkHkSj0WyXIHLZZgz+06dPc+TIEbZs2QKA3W7H4XDQ3d1NVVUVANXV\n1XR1dQHQ3d1NRUUFdrsdj8dDQUEBsViMZDLJ6OgoXq8XgMrKSmsbERGZPzNO9QwMDJCbm0tTUxPH\njh3jc5/7HLW1taRSKZxOJwBOp5NUKgVAIpFgzZo11vYul4tEIoHdbsftdlvtbrebRCIx2/0REZEZ\nzBj8k5OT/POf/6Suro7Vq1fzzDPPEA6Hp6xns9lmrahoNHrRf6EDgcCs7VtkNumzKQtdS0uL9d7n\n8+Hz+WYOfpfLhdvtZvXq1QBs3LiRcDiM0+kkmUxar3l5edb6Q0ND1vbxeByXy4XL5SIej09pn875\n4i7U19d3GV0VmR+5ubkMDw9nuwyRaRUWFk47OJlxjt/pdOJ2u63gfffddykuLmbDhg20tbUB0NbW\nht/vB8Dv99PR0cH4+DgDAwP09/fj9XpxOp04HA5isRjpdJr29nbKy8tnsYsiIpIJWyZ353z//ff5\nzW9+w/j4OJ/5zGd44IEHmJycJBQKMTQ0RH5+PsFgkOXLlwPnTud8/fXXycnJmXI6Z2Njo3U65733\n3ptxoRrxy0KkEb8sZIWFhdO2ZxT8C4GCXxYiBb8sZJcKfl25KyJimIyu3BUR+SSuvvrqWT3zTy6W\nTqcZGRnJeH0Fv4jMOZvNpimxOZSbm3tZ62uqR0TEMAp+ERHDKPhFRAyj4BcRMYyCX0TkE+rt7aW4\nuJjJyclsl5IRndUjIllx/PhV9PXZ52z/hYUTFBX9O6N1b7jhBgYHB3n77bdZsWKF1X7zzTdz+PBh\n3nrrLYqKij5yH1fS6aoKfhHJir4+OzU1zjnbfzicZIastthsNq655hr++Mc/Wk8GPHLkCGfOnLmi\nAj1TmuoREQHuuOMOXnjhBWv5hRde4K677rKWX3vtNW655RbWrl3Ll770Jfbv33/JfQ0PD/ODH/yA\n9evX4/f72bt3Lwvp7jgKfhERYP369YyMjBCLxZicnOTFF1/k9ttvtwJ7+fLlPP744xw5coRnn32W\n5557jldeeWXafT344IMsXbqUjo4OXnnlFdrb23n++efnszsfScEvIvIf50f97e3tfP7zn+ezn/2s\n9bONGzfyhS98AYC1a9dy22230dnZOWUfg4ODRCIRfv7zn7Ns2TJcLhf33XfftA+wyhbN8YuI/Mcd\nd9zB7bffzgcffMCdd9550c8OHTrEnj17+Pvf/87Y2Bhnz55l69atU/Zx/PhxxsbGWL9+PXDuPjrp\ndHrGL4fnk4JfROQ/ioqKuOaaa4hEIuzbtw/4/7N1du7cybe//W2ef/55li5dyq5duzh58uSUfRQW\nFnLVVVfx3nvvLdgvhjXVIyJygf3799PS0sKn
PvUpAGuO/9SpU+Tl5bF06VLeeeedKVM359fzeDxU\nVVWxa9cuRkZGSKfTHDt2jL/85S/z25GPoBG/iGRFYeEE4XByTvefqQtH5qtWrZr2Z7/61a/4xS9+\nwU9+8hM2btzIbbfdRiqVmnYfDQ0N/PKXv6S6uprTp0+zatUqHnjggY/blVmnJ3CJfAJ6Aldm9O80\nty7176sncImICKCpHuPM9WXyprn22jQeT7arELk8Cn7DzPVl8qZ56aVhBb9ccTTVIyJiGAW/iIhh\nFPwiIoZR8IuIGEbBLyJiGAW/iEiW7d+/n507d87b8XQ6p4hkxfHTx+k7NXdX5BcuL6TIkdkdMW+4\n4QaGhobIyckhnU5js9l444038MzjubrzeUO3jIL/e9/7Hg6HA5vNht1uZ8+ePYyMjHDgwAEGBwfx\neDwEg0EcDgcAra2tRCIR7HY7tbW1lJSUAHD06FGampoYGxujrKzMesSZiJin71QfNeGaOdt/uCac\ncfDbbDaeffZZNm/ePGf1LCQZTfXYbDZ27drF3r172bNnDwDhcJh169bR0NCAz+ejtbUVOPe0+c7O\nTkKhEPX19TQ3N1t3rWtubmbHjh00NDRw4sQJenp65qhbIiKXZ7rblr399tts27aN66+/nptvvvmi\nB6/ceeed7N27l23btrFmzRruvfdeTp48yc6dO1m7di1bt27l+PHj1vo/+9nPKC8vZ+3atdx66638\n9a9/vWQtH3Xc2ZBR8J9/kMCFuru7qaqqAqC6upquri6rvaKiArvdjsfjoaCggFgsRjKZZHR0FK/X\nC0BlZaW1jYjIQtPf388999xDMBjk8OHD/PSnP+W+++4jkUhY67z44os88cQTHDp0iPfff59t27bx\n9a9/ncOHD7N69eqLnstbVlbGq6++yuHDh6mpqeG73/0uZ8+enXLcEydOzHjcTyrjEf/u3bupr6/n\ntddeAyCVSuF0nrv03+l0WrcnTSQSrFy50trW5XKRSCRIJBK43W6r3e12z2pHREQ+ibq6Onw+Hz6f\nj+3bt/OHP/yBm266ierqagBuvPFGSkpKeP31161tvva1r3HNNddw9dVXs2XLFq699lo2b97MkiVL\n2Lp1K++995617le/+lXy8vJYsmQJ3/nOdzh79iz/+Mc/ptTR2to643E/qYzm+B955BFWrFjBhx9+\nyO7du6e91edsfjERjUaJRqPWciAQIDc3d9b2bzK77s82q2y2JfpsZsB+BXzwnn766Yvm+H/84x/z\n8ssv8+qrrwLnZj7Gx8f58pe/bK2Tn59vvV+2bNmU5VOnTlnLTz75JL///e8ZGBgAYGRkZNrBb29v\n77TH/ajvH+x2+yU/hy0tLdb783/YMgr+FStWAPDpT3+a8vJyYrEYTqeTZDJpvebl5QHnRvhDQ0PW\ntvF4HJfLhcvlIh6PT2mfzvniLqR7ec+OiQlHtktYVNLpSX02M3Al/HH87+nswsJC7rjjDvbu3fuJ\n9/3WW29x8OBBXnjhBdasWQOcy7npvlf4OMedmJiY9nOYm5tLIBCY0j7jVM+///1vzpw5A8CZM2f4\n29/+xqpVq9iwYQNtbW0AtLW14ff7AfD7/XR0dDA+Ps7AwAD9/f14vV6cTicOh4NYLEY6naa9vZ3y\n8vKMOyYiMp9uv/12Xn31Vf785z8zOTnJmTNn6OzspL+//7L3derUKXJyclixYgVnz54lFAoxMjIy\n58e9lBlH/KlUikcffRSbzcbExIQ137R69WpCoRCRSIT8/HyCwSAAxcXFbNq0iWAwSE5ODtu3b7em\ngerq6mhsbLRO5ywtLZ21jojIlaVweSHhmvDMK36C/WdquqnqwsJCnn76aXbv3s0DDzxATk4OpaWl\n1pmNlzO9XV1dTXV1NTfeeCPLly/nvvvuu+TTsWY67mzQoxcN09Xl0P34Z9FLLw2zfr2memaiRy/O\nLT16UURE
PpKCX0TEMAp+ERHDKPhFRAyj4BcRMYyCX0TEMLofv4jMuXQ6fUVcvXulutyz8hX8IjLn\nLnWVqmSHpnpERAyj4BcRMYyCX0TEMAp+ERHDKPhFRAyj4BcRMYyCX0TEMAp+ERHDKPhFRAyj4BcR\nMYyCX0TEMAp+ERHDKPhFRAyj4BcRMYyCX0TEMAp+ERHDKPhFRAyj4BcRMYyCX0TEMAp+ERHDKPhF\nRAyTk+mKk5OT1NfX43K5eOihhxgZGeHAgQMMDg7i8XgIBoM4HA4AWltbiUQi2O12amtrKSkpAeDo\n0aM0NTUxNjZGWVkZtbW1c9IpERG5tIxH/H/6058oKiqylsPhMOvWraOhoQGfz0draysAvb29dHZ2\nEgqFqK+vp7m5mXQ6DUBzczM7duygoaGBEydO0NPTM8vdERGRmWQU/PF4nHfeeYebbrrJauvu7qaq\nqgqA6upqurq6rPaKigrsdjsej4eCggJisRjJZJLR0VG8Xi8AlZWV1jYiIjJ/Mgr+3/3ud9x9993Y\nbDarLZVK4XQ6AXA6naRSKQASiQQrV6601nO5XCQSCRKJBG6322p3u90kEolZ6YSIiGRuxjn+Q4cO\nkZeXx3XXXUc0Gr3kehf+UfikotHoRccKBALk5ubO2v5NZrdnu4LFxWZbos+mLGgtLS3We5/Ph8/n\nmzn4jxw5Qnd3N++88w5nz55ldHSUX//61zidTpLJpPWal5cHnBvhDw0NWdvH43FcLhcul4t4PD6l\nfTrni7vQ8PDw5fVWpjUx4ch2CYtKOj2pz6YsWLm5uQQCgSntM071fPOb3+TgwYM88cQTPPjgg3zx\ni19k586dbNiwgba2NgDa2trw+/0A+P1+Ojo6GB8fZ2BggP7+frxeL06nE4fDQSwWI51O097eTnl5\n+ez2UkREZpTx6Zz/raamhlAoRCQSIT8/n2AwCEBxcTGbNm0iGAySk5PD9u3brWmguro6GhsbrdM5\nS0tLZ6cXIiKSMVv6/LmWC1xfX1+2S1gUuroc1NQ4s13GovHSS8OsX6+pHlmYCgsLp23XlbsiIoZR\n8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhh\nFPwiIoZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJi\nGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhcmZaYWxsjF27djE+Ps7ExAQbN27krrvuYmRkhAMH\nDjA4OIjH4yEYDOJwOABobW0lEolgt9upra2lpKQEgKNHj9LU1MTY2BhlZWXU1tbOaedERGSqGUf8\nS5cuZdeuXezdu5dHH32Unp4eYrEY4XCYdevW0dDQgM/no7W1FYDe3l46OzsJhULU19fT3NxMOp0G\noLm5mR07dtDQ0MCJEyfo6emZ296JiMgUGU31XHXVVcC50f/ExAQA3d3dVFVVAVBdXU1XV5fVXlFR\ngd1ux+PxUFBQQCwWI5lMMjo6itfrBaCystLaRkRE5s+MUz0Ak5OT/OhHP+Jf//oXt9xyC16vl1Qq\nhdPpBMDpdJJKpQBIJBKsWbPG2tblcpFIJLDb7bjdbqvd7XaTSCRmsy8iIpKBjIJ/yZIl7N27l9On\nT/PYY4/xwQcfTFnHZrPNWlHRaJRoNGotBwIBcnNzZ23/JrPbs13B4mKzLdFnUxa0lpYW673P58Pn\n82UW/Oc5HA6uv/56enp6cDqdJJNJ6zUvLw84N8IfGhqytonH47hcLlwuF/F4fEr7dM4Xd6Hh4eHL\nKVUuYWLCke0SFpV0elKfTVmwcnNzCQQCU9pnnOP/8MMPOX36NABnz57l3XffpaioiA0bNtDW1gZA\nW1sbfr8fAL/fT0dHB+Pj4wwMDNDf34/X68XpdOJwOIjFYqTTadrb2ykvL5
/FLoqISCZmHPEnk0ka\nGxuZnJwknU5TUVHB+vXrWbNmDaFQiEgkQn5+PsFgEIDi4mI2bdpEMBgkJyeH7du3W9NAdXV1NDY2\nWqdzlpaWzm3vRERkClv6/LmWC1xfX1+2S1gUuroc1NQ4s13GovHSS8OsX6+pHlmYCgsLp23Xlbsi\nIoZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/\niIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbB\nLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhcmZaIR6P88QTT5BKpbDZbNx0003ceuutjIyM\ncODAAQYHB/F4PASDQRwOBwCtra1EIhHsdju1tbWUlJQAcPToUZqamhgbG6OsrIza2to57ZzIXLOv\n/Addg8eyXcaiUbi8kCJHUbbLWPRmDH673c4999zDddddx5kzZ3jooYcoKSkhEomwbt06tm3bRjgc\nprW1lW9961v09vbS2dlJKBQiHo/zyCOP8Pjjj2Oz2WhubmbHjh14vV727NlDT08PpaWl89FPkTkR\nH+/l7nBNtstYNMI1YQX/PJhxqsfpdHLdddcBsGzZMoqKiojH43R3d1NVVQVAdXU1XV1dAHR3d1NR\nUYHdbsfj8VBQUEAsFiOZTDI6OorX6wWgsrLS2kZERObPZc3xDwwMcOzYMdasWUMqlcLpdALn/jik\nUikAEokEK1eutLZxuVwkEgkSiQRut9tqd7vdJBKJ2eiDiIhchhmnes47c+YM+/fvp7a2lmXLlk35\nuc1mm7WiotEo0WjUWg4EAuTm5s7a/k1mt2e7ApFLs9vt+l2fZS0tLdZ7n8+Hz+fLLPgnJibYt28f\nlZWVlJeXA+dG+clk0nrNy8sDzo3wh4aGrG3j8TgulwuXy0U8Hp/SPp3zxV1oeHg4w27KR5mYcGS7\nBJFLmpiY0O/6LMrNzSUQCExpz2iq5+DBgxQXF3PrrbdabRs2bKCtrQ2AtrY2/H4/AH6/n46ODsbH\nxxkYGKC/vx+v14vT6cThcBCLxUin07S3t1t/REREZP7MOOI/cuQIb7zxBqtWreKHP/whNpuNb3zj\nG9TU1BAKhYhEIuTn5xMMBgEoLi5m06ZNBINBcnJy2L59uzUNVFdXR2Njo3U6p87oERGZf7Z0Op3O\ndhGZ6Ovry3YJi0JXl4OaGme2y1g0nnvjZe5+7X+yXcaiEa4JU56vmYDZUlhYOG27rtwVETGMgl9E\nxDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AX\nETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4\nRUQMo+AXETGMgl9ExDAKfhERwyj4RUQMkzPTCgcPHuTQoUPk5eXx2GOPATAyMsKBAwcYHBzE4/EQ\nDAZxOBwAtLa2EolEsNvt1NbWUlJSAsDRo0dpampibGyMsrIyamtr565XIiJySTOO+Lds2cLDDz98\nUVs4HGbdunU0NDTg8/lobW0FoLe3l87OTkKhEPX19TQ3N5NOpwFobm5mx44dNDQ0cOLECXp6euag\nOyIiMpMZg3/t2rUsX778orbu7m6qqqoAqK6upqury2qvqKjAbrfj8XgoKCggFouRTCYZHR3F6/UC\nUFlZaW0jIiLz62PN8adSKZxOJwBOp5NUKgVAIpFg5cqV1noul4tEIkEikcDtdlvtbrebRCLxSeoW\nEZGPacY5/kzYbLbZ2I0lGo0SjUat5UAgQG5u7qwew1R2e7YrELk0u92u3/VZ1tLSYr33+Xz4fL6P\nF/xOp5NkMmm95uXlAedG+ENDQ9Z68X
gcl8uFy+UiHo9Pab+U88VdaHh4+OOUKv9lYsKR7RJELmli\nYkK/67MoNzeXQCAwpT2jqZ50Om19SQuwYcMG2traAGhra8Pv9wPg9/vp6OhgfHycgYEB+vv78Xq9\nOJ1OHA4HsViMdDpNe3s75eXls9AtERG5XDOO+BsaGjh8+DDDw8Pcf//9BAIBampqCIVCRCIR8vPz\nCQaDABQXF7Np0yaCwSA5OTls377dmgaqq6ujsbHROp2ztLR0bnsmIiLTsqUvHMovYH19fdkuYVHo\n6nJQU+PMdhmLxnNvvMzdr/1PtstYNMI1YcrzNRswWwoLC6dt15W7IiKGUfCLiBhGwS8iYhgFv4iI\nYRT8IiKGUfCLiBhGwS8iYhgFv4iIYRT8IiKGUfCLiBhGwS8iYhgFv4iIYRT8IiKGUfCLiBhGwS8i\nYhgFv4iIYRT8IiKGUfCLiBhGwS8iYhgFv4iIYRT8IiKGUfCLiBhGwS8iYhgFv4iIYRT8IiKGUfCL\niBhGwS8iYhgFv4iIYXLm+4A9PT0888wzpNNptmzZQk1NzXyXICJitHkd8U9OTvLUU0/x8MMPs2/f\nPt58802OHz8+nyWIiBhvXoM/FotRUFBAfn4+OTk5bN68ma6urvksQUTEePMa/IlEArfbbS27XC4S\nicR8liAiYrx5n+PPRDQaJRqNWsuBQIDCwsIsVrR4bNsG6XS2q1hMtvK/X9Y/qCxcLS0t1nufz4fP\n55vf4He5XAwNDVnLiUQCl8s1Zb3zxYksdC0tLQQCgWyXIXJJ030+53Wqx+v10t/fz+DgIOPj47z5\n5pv4/f75LEFExHjzOuJfsmQJdXV17N69m3Q6zVe+8hWKi4vnswQREePZ0mnN+Ip8XNFoVNOScsVR\n8IuIGEa3bBARMYyCX0TEMAp+ERHDKPhFRAyj4BcRMcyCvGWDyEJ1/Phxurq6rHtMuVwu/H6/rkeR\nK4pO5xTJUDgc5s0332Tz5s3WrUYSiYTVpmdLyJVCI36RDEUiEfbt20dOzsW/Nlu3buX73/++gl+u\nGJrjF8mQzWbj5MmTU9pPnjyJzWbLQkUiH4+mekQy1NPTw1NPPUVBQYH1XImhoSH6+/upq6ujtLQ0\nyxWKZEbBL3IZJicnicViF3256/V6WbJE/3mWK4eCX0TEMBqmiIgYRsEvImIYBb+IiGEU/CIihvk/\n7Yd5MnWyGbkAAAAASUVORK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x103fa8c10>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Quick graphical representation \n",
"\n",
"df = pd.DataFrame([sex_words.values()],columns=sex_words.keys())\n",
"df.plot(kind='bar')"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Bonus Toy Story analysis\n",
"\n",
"url = 'http://www.dailyscript.com/scripts/toy_story.html'\n",
"toy_story_script = requests.get(url).text"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# toy_story_script.splitlines()[250:350]\n",
"\n",
"lines = []\n",
"speaker = ''\n",
"dialogue = ''\n",
"for row in toy_story_script.splitlines()[90:]:\n",
" if ' ' in row: \n",
"        if ':' not in speaker:\n",
"            lines.append( {'Speaker': remove_paren(speaker).strip(),\n",
"                           'Dialogue': remove_paren(dialogue).strip() } )\n",
"\n",
"        speaker = remove_spaces(row.strip())\n",
"        dialogue = ''\n",
" elif ' ' in row:\n",
"        dialogue = dialogue + ' ' + remove_spaces(row)\n",
"lines.append( {'Speaker': remove_paren(speaker).strip(),\n",
" 'Dialogue': remove_paren(dialogue).strip() } )"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"roles = defaultdict(int)\n",
"\n",
"for line in lines:\n",
"    speaker = line['Speaker']\n",
"    roles[speaker] += 1"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Dialogue</th>\n",
" <th>Speaker</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Alright everyone, this is a stick- up! Don't ...</td>\n",
" <td>ANDY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Ooh! Money. Money. Money.</td>\n",
" <td>ANDY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Stop it! Stop it, you mean old potato!</td>\n",
" <td>ANDY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Quiet Bo Peep, or your sheep get run over!</td>\n",
" <td>ANDY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Heeeeelp! BAAAAA! Heeeelp us!</td>\n",
" <td>ANDY</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Dialogue Speaker\n",
"0 Alright everyone, this is a stick- up! Don't ... ANDY\n",
"1 Ooh! Money. Money. Money. ANDY\n",
"2 Stop it! Stop it, you mean old potato! ANDY\n",
"3 Quiet Bo Peep, or your sheep get run over! ANDY\n",
"4 Heeeeelp! BAAAAA! Heeeelp us! ANDY"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Skip the first record, which was appended before any speaker had been read\n",
"toy_story_df = pd.DataFrame(lines[1:])\n",
"toy_story_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"WOODY 290\n",
"BUZZ 145\n",
"SID 61\n",
"ANDY 58\n",
"REX 52\n",
"MR. POTATO HEAD 52\n",
"SLINKY 34\n",
"HAMM 33\n",
"MRS. DAVIS 30\n",
"SARGENT 25\n",
"HANNAH 23\n",
"BO PEEP 23\n",
"TV ANNOUNCER 8\n",
"PIZZA DELIVERER 5\n",
"TOYS 5\n",
"LENNY 5\n",
"ALIENS 4\n",
"KID #1 3\n",
"ATTENDANT 3\n",
"SID'S MOM 2\n",
"KID #2 2\n",
"RC CAR 2\n",
"SPACE COMMANDER 2\n",
"ALIEN #4 2\n",
"ALIEN #2 2\n",
"ALIEN #1 2\n",
"ALIEN #3 2\n",
"TV BUZZ 2\n",
"MR. SPELL 2\n",
"ROBOT GUARDS 2\n",
"MALE CHORUS 2\n",
"ALIEN 1\n",
"THE END 1\n",
"FEMALE VOICE OVER SPEAKER 1\n",
"FRIEND #1 1\n",
"SHARK 1\n",
"MALE VOICE OVER SPEAKER 1\n",
"ALIEN #5 1\n",
"FRIEND #2 1\n",
"MIKE 1\n",
"FADE OUT. 1\n",
"SLINKY & REX 1\n",
"ROBOT 1\n",
"LOCAL ANNOUNCER 1\n",
"MOM 1\n",
"WOUNDED SOLDIER 1\n",
"Name: Speaker, dtype: int64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toy_story_df.Speaker.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"Female 78\n",
"Male 820\n",
"dtype: int64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAExCAYAAAB2yrkCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH6ZJREFUeJzt3X9sG3f9x/Gne2laPLy4bmJImk3VGkXbrDSMOFOTwZJS\nfk4VsQTzBAWRsSDYEAIjjVENVKQOBmzDadYVEIVt4neQZiOmbUgwW6xdEDbd1OLRQRCMpm5IbCuu\ns9Bkcfz9I996LUmXbG166SevhzTFvvNd3p+b/eonn7vzx1EqlUqIiIhxVtldgIiILA0FvIiIoRTw\nIiKGUsCLiBhKAS8iYigFvIiIoSoW86InnniC3//+9wBs27aNm266ifHxcXp7exkdHcXr9RIKhXA6\nnQBEIhFisRiWZdHd3U1zc/PStUBEROa1YA/+2LFjPP3003zzm9/kvvvu49ChQwwPDxONRmlqamLP\nnj34fD4ikQgAQ0NDDAwMEA6H2blzJ/v370eX2l98qVTK7hJE5qX35sWzYMAfP36choYGVq9ezapV\nq7jmmmv405/+xJ///Gc6OjoA6OzsJJFIAJBMJmlvb8eyLLxeL7W1tQwODi5tK2QOfYhkudJ78+JZ\nMOCvuOIKjh49yvj4OJOTkzz33HNkMhnGxsZwu90AuN1u8vk8ALlcjurq6vL2Ho+HXC63ROWLiMi5\nLDgGv2HDBrq6urjnnntYu3YtGzduZNWquf8uOByOJSlQRETemEWdZN26dStbt24F4Oc//znr16/H\n7XaXe/FjY2NUVVUBsz32TCZT3jabzeLxeObsM5VKnfWnWjAYPK+GyNl0PGW50nvzwuvv7y8/9vl8\n+Hw+YJEBf/LkSS6//HIymQx/+tOf+PrXv87IyAjxeJxAIEA8Hsfv9wPg9/vp6+tj+/bt5HI5hoeH\naWhomLPPM4s4LZ1Ov+EGytlcLheFQsHuMkTm0HvzwqqrqzvnP5qLCvgHHniA8fFxLMuip6cHp9NJ\nIBAgHA4Ti8WoqakhFAoBUF9fT1tbG6FQiIqKCnp6ejR8IyJiA8dy+rpg9eAvHPWSZLnSe/PCqqur\nO+c63ckqImIoBbyIiKEU8CIihlLAi4gYSgEvImIoBbyIiKEU8CIihlLAi4gYSgEvImIoBbyIiKEU\n8CIihlLAi4gYSgEvImIoBbyIiKEU8CIihlLAi4gYalEzOj3++OPEYjEcDgdXXnkld9xxB6dOnaK3\nt5fR0VG8Xi+hUAin0wlAJBIhFothWRbd3d00NzcvaSNEZNbx42tIpy27y3hNlgXFotPuMhZUV1dk\nw4ZJu8s4LwsGfC6X46mnnqK3t5eKigrC4TAHDhxgaGiIpqYmurq6iEajRCIRduzYwdDQEAMDA4TD\nYbLZLLt376avr0/T9olcBOm0RSDgtrsMI0SjY2zYYHcV52dRQzQzMzOcOnWKYrHI1NQUHo+HZDJJ\nR0cHAJ2dnSQSCQCSySTt7e1YloXX66W2tpbBwcGla4GIiMxrwR68x+Nh+/bt3HHHHaxZs4bNmzez\nefNm8vk8bvdsT8HtdpPP54HZHn9jY+NZ2+dyuSUqX0REzmXBHvzLL79MMplk3759fP/732dycpJn\nnnlmzus0BCMisrws2IM/cuQIXq+XN7/5zQBcf/31vPjii7jdbsbGxso/q6qqgNkeeyaTKW+fzWbx\neDxz9ptKpUilUuXnwWAQl8t13g2SWZWVlTqeK5C1vM+vXlIsy7pkPkP9/f3lxz6fD5/PBywi4Kur\nq/n73//O1NQUq1ev5siRI2zatIm1a9cSj8cJBALE43H8fj8Afr+fvr4+tm/fTi6XY3h4mIaGhjn7\nPbOI0wqFwnk1Ul7lcrl0PFegS+HqlEtFsVikUJiwu4wFuVwugsHgvOsWDPiGhga2bNnCXXfdhWVZ\nbNy4kXe/+92cOnWKcDhMLBajpqaGUCgEQH19
PW1tbYRCISoqKujp6dHwjYiIDRylUqlkdxGnpdNp\nu0swhnrwK1Mi4dRlkhdINDpGa+vy78HX1dWdc53uZBURMZQCXkTEUAp4ERFDKeBFRAylgBcRMZQC\nXkTEUAp4ERFDKeBFRAylgBcRMZQCXkTEUAp4ERFDKeBFRAylgBcRMZQCXkTEUAp4ERFDKeBFRAy1\n4IxO6XSa3t5eHA4HpVKJ//znP9xyyy3ceOON9Pb2Mjo6itfrJRQK4XTOThcWiUSIxWJYlkV3dzfN\nzc1L3hARETnbggFfV1fHt7/9bQBmZma4/fbbuf7664lGozQ1NdHV1UU0GiUSibBjxw6GhoYYGBgg\nHA6TzWbZvXs3fX19mrZPROQie11DNEeOHOEtb3kL1dXVJJNJOjo6AOjs7CSRSACQTCZpb2/Hsiy8\nXi+1tbUMDg5e+MpFROQ1va6Af/bZZ3nHO94BQD6fx+2enfvR7XaTz+cByOVyVFdXl7fxeDzkcrkL\nVa+IiCzSogN+enqaZDLJli1b5l2vIRgRkeVlwTH4055//nmuuuoqLr/8cmC21z42Nlb+WVVVBcz2\n2DOZTHm7bDaLx+OZs79UKkUqlSo/DwaDuFyuN9wQOVtlZaWO5wpkWXZXYA7Lsi6Zz1B/f3/5sc/n\nw+fzAa8j4A8cOMANN9xQft7S0kI8HicQCBCPx/H7/QD4/X76+vrYvn07uVyO4eFhGhoa5uzvzCJO\nKxQKr69Vck4ul0vHcwUqFp12l2CMYrFIoTBhdxkLcrlcBIPBedctKuAnJyc5cuQIn/70p8vLAoEA\n4XCYWCxGTU0NoVAIgPr6etra2giFQlRUVNDT06PhGxERGzhKpVLJ7iJOS6fTdpdgDPXgV6ZEwkkg\n4La7DCNEo2O0ti7/HnxdXd051+lOVhERQyngRUQMpYAXETGUAl5ExFAKeBERQyngRUQMpYAXETGU\nAl5ExFAKeBERQyngRUQMpYAXETGUAl5ExFAKeBERQyngRUQMpYAXETGUAl5ExFCLmtFpYmKC733v\nexw7dgyHw8Htt99ObW0tvb29jI6O4vV6CYVCOJ2z04VFIhFisRiWZdHd3U1zc/OSNkJEROZaVMA/\n/PDDXHfddXzxi1+kWCwyOTnJY489RlNTE11dXUSjUSKRCDt27GBoaIiBgQHC4TDZbJbdu3fT19en\naftERC6yBYdoJiYmOHr0KFu3bgVmZxp3Op0kk0k6OjoA6OzsJJFIAJBMJmlvb8eyLLxeL7W1tQwO\nDi5hE0REZD4L9uBHRkZwuVzs27ePl156iauuuoru7m7y+Txu9+zcj263m3w+D0Aul6OxsbG8vcfj\nIZfLLVH5IiJyLgsG/MzMDP/85z+57bbb2LRpE4888gjRaHTO617vEEwqlSKVSpWfB4NBXC7X69qH\nnFtlZaWO5wpkWXZXYA7Lsi6Zz1B/f3/5sc/nw+fzAYsIeI/Hw/r169m0aRMAW7ZsIRqN4na7GRsb\nK/+sqqoqvz6TyZS3z2azeDyeOfs9s4jTCoXCG2iazMflcul4rkDFotPuEoxRLBYpFCbsLmNBLpeL\nYDA477oFx+Ddbjfr168nnU4DcOTIEerr62lpaSEejwMQj8fx+/0A+P1+nn32WaanpxkZGWF4eJiG\nhoYL1BQREVmsRV1Fc+utt/Lggw8yPT3NW97yFu644w5mZmYIh8PEYjFqamoIhUIA1NfX09bWRigU\noqKigp6eHl1BIyJiA0epVCrZXcRpp/9KkPOnIZqVKZFwEgi47S7DCNHoGK2ty3+Ipq6u7pzrdCer\niIihFPAiIoZSwIuIGEoBLyJiKAW8iIihFPAiIoZSwIuIGEoBLyJiKAW8iIihFPAiIoZSwIuIGEoB\nLyJiKAW8iIihFPAiIoZSwIuIGGpRE3589rOfxel04nA4sCyLe++9l/HxcXp7exkdHcXr9RIKhXA6\nZ6cLi0Qi
xGIxLMuiu7ub5ubmJW2EiIjMtaiAdzgc7Nq1ize/+c3lZdFolKamJrq6uohGo0QiEXbs\n2MHQ0BADAwOEw2Gy2Sy7d++mr69PszqJiFxkixqiKZVK/O/ET8lkko6ODgA6OztJJBLl5e3t7ViW\nhdfrpba2lsHBwQtctoiILGTRPfh77rmHVatW8e53v5tt27aRz+dxu2enBnO73eTzeQByuRyNjY3l\nbT0eD7lcbglKFxGR17KogN+9ezfr1q3j5MmT3HPPPfPOAaghGBGR5WVRAb9u3ToALr/8clpbWxkc\nHMTtdjM2Nlb+WVVVBcz22DOZTHnbbDaLx+OZs89UKkUqlSo/DwaDuFyu82qMvKqyslLHcwWyLLsr\nMIdlWZfMZ6i/v7/82Ofz4fP5gEUE/OTkJKVSibVr13Lq1CkOHz7Mhz/8YVpaWojH4wQCAeLxOH6/\nHwC/309fXx/bt28nl8sxPDxMQ0PDnP2eWcRphULhvBopr3K5XDqeK1Cx6LS7BGMUi0UKhQm7y1iQ\ny+UiGAzOu27BgM/n89x33304HA6KxSLvfOc7aW5uZtOmTYTDYWKxGDU1NYRCIQDq6+tpa2sjFApR\nUVFBT0+Phm9ERGzgKP3v5TE2SqfTdpdgDPXgV6ZEwkkg4La7DCNEo2O0ti7/Hvx850RP052sIiKG\nUsCLiBhKAS8iYigFvIiIoRTwIiKGUsCLiBhKAS8iYigFvIiIoRTwIiKGUsCLiBhKAS8iYigFvIiI\noRTwIiKGUsCLiBhKAS8iYigFvIiIoRY1JyvAzMwMO3fuxOPxcNdddzE+Pk5vby+jo6N4vV5CoRBO\n5+x0YZFIhFgshmVZdHd309zcvGQNEBGR+S26B//EE0+wYcOG8vNoNEpTUxN79uzB5/MRiUQAGBoa\nYmBggHA4zM6dO9m/fz/LaNIoEZEVY1EBn81mee6559i2bVt5WTKZpKOjA4DOzk4SiUR5eXt7O5Zl\n4fV6qa2tZXBwcAlKFxGR17KogH/00Uf5+Mc/ftbk2fl8Hrd7du5Ht9tNPp8HIJfLUV1dXX6dx+Mh\nl8tdyJpFRGQRFgz4Q4cOUVVVxcaNG19zqOXM8BcREfsteJL16NGjJJNJnnvuOaampvjvf//Lgw8+\niNvtZmxsrPyzqqoKmO2xZzKZ8vbZbBaPxzNnv6lUilQqVX4eDAZxuVwXok0CVFZW6niuQJZldwXm\nsCzrkvkM9ff3lx/7fD58Ph+wiID/6Ec/ykc/+lEAXnjhBX7zm9/wuc99jp/85CfE43ECgQDxeBy/\n3w+A3++nr6+P7du3k8vlGB4epqGhYc5+zyzitEKh8MZbKGdxuVw6nitQsei0uwRjFItFCoUJu8tY\nkMvlIhgMzrtu0ZdJ/q9AIEA4HCYWi1FTU0MoFAKgvr6etrY2QqEQFRUV9PT0aPhGRMQGjtIyuoYx\nnU7bXYIx1INfmRIJJ4GA2+4yjBCNjtHauvx78HV1dedcpztZRUQMpYAXETGUAl5ExFAKeBERQyng\nRUQMpYAXETGUAl5ExFAKeBERQyngRUQMpYAXETGUAl5ExFAKeBERQyngRUQMpYAXETGUAl5ExFAK\neBERQy04o9Mrr7zCrl27mJ6eplgssmXLFm6++WbGx8fp7e1ldHQUr9dLKBTC6ZydLiwSiRCLxbAs\ni+7ubpqbm5e8ISIicrYFA3716tXs2rWLNWvWMDMzw1e/+lWuu+46/vjHP9LU1ERXVxfRaJRIJMKO\nHTsYGhpiYGCAcDhMNptl9+7d9PX1ado+EZGLbFFDNGvWrAFme/PFYhGAZDJJR0cHAJ2dnSQSifLy\n9vZ2LMvC6/VSW1vL4ODgUtQuIiKvYVGTbs/MzPDlL3+Z//znP7zvfe+joaGBfD6P2z0796Pb7Saf\nzwOQy+VobGwsb+vxeMjlcktQuoiIvJZFBfyqVav49re/zcTEBPfffz/Hjh
2b8xoNwYiILC+LCvjT\nnE4n1157Lc8//zxut5uxsbHyz6qqKmC2x57JZMrbZLNZPB7PnH2lUilSqVT5eTAYxOVyvdF2yP+o\nrKzU8VyBLMvuCsxhWdYl8xnq7+8vP/b5fPh8PmARAX/y5EkqKipwOp1MTU1x5MgRurq6aGlpIR6P\nEwgEiMfj+P1+APx+P319fWzfvp1cLsfw8DANDQ1z9ntmEacVCoXzaqS8yuVy6XiuQMWi0+4SjFEs\nFikUJuwuY0Eul4tgMDjvugUDfmxsjIceeoiZmRlKpRLt7e28/e1vp7GxkXA4TCwWo6amhlAoBEB9\nfT1tbW2EQiEqKiro6enR8I2IiA0cpVKpZHcRp6XTabtLMIZ68CtTIuEkEHDbXYYRotExWluXfw++\nrq7unOt0J6uIiKEU8CIihlLAi4gYSgEvImIoBbyIiKEU8CIihlLAi4gYSgEvImIoBbyIiKEU8CIi\nhlLAi4gYSgEvImIoBbyIiKEU8CIihlLAi4gYSgEvImKoBWd0ymaz7N27l3w+j8PhYNu2bdx0002M\nj4/T29vL6OgoXq+XUCiE0zk7XVgkEiEWi2FZFt3d3TQ3Ny95Q0RE5GwLBrxlWXziE59g48aNnDp1\nirvuuovm5mZisRhNTU10dXURjUaJRCLs2LGDoaEhBgYGCIfDZLNZdu/eTV9fn6btExG5yBYconG7\n3WzcuBGAtWvXsmHDBrLZLMlkko6ODgA6OztJJBIAJJNJ2tvbsSwLr9dLbW0tg4ODS9cCERGZ1+sa\ngx8ZGeGll16isbGRfD6P2z0796Pb7SafzwOQy+Worq4ub+PxeMjlchewZBERWYxFB/ypU6f4zne+\nQ3d3N2vXrp2zXkMwIiLLy4Jj8ADFYpEHHniAG2+8kdbWVmC21z42Nlb+WVVVBcz22DOZTHnbbDaL\nx+OZs89UKkUqlSo/DwaDuFyu82qMvKqyslLHcwWyLLsrMIdlWZfMZ6i/v7/82Ofz4fP5gEUG/He/\n+13q6+u56aabystaWlqIx+MEAgHi8Th+vx8Av99PX18f27dvJ5fLMTw8TENDw5x9nlnEaYVC4fW3\nTOblcrl0PFegYtFpdwnGKBaLFAoTdpexIJfLRTAYnHfdggF/9OhRnnnmGa688kq+9KUv4XA4+MhH\nPkIgECAcDhOLxaipqSEUCgFQX19PW1sboVCIiooKenp6NHwjImIDR6lUKtldxGnpdNruEoyhHvzK\nlEg4CQTcdpdhhGh0jNbW5d+Dr6urO+c63ckqImIoBbyIiKEU8CIihlLAi4gYSgEvImIoBbyIiKEU\n8CIihlLAi4gYSgEvImIoBbyIiKEU8CIihlLAi4gYSgEvImIoBbyIiKEU8CIihlLAi4gYasEZnb77\n3e9y6NAhqqqquP/++wEYHx+nt7eX0dFRvF4voVAIp3N2qrBIJEIsFsOyLLq7u2lubl7aFoiIyLwW\n7MFv3bqVu++++6xl0WiUpqYm9uzZg8/nIxKJADA0NMTAwADhcJidO3eyf/9+ltGEUSIiK8qCAX/1\n1Vdz2WWXnbUsmUzS0dEBQGdnJ4lEory8vb0dy7Lwer3U1tYyODi4BGWLiMhC3tAYfD6fx+2enffR\n7XaTz+cByOVyVFdXl1/n8XjI5XIXoEwREXm9FhyDXwyHw/G6t0mlUqRSqfLzYDCIy+W6EOUIUFlZ\nqeO5AlmW3RWYw7KsS+Yz1N/fX37s8/nw+XzAGwx4t9vN2NhY+WdVVRUw22PPZDLl12WzWTwez7z7\nOLOI0wqFwhspR+bhcrl0PFegYtFpdwnGKBaLFAoTdpexIJfLRTAYnHfdooZoSqXSWSdLW1paiMfj\nAMTjcfx+PwB+v59nn32W6elpRkZGGB4epqGh4TzLFxGRN2LBHvyePXt44YUXKBQK3H777QSDQQKB\nAOFwmFgsRk1NDaFQCID6+nra2toIhU
JUVFTQ09PzhoZvRETk/DlKy+g6xnQ6bXcJxtAQzcqUSDgJ\nBNx2l2GEaHSM1tblP0RTV1d3znW6k1VExFAX5CqaleT48TWk08v/UgXLujROuNXVFdmwYdLuMkSM\npIB/ndJpS38CX0DR6BgbNthdhYiZNEQjImIoBbyIiKEU8CIihlLAi4gYSgEvImIoBbyIiKEU8CIi\nhlLAi4gYSgEvImIoBbyIiKEU8CIihlLAi4gYasm+bOz555/nkUceoVQqsXXrVgKBwFL9KhERmceS\n9OBnZmb44Q9/yN13380DDzzAwYMHOX78+FL8KhEROYclCfjBwUFqa2upqamhoqKCG264gUQisRS/\nSkREzmFJAj6Xy7F+/fryc4/HQy6XW4pfJSIi56CTrCIihlqSk6wej4dMJlN+nsvl8Hg8Z70mlUqR\nSqXKz4PB4GtOHrtcdHXB8pmm3ATu//9PLgS9Py+kS+e92d/fX37s8/nw+XzAEgV8Q0MDw8PDjI6O\nsm7dOg4ePMjnP//5s15zZhFy4fX39xMMBu0uQ2QOvTcvvHMdzyUJ+FWrVnHbbbdxzz33UCqVeNe7\n3kV9ff1S/CoRETmHJbsO/m1vext79uxZqt2LiMgCdJLVUBr+kuVK782Lx1Eq6ZSMiIiJ1IMXETGU\nAl5ExFAKeBERQyngRUQMpYA30OTkpN0liMwxNTVFOp22u4wVRQFvkBdffJFQKMQXvvAFAP71r3+x\nf/9+m6sSgWQyyZ133snXv/51YPa9+a1vfcvmqsyngDfIo48+yt13343L5QJg48aN/PWvf7W5KhH4\n1a9+xb333stll10GzL43R0ZGbK7KfAp4w1RXV5/1fNUq/S8W+1VUVOB0Os9a5nA4bKpm5ViyryqQ\ni2/9+vW8+OKLOBwOpqeneeKJJ9iwYYPdZYlQX1/PgQMHmJmZ4cSJEzz55JM0NjbaXZbxdCerQU6e\nPMkjjzzCkSNHKJVKbN68mVtvvbU8ZCNil8nJSR577DEOHz5MqVSiubmZD33oQ1RWVtpdmtEU8CIi\nhtIQjQF+9KMfveb6T37ykxepEpGzffOb33zNsfa77rrrIlaz8ijgDXDVVVfZXYLIvD74wQ/aXcKK\npiEaERFDqQdvkJMnTxKNRjl+/DhTU1Pl5bt27bKxKhE4ceIEP/vZzxgaGuKVV14pL9+7d6+NVZlP\nF0kbpK+vj/r6ekZGRrj55pupqalh06ZNdpclwr59+3jve9+LZVns2rWLG2+8kXe+8512l2U8BbxB\nCoUC73rXu7Asi2uvvZY77riDVCpld1kiTE1N0dTURKlUoqamhmAwyKFDh+wuy3gaojFIRcXs/851\n69Zx6NAh1q1bx/j4uM1VicDq1auZmZmhtraWp556Co/Hw6lTp+wuy3g6yWqQP//5z1xzzTVkMhke\nfvhhJiYmuPnmm/H7/XaXJivc4OAg9fX1vPzyy/zyl79kYmKCD37wg7qbdYkp4EVEDKUhGoOMjIzw\n5JNPMjo6SrFYLC/XzSRil4W+EljvzaWlgDfIfffdx9atW2lpadG3SMqy8Le//Y3q6mpuuOEGGhoa\n7C5nxVHAG2T16tXcdNNNdpchUvaDH/yAw4cPc+DAAQ4cOMDb3/52brjhBq644gq7S1sRNAZvkAMH\nDnDixAmam5vLV9SAvspAlodXXnmFgwcP8uMf/5ibb76Z97///XaXZDz14A3y73//mz/84Q/85S9/\nOWuIRneyip1eeeUVDh06xMGDBxkdHeUDH/gA119/vd1lrQgKeIMMDAywd+/es3rvInbau3cvx44d\n47rrruPDH/4wV155pd0lrShKAoNcccUVvPzyy1RVVdldiggAzzzzDGvWrCnP4nRaqVTC4XDw6KOP\n2lid+TQGb5Cvfe1rvPTSSzQ0NJzVi9elaCIrkwLeIC+88MK8y6+99tqLXImILAcKeMOMjo5y4sQJ\nNm
/ezOTkJDMzM7zpTW+yuywRsYHuhjHI7373O77zne/wgx/8AIBcLsd9991nc1UiYhcFvEF++9vf\nsnv37nKPvba2lnw+b3NVImIXBbxBVq9efdbJ1WKx+JoTHouI2XSZpEGuvfZaHnvsMaampjh8+DC/\n/e1vaWlpsbssEbGJTrIaZGZmhqeffprDhw9TKpVobm5m27Zt6sWLrFAKeANkMhmqq6vtLkNElhmN\nwRvgzCtl7r//fhsrEZHlRAFvgDP/CBsZGbGxEhFZThTwBjhzjF3j7SJymsbgDXDLLbewdu1aSqUS\nU1NTrFmzBtAXOomsdAp4ERFDaYhGRMRQCngREUMp4EVEDKWAFxExlL6LRla0o0eP8tOf/pRjx45h\nWRYbNmygu7ubq666yu7SRM6bAl5WrP/+979861vf4lOf+hRtbW1MT0/z17/+VZOWizE0RCMr1okT\nJwBob2/H4XCwevVqNm/ezJVXXgnA008/TSgU4pOf/CTf+MY3yGQyAPztb3/jtttuI5fLAfCvf/2L\nW2+9lXQ6bU9DRM5BAS8rVm1tLatWreKhhx7i+eef5+WXXy6vSyQS/PrXv+bOO+9k//79XH311ezZ\nsweAxsZG3vOe9/DQQw8xNTXF3r17+chHPkJdXZ1dTRGZl250khUtnU4TjUY5cuQIY2NjXHfddXz6\n059m3759bNmyha1btwKzX8X8iU98gnA4THV1NcVikbvvvpvp6WnWr1/Pzp07bW6JyFwKeJH/l06n\nefDBB3nrW9/KSy+9RDabZdWqV//InZ6e5qtf/SqNjY0APPXUUzz88MN85Stfoampya6yRc5JAS9y\nhqeeeorf/e53eDwebrzxRt7xjnfM+7pcLsedd95Ja2sr//jHP7j33nt1claWHY3By4qVTqd5/PHH\nyydLM5kMBw8eLI+xRyIRhoaGAJiYmOCPf/xjedt9+/axbds2PvOZz7Bu3Tp+8Ytf2NIGkdeiLoes\nWGvXruXvf/87jz/+OBMTE1x22WW0tLTwsY99jLVr13Lq1Cl6e3vJZDI4nU42b97Mli1beOKJJzh5\n8iS33HILALfffjtf+tKX8Pv9XH311Ta3SuRVGqIRETGUhmhERAylgBcRMZQCXkTEUAp4ERFDKeBF\nRAylgBcRMZQCXkTEUAp4ERFDKeBFRAz1f7XGrfqdHxO3AAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x106b5b950>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
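"def what_sex(speaker):\n",
"    # Only these four speakers are coded as female; every other label,\n",
"    # including group labels such as TOYS and non-speaker entries such as\n",
"    # THE END, falls through to 'Male'.\n",
"    if speaker in [\"SID'S MOM\", 'MRS. DAVIS', 'HANNAH', 'BO PEEP']:\n",
"        return 'Female'\n",
"    return 'Male'\n",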
"\n",
"toy_story_df['Sex'] = toy_story_df['Speaker'].apply(what_sex)\n",
"\n",
"sex_df = toy_story_df.groupby('Sex').size()\n",
"sex_df.plot(kind='bar')\n",
"sex_df\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Sex\n",
"Female 509\n",
"Male 6270\n",
"Name: Word Count, dtype: int64"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAExCAYAAAB71MlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHnpJREFUeJzt3X9sVfX9x/Hn5ZQfu+7ay71tXUs1Bm4a5w2USrtA1QLy\n3RyGaJPpNbotXi2b6LZs9w91xC0s8dcUtS3SusRuanTL1iX2LjP+SIxtEGRJO2zAq6h3TAbcIu09\nuZdiK7S35/sH4w7WuhYtvaWf1yMh955Pz+G+P4fTVz98zo+6HMdxEBERY8zKdQEiIjK1FPwiIoZR\n8IuIGEbBLyJiGAW/iIhhFPwiIobJG2+FRCJBQ0MDLpcLx3H45JNPuPnmm6mpqaGhoYHe3l6KioqI\nRCK43W4A2traaG9vx7IswuEw5eXlAOzbt4/m5maGhoaoqKggHA6f086JiMgYnLOQyWScH/7wh05v\nb6/zwgsvONFo1HEcx2lra3NefPFFx3Ec58CBA84999zjDA8PO5988onz4x//2BkZGXEcx3E2btzo\nfPTRR47jOM7DDz/svPPOO2fz8TIJ3n333VyXIPK5dHxOjbOa6tmzZw8XXXQRBQUFdHV1sXLlSgBW\nrVpFZ2cnAF1dXVRXV2NZFkVFRRQXFxOPx0mlUgwODhIIBACoqanJbiNTJxaL5boEkc+l43NqnFXw\nv/3221x11VUApNNpvF4vAF6vl3Q6DYBt2xQUFGS38fl82LaNbdv4/f5su9/vx7btL90BERE5OxMO\n/uHhYbq6uli+fPmYX3e5XJNWlIiInDvjntw9pbu7m4ULF3LhhRcCJ0f5qVQq+5qfnw+cHOH39fVl\nt0smk/h8Pnw+H8lkclT7WGKx2Bn/5QuFQmfXK/lc2pcynen4nHytra3Z98FgkGAwOPHg3759O1de\neWV2edmyZXR0dFBbW0tHRweVlZUAVFZWsmXLFtatW4dt2xw+fJhAIIDL5cLtdhOPx1m0aBHbtm1j\n7dq1Y37WqeJOl0gkzqqzMjaPx0N/f3+uyxAZk47PyVVSUjLmD9MJBf/x48fZs2cPd955Z7attraW\n+vp62tvbKSwsJBKJAFBaWsqKFSuIRCLk5eWxfv367DRQXV0dTU1N2cs5ly5dOhl9ExGRs+BynPPj\nscwa8U8OjahkOtPxOblKSkrGbNeduyIihlHwi4gYRsEvImIYBb+IiGEU/CIihlHwi4gYRsEvImIY\nBb+IiGEU/CIihlHwi4gYRsEvImIYBb+IiGEU/CIihlHwi4gYRsEvImIYBb+IiGEU/CIihlHwi4gY\nRsEvImKYCf2ydRE5vx06NJdEwsp1GeOyLMhk3LkuY1wlJRkWLDie6zK+MAW/iAESCYvaWm+uy5gx\notEUCxbkuoovTlM9IiKGUfCLiBhmQlM9AwMD/OY3v+HAgQO4XC7uuusuiouLaWhooLe3l6KiIiKR\nCG73ybm5trY22tvbsSyLcDhMeXk5APv27aO5uZmhoSEqKioIh8PnrGMiIjK2CY34n332WSoqKqiv\nr2fz5s0sWLCAaDTK4sWLaWxsJBgM0tbWBsDBgwfZuXMn9fX1bNy4kZaWFhzHAaClpYUNGzbQ2NhI\nT08P3d3d565nIiIypnGDf2BggL1797J69WoALMvC7XbT1dXFypUrAVi1ahWdnZ0AdHV1UV1djWVZ\nFBUVUVxcTDweJ5VKMTg4SCAQAKCmpia7jYiITJ1xp3qOHDmCx+OhubmZ/fv3s3DhQsLhMOl0Gq/3\n5FUCXq+XdDoNgG3blJWVZbf3+XzYto1lWfj9/my73+/Htu3J7o+IiIxj3BH/yMgI//znP7n22mt5\n9NFHmTt3LtFodNR6LpfrnBQoIiKTa9wRv8/nw+/3s2jRIgCWL19ONBrF6/WSSqWyr/n5+dn1+/r6\nstsnk0l8Ph8+n49kMjmqfSyxWIxYLJZdDoVC
eDyeL9ZDOcOcOXO0Lw1kTf97t84rlmWdN99Hra2t\n2ffBYJBgMDh+8Hu9Xvx+P4lEgpKSEvbs2UNpaSmlpaV0dHRQW1tLR0cHlZWVAFRWVrJlyxbWrVuH\nbdscPnyYQCCAy+XC7XYTj8dZtGgR27ZtY+3atWN+5qniTtff3/9l+i7/5vF4tC8NdD7cDXs+yWQy\n9PcP5LqMcXk8HkKh0Kj2CV3Oefvtt/PUU08xPDzMRRddxN13383IyAj19fW0t7dTWFhIJBIBoLS0\nlBUrVhCJRMjLy2P9+vXZaaC6ujqampqyl3MuXbp0ErsoIiIT4XJOXWs5zSUSiVyXMCNoxG+mzk63\nHtkwiaLRFFVV03/EX1JSMma77twVETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4\nRUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAK\nfhERwyj4RUQMo+AXETGMgl9ExDAKfhERw+RNZKUf/ehHuN1uXC4XlmXxyCOPcOzYMRoaGujt7aWo\nqIhIJILb7Qagra2N9vZ2LMsiHA5TXl4OwL59+2hubmZoaIiKigrC4fA565iIiIxtQsHvcrnYtGkT\nX/3qV7Nt0WiUxYsXc8MNNxCNRmlra+O73/0uBw8eZOfOndTX15NMJnnggQfYsmULLpeLlpYWNmzY\nQCAQ4JFHHqG7u5ulS5ees86JiMhoE5rqcRwHx3HOaOvq6mLlypUArFq1is7Ozmx7dXU1lmVRVFRE\ncXEx8XicVCrF4OAggUAAgJqamuw2IiIydSY84n/wwQeZNWsW//d//8eaNWtIp9N4vV4AvF4v6XQa\nANu2KSsry27r8/mwbRvLsvD7/dl2v9+PbduT2RcREZmACQX/Aw88wPz58zl69CgPPvggJSUlo9Zx\nuVyTXpyIiEy+CQX//PnzAbjwwgupqqoiHo/j9XpJpVLZ1/z8fODkCL+vry+7bTKZxOfz4fP5SCaT\no9rHEovFiMVi2eVQKITH4zn73skoc+bM0b40kGXluoKZxbKs8+b7qLW1Nfs+GAwSDAbHD/7jx4/j\nOA7z5s3js88+Y/fu3dx4440sW7aMjo4Oamtr6ejooLKyEoDKykq2bNnCunXrsG2bw4cPEwgEcLlc\nuN1u4vE4ixYtYtu2baxdu3bMzzxV3On6+/u/TN/l3zwej/algTIZd65LmFEymQz9/QO5LmNcHo+H\nUCg0qn3c4E+n02zevBmXy0Umk+Hqq6+mvLycRYsWUV9fT3t7O4WFhUQiEQBKS0tZsWIFkUiEvLw8\n1q9fn50Gqquro6mpKXs5p67oERGZei7nvy/XmaYSiUSuS5gRNOI3U2enm9pab67LmDGi0RRVVdN/\nxD/W+VjQnbsiIsZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuI\nGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhFPwiIoZR8IuIGEbBLyJiGAW/iIhhFPwi\nIoZR8IuIGEbBLyJimLyJrjgyMsLGjRvx+Xzcd999HDt2jIaGBnp7eykqKiISieB2uwFoa2ujvb0d\ny7IIh8OUl5cDsG/fPpqbmxkaGqKiooJwOHxOOiUiIp9vwiP+V155hQULFmSXo9EoixcvprGxkWAw\nSFtbGwAHDx5k586d1NfXs3HjRlpaWnAcB4CWlhY2bNhAY2MjPT09dHd3T3J3RERkPBMK/mQyyTvv\nvMOaNWuybV1dXaxcuRKAVatW0dnZmW2vrq7GsiyKioooLi4mHo+TSqUYHBwkEAgAUFNTk91GRESm\nzoSC//nnn+f73/8+Lpcr25ZOp/F6vQB4vV7S6TQAtm1TUFCQXc/n82HbNrZt4/f7s+1+vx/btiel\nEyIiMnHj
Bv+uXbvIz8/n0ksvzU7ZjOX0HwoiIjJ9jXtyd+/evXR1dfHOO+9w4sQJBgcHeeqpp/B6\nvaRSqexrfn4+cHKE39fXl90+mUzi8/nw+Xwkk8lR7WOJxWLEYrHscigUwuPxfOFOyn/MmTNH+9JA\nlpXrCmYWy7LOm++j1tbW7PtgMEgwGBw/+G+99VZuvfVWAN577z3++te/8pOf/IQXX3yRjo4Oamtr\n6ejooLKyEoDKykq2bNnCunXrsG2bw4cPEwgEcLlcuN1u4vE4ixYtYtu2baxdu3bMzzxV3On6+/u/\ncMflPzwej/algTIZd65LmFEymQz9/QO5LmNcHo+HUCg0qn3Cl3P+t9raWurr62lvb6ewsJBIJAJA\naWkpK1asIBKJkJeXx/r167PTQHV1dTQ1NWUv51y6dOkX/XgREfmCXM7/mrifRhKJRK5LmBE04jdT\nZ6eb2lpvrsuYMaLRFFVV03/EX1JSMma77twVETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAK\nfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGM\ngl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERw+SNt8LQ0BCbNm1ieHiYTCbD8uXLuemmmzh2\n7BgNDQ309vZSVFREJBLB7XYD0NbWRnt7O5ZlEQ6HKS8vB2Dfvn00NzczNDRERUUF4XD4nHZORERG\nG3fEP3v2bDZt2sRjjz3G5s2b6e7uJh6PE41GWbx4MY2NjQSDQdra2gA4ePAgO3fupL6+no0bN9LS\n0oLjOAC0tLSwYcMGGhsb6enpobu7+9z2TkRERpnQVM/cuXOBk6P/TCYDQFdXFytXrgRg1apVdHZ2\nZturq6uxLIuioiKKi4uJx+OkUikGBwcJBAIA1NTUZLcREZGpM+5UD8DIyAg///nP+eSTT7j22msJ\nBAKk02m8Xi8AXq+XdDoNgG3blJWVZbf1+XzYto1lWfj9/my73+/Htu3J7IuIiEzAhIJ/1qxZPPbY\nYwwMDPD4449z4MCBUeu4XK5JL05ERCbfhIL/FLfbzeWXX053dzder5dUKpV9zc/PB06O8Pv6+rLb\nJJNJfD4fPp+PZDI5qn0ssViMWCyWXQ6FQng8nrPqmIxtzpw52pcGsqxcVzCzWJZ13nwftba2Zt8H\ng0GCweD4wX/06FHy8vJwu92cOHGCPXv2cMMNN7Bs2TI6Ojqora2lo6ODyspKACorK9myZQvr1q3D\ntm0OHz5MIBDA5XLhdruJx+MsWrSIbdu2sXbt2jE/81Rxp+vv7/8yfZd/83g82pcGymTcuS5hRslk\nMvT3D+S6jHF5PB5CodCo9nGDP5VK0dTUxMjICI7jUF1dzRVXXEFZWRn19fW0t7dTWFhIJBIBoLS0\nlBUrVhCJRMjLy2P9+vXZaaC6ujqampqyl3MuXbp0krspIiLjcTmnrrWc5hKJRK5LmBE04jdTZ6eb\n2lpvrsuYMaLRFFVV03/EX1JSMma77twVETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhER\nwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9E\nxDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERw+SNt0IymWTr1q2k02lcLhdr1qzhuuuu49ixYzQ0\nNNDb20tRURGRSAS32w1AW1sb7e3tWJZFOBymvLwcgH379tHc3MzQ0BAVFRWEw+Fz2jkRERlt3BG/\nZVncdtttPPnkkzz00EO8/vrrHDp0iGg0yuLFi2lsbCQYDNLW1gbAwYMH2blzJ/X19WzcuJGWlhYc\nxwGgpaWFDRs20NjYSE9PD93d3ee2dyIiMsq4we/1ern00ksBmDdvHgsWLC
CZTNLV1cXKlSsBWLVq\nFZ2dnQB0dXVRXV2NZVkUFRVRXFxMPB4nlUoxODhIIBAAoKamJruNiIhMnbOa4z9y5Aj79++nrKyM\ndDqN1+sFTv5wSKfTANi2TUFBQXYbn8+HbdvYto3f78+2+/1+bNuejD6IiMhZGHeO/5TPPvuMJ598\nknA4zLx580Z93eVyTVpRsViMWCyWXQ6FQng8nkn7+002Z84c7UsDWVauK5hZLMs6b76PWltbs++D\nwSDBYHBiwZ/JZHjiiSeoqamhqqoKODnKT6VS2df8/Hzg5Ai/r68vu20ymcTn8+Hz+Ugmk6Pax3Kq\nuNP19/dPsJvyv3g8Hu1LA2Uy7lyXMKNkMhn6+wdyXca4PB4PoVBoVPuEpnqefvppSktLue6667Jt\ny5Yto6OjA4COjg4qKysBqKys5O2332Z4eJgjR45w+PBhAoEAXq8Xt9tNPB7HcRy2bduW/SEiIiJT\nZ9wR/969e3nrrbe45JJLuPfee3G5XNxyyy3U1tZSX19Pe3s7hYWFRCIRAEpLS1mxYgWRSIS8vDzW\nr1+fnQaqq6ujqakpeznn0qVLz23vRERkFJdz6lrLaS6RSOS6hBlBUz1m6ux0U1vrzXUZM0Y0mqKq\navpP9ZSUlIzZrjt3RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQM\no+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhER\nwyj4RUQMo+AXETGMgl9ExDB5463w9NNPs2vXLvLz83n88ccBOHbsGA0NDfT29lJUVEQkEsHtdgPQ\n1tZGe3s7lmURDocpLy8HYN++fTQ3NzM0NERFRQXhcPjc9UpERD7XuCP+1atXc//995/RFo1GWbx4\nMY2NjQSDQdra2gA4ePAgO3fupL6+no0bN9LS0oLjOAC0tLSwYcMGGhsb6enpobu7+xx0R0RExjNu\n8F922WVccMEFZ7R1dXWxcuVKAFatWkVnZ2e2vbq6GsuyKCoqori4mHg8TiqVYnBwkEAgAEBNTU12\nGxERmVpfaI4/nU7j9XoB8Hq9pNNpAGzbpqCgILuez+fDtm1s28bv92fb/X4/tm1/mbpFROQLGneO\nfyJcLtdk/DVZsViMWCyWXQ6FQng8nkn9DFPNmTNH+9JAlpXrCmYWy7LOm++j1tbW7PtgMEgwGPxi\nwe/1ekmlUtnX/Px84OQIv6+vL7teMpnE5/Ph8/lIJpOj2j/PqeJO19/f/0VKlf/i8Xi0Lw2Uybhz\nXcKMkslk6O8fyHUZ4/J4PIRCoVHtE5rqcRwne5IWYNmyZXR0dADQ0dFBZWUlAJWVlbz99tsMDw9z\n5MgRDh8+TCAQwOv14na7icfjOI7Dtm3bqKqqmoRuiYjI2Rp3xN/Y2Mh7771Hf38/d911F6FQiNra\nWurr62lvb6ewsJBIJAJAaWkpK1asIBKJkJeXx/r167PTQHV1dTQ1NWUv51y6dOm57ZmIiIzJ5Zw+\nlJ/GEolErkuYETTVY6bOTje1td5clzFjRKMpqqqm/1RPSUnJmO26c1dExDAKfhERwyj4RUQMo+AX\nETGMgl9ExDAKfhERwyj4RUQMo+AXETGMgl9ExDAKfhERwyj4RUQMMynP45eTDh2aSyIxvR98blnn\nxyN6S0oyLFhwPNdliMxICv5JlEhYehDWJIlGUyxYkOsqRGYmTfWIiBhGwS8iYhgFv4iIYRT8IiKG\nUfCLiBhGwS8iYhgFv4iIYRT8IiKGUfCLiBhmyu/c7e7u5rnnnsNxHFavXk1tbe1UlyAiYrQpHfGP\njIzw29/+lvvvv58nnniCHTt2cOjQoaksQUTEeFMa/PF4nOLiYgoLC8nLy+PKK6+ks7NzKksQETHe\nlAa/bdv4/f7sss/nw7btqSxBRMR4Or
krImKYKT256/P56Ovryy7bto3P5xu1XiwWIxaLZZdDoRAl\nJSVTUuOXccMN4Di5rmKm8P77j0wGHZuT7fw5PltbW7Pvg8EgwWBwaoM/EAhw+PBhent7mT9/Pjt2\n7OCnP/3pqPVOFSeTr7W1lVAolOsyRMak43PyjbU/pzT4Z82aRV1dHQ8++CCO43DNNddQWlo6lSWI\niBhvyq/jX7p0KY2NjVP9sSIi8m86uWsYTaHJdKbjc2q4HEenfERETKIRv4iIYRT8IiKGUfCLiBhG\nwS8iYhgFv0GOHz+e6xJExnTixAkSiUSuyzCGgt8AH3zwAZFIhJ/97GcAfPzxx7S0tOS4KpGTurq6\nuOeee3jooYeAk8fno48+muOqZjYFvwGef/557r//fjweDwCXXnop77//fo6rEjnpz3/+M4888ggX\nXHABcPL4PHLkSI6rmtkU/IYoKCg4Y3nWLP3Ty/SQl5eH2+0+o83lcuWoGjNM+SMbZOr5/X4++OAD\nXC4Xw8PDvPLKKyxYsCDXZYkAUFpayvbt2xkZGaGnp4dXX32VsrKyXJc1o+nOXQMcPXqU5557jj17\n9uA4DkuWLOH222/PTv2I5NLx48d56aWX2L17N47jUF5ezne+8x3mzJmT69JmLAW/iIhhNNUzg/3u\nd7/7n1+/4447pqgSkdF+/etf/8+5/Pvuu28KqzGLgn8GW7hwYa5LEPlc119/fa5LMJamekREDKMR\nvwGOHj1KNBrl0KFDnDhxItu+adOmHFYlclJPTw9/+MMfOHjwIENDQ9n2rVu35rCqmU0Xcxtgy5Yt\nlJaWcuTIEW666SYKCwtZtGhRrssSAaC5uZlvfetbWJbFpk2bqKmp4eqrr851WTOagt8A/f39XHPN\nNViWxeWXX87dd99NLBbLdVkiwMnn9CxevBjHcSgsLCQUCrFr165clzWjaarHAHl5J/+Z58+fz65d\nu5g/fz7Hjh3LcVUiJ82ePZuRkRGKi4t57bXX8Pl8fPbZZ7kua0bTyV0D/P3vf+frX/86fX19PPvs\nswwMDHDTTTdRWVmZ69JEiMfjlJaW8umnn/KnP/2JgYEBrr/+et29ew4p+EVEDKOpHgMcOXKEV199\nld7eXjKZTLZdN8hILo336GUdn+eOgt8AmzdvZvXq1SxbtkxP5ZRp48MPP6SgoIArr7ySQCCQ63KM\nouA3wOzZs7nuuutyXYbIGZ555hl2797N9u3b2b59O1dccQVXXnklF198ca5Lm/E0x2+A7du309PT\nQ3l5efYKH9AjHWT6GBoaYseOHbzwwgvcdNNNfPvb3851STOaRvwG+Ne//sW2bdt49913z5jq0Z27\nkmtDQ0Ps2rWLHTt20Nvby9q1a/nGN76R67JmPAW/AXbu3MnWrVvPGO2L5NrWrVs5cOAAFRUV3Hjj\njVxyySW5LskYSgIDXHzxxXz66afk5+fnuhSRrLfeeou5c+dmf+vWKY7j4HK5eP7553NY3cymOX4D\n/OpXv2L//v0EAoEzRv26XE7ETAp+A7z33ntjtl9++eVTXImITAcKfkP09vbS09PDkiVLOH78OCMj\nI3zlK1/JdVkikgO6m8cAb7zxBk8++STPPPMMALZts3nz5hxXJSK5ouA3wOuvv84DDzyQHeEXFxeT\nTqdzXJWI5IqC3wCzZ88+46RuJpP5n7/kWkRmNl3OaYDLL7+cl156iRMnTrB7925ef/11li1bluuy\nRCRHdHLXACMjI7z55pvs3r0bx3EoLy9nzZo1GvWLGErBP4P19fVRUFCQ6zJEZJrRHP8MdvqVO48/\n/ngOKxGR6UTBP4Od/p+5I0eO5LASEZlOFPwz2Olz+JrPF5FTNMc/g918883MmzcPx3E4ceIEc+fO\nBfQQLBHTKfhFRAyjqR4REcMo+EVEDKPgFxExjIJfRMQwelaPyOfYu3cvv//97zlw4ACWZbFgwQLC\n4T
ALFy7MdWkiX4qCX2QMg4ODPProo/zgBz9gxYoVDA8P8/777+sX1suMoKkekTH09PQAUF1djcvl\nYvbs2SxZsoRLLrkEgDfffJNIJMIdd9zBww8/TF9fHwAffvghdXV12LYNwMcff8ztt99OIpHITUdE\nxqDgFxlDcXExs2bNoqmpie7ubj799NPs1zo7O/nLX/7CPffcQ0tLC5dddhmNjY0AlJWV8c1vfpOm\npiZOnDjB1q1bueWWWygpKclVV0RG0Q1cIp8jkUgQjUbZs2cPqVSKiooK7rzzTpqbm1m+fDmrV68G\nTj72+rbbbqO+vp6CggIymQz3338/w8PD+P1+Nm7cmOOeiJxJwS8yAYlEgqeeeoqvfe1r7N+/n2Qy\nyaxZ//kP8/DwML/85S8pKysD4LXXXuPZZ5/lF7/4BYsXL85V2SJjUvCLTNBrr73GG2+8gc/no6am\nhquuumrM9Wzb5p577qGqqop//OMfPPLIIzopLNOK5vhFxpBIJHj55ZezJ2n7+vrYsWNHdg6/ra2N\ngwcPAjAwMMDf/va37LbNzc2sWbOGDRs2MH/+fP74xz/mpA8in0fDEJExzJs3j48++oiXX36ZgYEB\nLrjgApYtW8b3vvc95s2bx2effUZDQwN9fX243W6WLFnC8uXLeeWVVzh69Cg333wzAHfddRf33nsv\nlZWVXHbZZTnulchJmuoRETGMpnpERAyj4BcRMYyCX0TEMAp+ERHDKPhFRAyj4BcRMYyCX0TEMAp+\nERHDKPhFRAzz/8QaOTeizRiVAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x109aec690>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def word_count(dialogue):\n",
"    return len(dialogue.split())\n",
"\n",
"toy_story_df['Word Count'] = toy_story_df['Dialogue'].apply(word_count)\n",
"\n",
"word_df = toy_story_df.groupby('Sex')['Word Count'].sum()\n",
"word_df.plot(kind='bar')\n",
"word_df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
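
The Female/Male totals printed by the last two cells of the notebook can be turned into percentage shares, which makes the imbalance easier to read than raw counts. The following is a minimal sketch, assuming only the four totals shown in the notebook output (78 and 820 lines, 509 and 6270 words); `shares` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
from __future__ import division  # keeps the division a float on Python 2 as well


def shares(counts):
    """Convert a {label: count} mapping into {label: percentage of the total}."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Totals copied from the two groupby outputs in the notebook (assumed, not recomputed).
line_counts = {'Female': 78, 'Male': 820}
word_counts = {'Female': 509, 'Male': 6270}

line_shares = shares(line_counts)
word_shares = shares(word_counts)

# Print one row per group with its share of speaking turns and of words.
for label in sorted(line_shares):
    print('%-6s  %5.1f%% of lines  %5.1f%% of words'
          % (label, line_shares[label], word_shares[label]))
```

In the notebook itself, dividing the grouped result by its own sum (for example `sex_df / sex_df.sum()`) should give the same proportions without a separate helper.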