gdsaxton/Producing Summary Statistics for Publication.ipynb

## Producing Summary Statistics for Publication.ipynb
{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
           "<br>\n",
      "\n",
      "In this notebook I will show you how to run descriptive statistics for your dataset and save the output. The desired end product is a CSV table of key summary statistics -- count, mean, std. dev., min. and max -- for the variables in your dataset."
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Import packages"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import pandas as pd\n",
      "from pandas import DataFrame\n",
      "from pandas import Series\n",
      "import statsmodels #FOR NEXT STEP -- RUNNING REGRESSIONS \n",
      "import statsmodels.api as sm\n",
      "import statsmodels.formula.api as smf   #FOR USING 'R'-STYLE FORMULAS FOR REGRESSIONS"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "PANDAS allows you to set various options for, among other things, inspecting the data. I like to be able to see all of the columns. Therefore, I typically include this line at the top of all my notebooks."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#Set PANDAS to show all columns in DataFrame\n",
      "pd.set_option('display.max_columns', None)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>I like suppressing scientific notation in my numbers. So, if you'd rather see \"0.48\" than \"4.800000e-01\", then run the following line. Note that this does not change the actual values. For outputting to CSV we'll have to run some additional code later on."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "pd.set_option('display.float_format', lambda x: '%.2f' % x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "I'm running PANDAS 0.13 here."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print pd.__version__\n",
      "print statsmodels.__version__"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.13.1\n",
        "0.6.1\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Read in dataframe"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "I'm using data on a sample of 1,500 Facebook statuses of a sample of US-based health organizations"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df = pd.read_excel('fb.xls', 'Sheet1', header=0)\n",
      "print len(df)\n",
      "df.head(2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1500\n"
       ]
      },
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>No.</th>\n",
        "      <th>id</th>\n",
        "      <th>feed_id</th>\n",
        "      <th>org_name</th>\n",
        "      <th>FB_org_id</th>\n",
        "      <th>location</th>\n",
        "      <th>link</th>\n",
        "      <th>message_id</th>\n",
        "      <th>org_id</th>\n",
        "      <th>status_id</th>\n",
        "      <th>status_link</th>\n",
        "      <th>content</th>\n",
        "      <th>published_date</th>\n",
        "      <th>date_inserted</th>\n",
        "      <th>last_comment</th>\n",
        "      <th>type</th>\n",
        "      <th>status_type</th>\n",
        "      <th>video_source</th>\n",
        "      <th>picture_link</th>\n",
        "      <th>link_name</th>\n",
        "      <th>link_caption</th>\n",
        "      <th>link_description</th>\n",
        "      <th>num_mentions</th>\n",
        "      <th>mentions</th>\n",
        "      <th>num_likes</th>\n",
        "      <th>like_count</th>\n",
        "      <th>comment_count</th>\n",
        "      <th>share_count</th>\n",
        "      <th>hashtag_count</th>\n",
        "      <th>hashtags</th>\n",
        "      <th>mentions_count</th>\n",
        "      <th>actions</th>\n",
        "      <th>application</th>\n",
        "      <th>properties</th>\n",
        "      <th>message_output</th>\n",
        "      <th>time_since_post_days</th>\n",
        "      <th>like_count_7days</th>\n",
        "      <th>comment_count_7days</th>\n",
        "      <th>share_count_7days</th>\n",
        "      <th>time_since_post_14days</th>\n",
        "      <th>like_count_14days</th>\n",
        "      <th>comment_count_14days</th>\n",
        "      <th>share_count_14days</th>\n",
        "      <th>feed_organization</th>\n",
        "      <th>images</th>\n",
        "      <th>urls_count</th>\n",
        "      <th>urls_count_true</th>\n",
        "      <th>mentions_scrape</th>\n",
        "      <th>source</th>\n",
        "      <th>mission/non-mission</th>\n",
        "      <th>mission_focus</th>\n",
        "      <th>I-C-A</th>\n",
        "      <th>subcode</th>\n",
        "      <th>source_External</th>\n",
        "      <th>source_Internal</th>\n",
        "      <th>mission_N</th>\n",
        "      <th>mission_Y</th>\n",
        "      <th>mission_focus_Building capacity</th>\n",
        "      <th>mission_focus_Patient advocacy</th>\n",
        "      <th>mission_focus_Prevention</th>\n",
        "      <th>ICA_A</th>\n",
        "      <th>ICA_C</th>\n",
        "      <th>ICA_I</th>\n",
        "      <th>ICA_subcode_AIDS-related day</th>\n",
        "      <th>ICA_subcode_Awareness</th>\n",
        "      <th>ICA_subcode_Complementary support</th>\n",
        "      <th>ICA_subcode_Cover/profile photo</th>\n",
        "      <th>ICA_subcode_Dialogue</th>\n",
        "      <th>ICA_subcode_Donation</th>\n",
        "      <th>ICA_subcode_Event info</th>\n",
        "      <th>ICA_subcode_Event promotion</th>\n",
        "      <th>ICA_subcode_Event update</th>\n",
        "      <th>ICA_subcode_Get tested</th>\n",
        "      <th>ICA_subcode_HIV/AIDS info/news</th>\n",
        "      <th>ICA_subcode_Idea promotion</th>\n",
        "      <th>ICA_subcode_LGBT/Transgender information</th>\n",
        "      <th>ICA_subcode_Lobbying</th>\n",
        "      <th>ICA_subcode_Media action</th>\n",
        "      <th>ICA_subcode_Medication</th>\n",
        "      <th>ICA_subcode_National holiday/Holiday</th>\n",
        "      <th>ICA_subcode_Organizational news/announcement</th>\n",
        "      <th>ICA_subcode_Other</th>\n",
        "      <th>ICA_subcode_Patient stories</th>\n",
        "      <th>ICA_subcode_Recognition</th>\n",
        "      <th>ICA_subcode_Research/survey</th>\n",
        "      <th>ICA_subcode_Viewing action</th>\n",
        "      <th>ICA_subcode_Volunteer</th>\n",
        "      <th>type_event</th>\n",
        "      <th>type_link</th>\n",
        "      <th>type_music</th>\n",
        "      <th>type_photo</th>\n",
        "      <th>type_status</th>\n",
        "      <th>type_video</th>\n",
        "      <th>status_type_added_photos</th>\n",
        "      <th>status_type_added_video</th>\n",
        "      <th>status_type_created_event</th>\n",
        "      <th>status_type_created_note</th>\n",
        "      <th>status_type_mobile_status_update</th>\n",
        "      <th>status_type_published_story</th>\n",
        "      <th>status_type_shared_story</th>\n",
        "      <th>video_dummy</th>\n",
        "      <th>picture_dummy</th>\n",
        "      <th>Org_ID</th>\n",
        "      <th>Org_Name</th>\n",
        "      <th>Total_Revenue</th>\n",
        "      <th>Assets</th>\n",
        "      <th>followers_count</th>\n",
        "      <th>talking_about_count</th>\n",
        "      <th>were_here_count</th>\n",
        "      <th>Log_of_Assets</th>\n",
        "      <th>Log_of_Total_Revenue</th>\n",
        "      <th>Log_of_Followers</th>\n",
        "      <th>likes_binary</th>\n",
        "      <th>comment_binary</th>\n",
        "      <th>share_binary</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1118</td>\n",
        "      <td> 74</td>\n",
        "      <td> 1</td>\n",
        "      <td> AIDS Healthcare Foundation</td>\n",
        "      <td> 34661411150</td>\n",
        "      <td> {'location': {'city': 'Los Angeles', 'zip': '9...</td>\n",
        "      <td> https://www.facebook.com/AIDShealth/photos/a.1...</td>\n",
        "      <td> 34661411150_10152940344646151</td>\n",
        "      <td> 34661411150</td>\n",
        "      <td> 10152940344646100</td>\n",
        "      <td> https://www.facebook.com/34661411150/posts/101...</td>\n",
        "      <td> 'TIS THE SEASON: We know you've heard us say, ...</td>\n",
        "      <td> 2014-12-18T05:19:30+0000</td>\n",
        "      <td>2015-03-09 00:00:39</td>\n",
        "      <td> 2014-12-20T13:33:45+0000</td>\n",
        "      <td> photo</td>\n",
        "      <td> added_photos</td>\n",
        "      <td> NaN</td>\n",
        "      <td> https://scontent.xx.fbcdn.net/hphotos-xpf1/v/t...</td>\n",
        "      <td>                                               NaN</td>\n",
        "      <td>                NaN</td>\n",
        "      <td>                                               NaN</td>\n",
        "      <td>1.00</td>\n",
        "      <td> Art Hearts Fashion</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td> 1</td>\n",
        "      <td> UseACondom</td>\n",
        "      <td>nan</td>\n",
        "      <td> [{'link': 'https://www.facebook.com/3466141115...</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> {'picture': 'https://scontent.xx.fbcdn.net/hph...</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>81.00</td>\n",
        "      <td> 101</td>\n",
        "      <td> 3</td>\n",
        "      <td> 6</td>\n",
        "      <td> AIDS HEALTHCARE FOUNDATION</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td> Internal</td>\n",
        "      <td> Y</td>\n",
        "      <td> Building capacity</td>\n",
        "      <td> A</td>\n",
        "      <td> Event promotion</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "      <td> AIDS HEALTHCARE FOUNDATION</td>\n",
        "      <td> 209986539</td>\n",
        "      <td> 227735541</td>\n",
        "      <td> 721419</td>\n",
        "      <td> 896</td>\n",
        "      <td> 3891</td>\n",
        "      <td>19.24</td>\n",
        "      <td>19.16</td>\n",
        "      <td>13.49</td>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 1262</td>\n",
        "      <td> 79</td>\n",
        "      <td> 1</td>\n",
        "      <td> AIDS Healthcare Foundation</td>\n",
        "      <td> 34661411150</td>\n",
        "      <td>                                               NaN</td>\n",
        "      <td> http://www.frontiersmedia.com/frontiers-blog/2...</td>\n",
        "      <td> 34661411150_10152926215086151</td>\n",
        "      <td> 34661411150</td>\n",
        "      <td> 10152926215086100</td>\n",
        "      <td> https://www.facebook.com/34661411150/posts/101...</td>\n",
        "      <td> AHF mourns the tragic loss of our friend and c...</td>\n",
        "      <td> 2014-12-11T18:02:10+0000</td>\n",
        "      <td>2015-03-09 00:00:40</td>\n",
        "      <td> 2014-12-14T16:41:29+0000</td>\n",
        "      <td>  link</td>\n",
        "      <td> shared_story</td>\n",
        "      <td> NaN</td>\n",
        "      <td> https://fbexternal-a.akamaihd.net/safe_image.p...</td>\n",
        "      <td> Dana Miller\u00e2\u20ac\u201dProducer, AIDS Advocate, Frontie...</td>\n",
        "      <td> frontiersmedia.com</td>\n",
        "      <td> \u00e2\u20ac\u0153Dana was legendary in gay Los Angeles and H...</td>\n",
        "      <td>0.00</td>\n",
        "      <td>                NaN</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td> 1</td>\n",
        "      <td> DanaMiller</td>\n",
        "      <td>nan</td>\n",
        "      <td> [{'link': 'https://www.facebook.com/3466141115...</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> {'picture': 'https://fbexternal-a.akamaihd.net...</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>88.00</td>\n",
        "      <td>  78</td>\n",
        "      <td> 9</td>\n",
        "      <td> 4</td>\n",
        "      <td> AIDS HEALTHCARE FOUNDATION</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td>nan</td>\n",
        "      <td> External</td>\n",
        "      <td> Y</td>\n",
        "      <td> Building capacity</td>\n",
        "      <td> C</td>\n",
        "      <td>     Recognition</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "      <td> AIDS HEALTHCARE FOUNDATION</td>\n",
        "      <td> 209986539</td>\n",
        "      <td> 227735541</td>\n",
        "      <td> 721419</td>\n",
        "      <td> 896</td>\n",
        "      <td> 3891</td>\n",
        "      <td>19.24</td>\n",
        "      <td>19.16</td>\n",
        "      <td>13.49</td>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>2 rows \u00d7 115 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 81,
       "text": [
        "    No.  id  feed_id                    org_name    FB_org_id  \\\n",
        "0  1118  74        1  AIDS Healthcare Foundation  34661411150   \n",
        "1  1262  79        1  AIDS Healthcare Foundation  34661411150   \n",
        "\n",
        "                                            location  \\\n",
        "0  {'location': {'city': 'Los Angeles', 'zip': '9...   \n",
        "1                                                NaN   \n",
        "\n",
        "                                                link  \\\n",
        "0  https://www.facebook.com/AIDShealth/photos/a.1...   \n",
        "1  http://www.frontiersmedia.com/frontiers-blog/2...   \n",
        "\n",
        "                      message_id       org_id          status_id  \\\n",
        "0  34661411150_10152940344646151  34661411150  10152940344646100   \n",
        "1  34661411150_10152926215086151  34661411150  10152926215086100   \n",
        "\n",
        "                                         status_link  \\\n",
        "0  https://www.facebook.com/34661411150/posts/101...   \n",
        "1  https://www.facebook.com/34661411150/posts/101...   \n",
        "\n",
        "                                             content  \\\n",
        "0  'TIS THE SEASON: We know you've heard us say, ...   \n",
        "1  AHF mourns the tragic loss of our friend and c...   \n",
        "\n",
        "             published_date       date_inserted              last_comment  \\\n",
        "0  2014-12-18T05:19:30+0000 2015-03-09 00:00:39  2014-12-20T13:33:45+0000   \n",
        "1  2014-12-11T18:02:10+0000 2015-03-09 00:00:40  2014-12-14T16:41:29+0000   \n",
        "\n",
        "    type   status_type video_source  \\\n",
        "0  photo  added_photos          NaN   \n",
        "1   link  shared_story          NaN   \n",
        "\n",
        "                                        picture_link  \\\n",
        "0  https://scontent.xx.fbcdn.net/hphotos-xpf1/v/t...   \n",
        "1  https://fbexternal-a.akamaihd.net/safe_image.p...   \n",
        "\n",
        "                                           link_name        link_caption  \\\n",
        "0                                                NaN                 NaN   \n",
        "1  Dana Miller\u00e2\u20ac\u201dProducer, AIDS Advocate, Frontie...  frontiersmedia.com   \n",
        "\n",
        "                                    link_description  num_mentions  \\\n",
        "0                                                NaN          1.00   \n",
        "1  \u00e2\u20ac\u0153Dana was legendary in gay Los Angeles and H...          0.00   \n",
        "\n",
        "             mentions  num_likes  like_count  comment_count  share_count  \\\n",
        "0  Art Hearts Fashion        nan         nan            nan          nan   \n",
        "1                 NaN        nan         nan            nan          nan   \n",
        "\n",
        "   hashtag_count    hashtags  mentions_count  \\\n",
        "0              1  UseACondom             nan   \n",
        "1              1  DanaMiller             nan   \n",
        "\n",
        "                                             actions application properties  \\\n",
        "0  [{'link': 'https://www.facebook.com/3466141115...         NaN        NaN   \n",
        "1  [{'link': 'https://www.facebook.com/3466141115...         NaN        NaN   \n",
        "\n",
        "                                      message_output  time_since_post_days  \\\n",
        "0  {'picture': 'https://scontent.xx.fbcdn.net/hph...                   nan   \n",
        "1  {'picture': 'https://fbexternal-a.akamaihd.net...                   nan   \n",
        "\n",
        "   like_count_7days  comment_count_7days  share_count_7days  \\\n",
        "0               nan                  nan                nan   \n",
        "1               nan                  nan                nan   \n",
        "\n",
        "   time_since_post_14days  like_count_14days  comment_count_14days  \\\n",
        "0                   81.00                101                     3   \n",
        "1                   88.00                 78                     9   \n",
        "\n",
        "   share_count_14days           feed_organization  images  urls_count  \\\n",
        "0                   6  AIDS HEALTHCARE FOUNDATION     nan         nan   \n",
        "1                   4  AIDS HEALTHCARE FOUNDATION     nan         nan   \n",
        "\n",
        "   urls_count_true  mentions_scrape    source mission/non-mission  \\\n",
        "0              nan              nan  Internal                   Y   \n",
        "1              nan              nan  External                   Y   \n",
        "\n",
        "       mission_focus I-C-A          subcode  source_External  source_Internal  \\\n",
        "0  Building capacity     A  Event promotion                0                1   \n",
        "1  Building capacity     C      Recognition                1                0   \n",
        "\n",
        "   mission_N  mission_Y  mission_focus_Building capacity  \\\n",
        "0          0          1                                1   \n",
        "1          0          1                                1   \n",
        "\n",
        "   mission_focus_Patient advocacy  mission_focus_Prevention  ICA_A  ICA_C  \\\n",
        "0                               0                         0      1      0   \n",
        "1                               0                         0      0      1   \n",
        "\n",
        "   ICA_I  ICA_subcode_AIDS-related day  ICA_subcode_Awareness  \\\n",
        "0      0                             0                      0   \n",
        "1      0                             0                      0   \n",
        "\n",
        "   ICA_subcode_Complementary support  ICA_subcode_Cover/profile photo  \\\n",
        "0                                  0                                0   \n",
        "1                                  0                                0   \n",
        "\n",
        "   ICA_subcode_Dialogue  ICA_subcode_Donation  ICA_subcode_Event info  \\\n",
        "0                     0                     0                       0   \n",
        "1                     0                     0                       0   \n",
        "\n",
        "   ICA_subcode_Event promotion  ICA_subcode_Event update  \\\n",
        "0                            1                         0   \n",
        "1                            0                         0   \n",
        "\n",
        "   ICA_subcode_Get tested  ICA_subcode_HIV/AIDS info/news  \\\n",
        "0                       0                               0   \n",
        "1                       0                               0   \n",
        "\n",
        "   ICA_subcode_Idea promotion  ICA_subcode_LGBT/Transgender information  \\\n",
        "0                           0                                         0   \n",
        "1                           0                                         0   \n",
        "\n",
        "   ICA_subcode_Lobbying  ICA_subcode_Media action  ICA_subcode_Medication  \\\n",
        "0                     0                         0                       0   \n",
        "1                     0                         0                       0   \n",
        "\n",
        "   ICA_subcode_National holiday/Holiday  \\\n",
        "0                                     0   \n",
        "1                                     0   \n",
        "\n",
        "   ICA_subcode_Organizational news/announcement  ICA_subcode_Other  \\\n",
        "0                                             0                  0   \n",
        "1                                             0                  0   \n",
        "\n",
        "   ICA_subcode_Patient stories  ICA_subcode_Recognition  \\\n",
        "0                            0                        0   \n",
        "1                            0                        1   \n",
        "\n",
        "   ICA_subcode_Research/survey  ICA_subcode_Viewing action  \\\n",
        "0                            0                           0   \n",
        "1                            0                           0   \n",
        "\n",
        "   ICA_subcode_Volunteer  type_event  type_link  type_music  type_photo  \\\n",
        "0                      0           0          0           0           1   \n",
        "1                      0           0          1           0           0   \n",
        "\n",
        "   type_status  type_video  status_type_added_photos  status_type_added_video  \\\n",
        "0            0           0                         1                        0   \n",
        "1            0           0                         0                        0   \n",
        "\n",
        "   status_type_created_event  status_type_created_note  \\\n",
        "0                          0                         0   \n",
        "1                          0                         0   \n",
        "\n",
        "   status_type_mobile_status_update  status_type_published_story  \\\n",
        "0                                 0                            0   \n",
        "1                                 0                            0   \n",
        "\n",
        "   status_type_shared_story  video_dummy  picture_dummy  Org_ID  \\\n",
        "0                         0            0              1       1   \n",
        "1                         1            0              1       1   \n",
        "\n",
        "                     Org_Name  Total_Revenue     Assets  followers_count  \\\n",
        "0  AIDS HEALTHCARE FOUNDATION      209986539  227735541           721419   \n",
        "1  AIDS HEALTHCARE FOUNDATION      209986539  227735541           721419   \n",
        "\n",
        "   talking_about_count  were_here_count  Log_of_Assets  Log_of_Total_Revenue  \\\n",
        "0                  896             3891          19.24                 19.16   \n",
        "1                  896             3891          19.24                 19.16   \n",
        "\n",
        "   Log_of_Followers  likes_binary  comment_binary  share_binary  \n",
        "0             13.49             1               1             1  \n",
        "1             13.49             1               1             1  \n",
        "\n",
        "[2 rows x 115 columns]"
       ]
      }
     ],
     "prompt_number": 81
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "List all the columns in the DataFrame"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df.columns"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 82,
       "text": [
        "Index([u'No.', u'id', u'feed_id', u'org_name', u'FB_org_id', u'location', u'link', u'message_id', u'org_id', u'status_id', u'status_link', u'content', u'published_date', u'date_inserted', u'last_comment', u'type', u'status_type', u'video_source', u'picture_link', u'link_name', u'link_caption', u'link_description', u'num_mentions', u'mentions', u'num_likes', u'like_count', u'comment_count', u'share_count', u'hashtag_count', u'hashtags', u'mentions_count', u'actions', u'application', u'properties', u'message_output', u'time_since_post_days', u'like_count_7days', u'comment_count_7days', u'share_count_7days', u'time_since_post_14days', u'like_count_14days', u'comment_count_14days', u'share_count_14days', u'feed_organization', u'images', u'urls_count', u'urls_count_true', u'mentions_scrape', u'source', u'mission/non-mission', u'mission_focus', u'I-C-A', u'subcode', u'source_External', u'source_Internal', u'mission_N', u'mission_Y', u'mission_focus_Building capacity', u'mission_focus_Patient advocacy', u'mission_focus_Prevention', u'ICA_A', u'ICA_C', u'ICA_I', u'ICA_subcode_AIDS-related day', u'ICA_subcode_Awareness', u'ICA_subcode_Complementary support', u'ICA_subcode_Cover/profile photo', u'ICA_subcode_Dialogue', u'ICA_subcode_Donation', u'ICA_subcode_Event info', u'ICA_subcode_Event promotion', u'ICA_subcode_Event update', u'ICA_subcode_Get tested', u'ICA_subcode_HIV/AIDS info/news', u'ICA_subcode_Idea promotion', u'ICA_subcode_LGBT/Transgender information', u'ICA_subcode_Lobbying', u'ICA_subcode_Media action', u'ICA_subcode_Medication', u'ICA_subcode_National holiday/Holiday', u'ICA_subcode_Organizational news/announcement', u'ICA_subcode_Other', u'ICA_subcode_Patient stories', u'ICA_subcode_Recognition', u'ICA_subcode_Research/survey', u'ICA_subcode_Viewing action', u'ICA_subcode_Volunteer', u'type_event', u'type_link', u'type_music', u'type_photo', u'type_status', u'type_video', u'status_type_added_photos', u'status_type_added_video', u'status_type_created_event', u'status_type_created_note', u'status_type_mobile_status_update', u'status_type_published_story', u'status_type_shared_story', ...], dtype='object')"
       ]
      }
     ],
     "prompt_number": 82
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br> \n",
      "\n",
      "You might not want to include all of your variables in the summary statistics table. When you're dealing with a dataset with a lot of columns, I find the easiest way is to output the column names to a list, copy and paste the output into a text editor, do a replace all command to delete the all the u's before each column name, and then paste back into iPython and create your desired sub-set."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df.columns.tolist()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 56,
       "text": [
        "[u'No.',\n",
        " u'id',\n",
        " u'feed_id',\n",
        " u'org_name',\n",
        " u'FB_org_id',\n",
        " u'location',\n",
        " u'link',\n",
        " u'message_id',\n",
        " u'org_id',\n",
        " u'status_id',\n",
        " u'status_link',\n",
        " u'content',\n",
        " u'published_date',\n",
        " u'date_inserted',\n",
        " u'last_comment',\n",
        " u'type',\n",
        " u'status_type',\n",
        " u'video_source',\n",
        " u'picture_link',\n",
        " u'link_name',\n",
        " u'link_caption',\n",
        " u'link_description',\n",
        " u'num_mentions',\n",
        " u'mentions',\n",
        " u'num_likes',\n",
        " u'like_count',\n",
        " u'comment_count',\n",
        " u'share_count',\n",
        " u'hashtag_count',\n",
        " u'hashtags',\n",
        " u'mentions_count',\n",
        " u'actions',\n",
        " u'application',\n",
        " u'properties',\n",
        " u'message_output',\n",
        " u'time_since_post_days',\n",
        " u'like_count_7days',\n",
        " u'comment_count_7days',\n",
        " u'share_count_7days',\n",
        " u'time_since_post_14days',\n",
        " u'like_count_14days',\n",
        " u'comment_count_14days',\n",
        " u'share_count_14days',\n",
        " u'feed_organization',\n",
        " u'images',\n",
        " u'urls_count',\n",
        " u'urls_count_true',\n",
        " u'mentions_scrape',\n",
        " u'source',\n",
        " u'mission/non-mission',\n",
        " u'mission_focus',\n",
        " u'I-C-A',\n",
        " u'subcode',\n",
        " u'source_External',\n",
        " u'source_Internal',\n",
        " u'mission_N',\n",
        " u'mission_Y',\n",
        " u'mission_focus_Building capacity',\n",
        " u'mission_focus_Patient advocacy',\n",
        " u'mission_focus_Prevention',\n",
        " u'ICA_A',\n",
        " u'ICA_C',\n",
        " u'ICA_I',\n",
        " u'ICA_subcode_AIDS-related day',\n",
        " u'ICA_subcode_Awareness',\n",
        " u'ICA_subcode_Complementary support',\n",
        " u'ICA_subcode_Cover/profile photo',\n",
        " u'ICA_subcode_Dialogue',\n",
        " u'ICA_subcode_Donation',\n",
        " u'ICA_subcode_Event info',\n",
        " u'ICA_subcode_Event promotion',\n",
        " u'ICA_subcode_Event update',\n",
        " u'ICA_subcode_Get tested',\n",
        " u'ICA_subcode_HIV/AIDS info/news',\n",
        " u'ICA_subcode_Idea promotion',\n",
        " u'ICA_subcode_LGBT/Transgender information',\n",
        " u'ICA_subcode_Lobbying',\n",
        " u'ICA_subcode_Media action',\n",
        " u'ICA_subcode_Medication',\n",
        " u'ICA_subcode_National holiday/Holiday',\n",
        " u'ICA_subcode_Organizational news/announcement',\n",
        " u'ICA_subcode_Other',\n",
        " u'ICA_subcode_Patient stories',\n",
        " u'ICA_subcode_Recognition',\n",
        " u'ICA_subcode_Research/survey',\n",
        " u'ICA_subcode_Viewing action',\n",
        " u'ICA_subcode_Volunteer',\n",
        " u'type_event',\n",
        " u'type_link',\n",
        " u'type_music',\n",
        " u'type_photo',\n",
        " u'type_status',\n",
        " u'type_video',\n",
        " u'status_type_added_photos',\n",
        " u'status_type_added_video',\n",
        " u'status_type_created_event',\n",
        " u'status_type_created_note',\n",
        " u'status_type_mobile_status_update',\n",
        " u'status_type_published_story',\n",
        " u'status_type_shared_story',\n",
        " u'video_dummy',\n",
        " u'picture_dummy',\n",
        " u'Org_ID',\n",
        " u'Org_Name',\n",
        " u'Total_Revenue',\n",
        " u'Assets',\n",
        " u'followers_count',\n",
        " u'talking_about_count',\n",
        " u'were_here_count',\n",
        " u'Log_of_Assets',\n",
        " u'Log_of_Total_Revenue',\n",
        " u'Log_of_Followers',\n",
        " u'likes_binary',\n",
        " u'comment_binary',\n",
        " u'share_binary']"
       ]
      }
     ],
     "prompt_number": 56
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "I've copy and pasted the above output into TextWrangler, omitted all the 'u's, and selected the columns I want. I will now limit the dataframe to just those columns I want. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df = df[['hashtag_count', 'like_count_14days', 'comment_count_14days', 'share_count_14days',\n",
      " 'source', 'source_External', 'video_dummy', 'picture_dummy',\n",
      " 'Total_Revenue', 'Log_of_Total_Revenue', 'followers_count',  'Log_of_Followers']]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 84
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br> \n",
      "\n",
      "As you can see, you now have a dataframe with only 12 columns (variables)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print len(df) \n",
      "print len(df.columns)\n",
      "df.head(2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1500\n",
        "12\n"
       ]
      },
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>hashtag_count</th>\n",
        "      <th>like_count_14days</th>\n",
        "      <th>comment_count_14days</th>\n",
        "      <th>share_count_14days</th>\n",
        "      <th>source</th>\n",
        "      <th>source_External</th>\n",
        "      <th>video_dummy</th>\n",
        "      <th>picture_dummy</th>\n",
        "      <th>Total_Revenue</th>\n",
        "      <th>Log_of_Total_Revenue</th>\n",
        "      <th>followers_count</th>\n",
        "      <th>Log_of_Followers</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1</td>\n",
        "      <td> 101</td>\n",
        "      <td> 3</td>\n",
        "      <td> 6</td>\n",
        "      <td> Internal</td>\n",
        "      <td> 0</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 209986539</td>\n",
        "      <td>19.16</td>\n",
        "      <td> 721419</td>\n",
        "      <td>13.49</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 1</td>\n",
        "      <td>  78</td>\n",
        "      <td> 9</td>\n",
        "      <td> 4</td>\n",
        "      <td> External</td>\n",
        "      <td> 1</td>\n",
        "      <td> 0</td>\n",
        "      <td> 1</td>\n",
        "      <td> 209986539</td>\n",
        "      <td>19.16</td>\n",
        "      <td> 721419</td>\n",
        "      <td>13.49</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>2 rows \u00d7 12 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 85,
       "text": [
        "   hashtag_count  like_count_14days  comment_count_14days  share_count_14days  \\\n",
        "0              1                101                     3                   6   \n",
        "1              1                 78                     9                   4   \n",
        "\n",
        "     source  source_External  video_dummy  picture_dummy  Total_Revenue  \\\n",
        "0  Internal                0            0              1      209986539   \n",
        "1  External                1            0              1      209986539   \n",
        "\n",
        "   Log_of_Total_Revenue  followers_count  Log_of_Followers  \n",
        "0                 19.16           721419             13.49  \n",
        "1                 19.16           721419             13.49  \n",
        "\n",
        "[2 rows x 12 columns]"
       ]
      }
     ],
     "prompt_number": 85
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Generate Summary Statistics "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This is the basic way to produce summary statistics for all variables in your dataframe"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df.describe()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>hashtag_count</th>\n",
        "      <th>like_count_14days</th>\n",
        "      <th>comment_count_14days</th>\n",
        "      <th>share_count_14days</th>\n",
        "      <th>source_External</th>\n",
        "      <th>video_dummy</th>\n",
        "      <th>picture_dummy</th>\n",
        "      <th>Total_Revenue</th>\n",
        "      <th>Log_of_Total_Revenue</th>\n",
        "      <th>followers_count</th>\n",
        "      <th>Log_of_Followers</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>count</th>\n",
        "      <td>1500.00</td>\n",
        "      <td> 1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>     1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>  1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>mean</th>\n",
        "      <td>   0.48</td>\n",
        "      <td>   30.59</td>\n",
        "      <td>   0.59</td>\n",
        "      <td>   3.11</td>\n",
        "      <td>   0.42</td>\n",
        "      <td>   0.03</td>\n",
        "      <td>   0.88</td>\n",
        "      <td> 11486117.19</td>\n",
        "      <td>  15.36</td>\n",
        "      <td> 20150.46</td>\n",
        "      <td>   7.79</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>std</th>\n",
        "      <td>   1.12</td>\n",
        "      <td>  355.34</td>\n",
        "      <td>   2.62</td>\n",
        "      <td>  14.95</td>\n",
        "      <td>   0.49</td>\n",
        "      <td>   0.18</td>\n",
        "      <td>   0.33</td>\n",
        "      <td> 31061197.86</td>\n",
        "      <td>   1.12</td>\n",
        "      <td>107022.51</td>\n",
        "      <td>   1.34</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>min</th>\n",
        "      <td>   0.00</td>\n",
        "      <td>    0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>  1015497.00</td>\n",
        "      <td>  13.83</td>\n",
        "      <td>    45.00</td>\n",
        "      <td>   3.81</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>25%</th>\n",
        "      <td>   0.00</td>\n",
        "      <td>    2.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>  1954081.00</td>\n",
        "      <td>  14.49</td>\n",
        "      <td>  1089.50</td>\n",
        "      <td>   6.99</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>50%</th>\n",
        "      <td>   0.00</td>\n",
        "      <td>    5.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>  4179577.00</td>\n",
        "      <td>  15.25</td>\n",
        "      <td>  2134.00</td>\n",
        "      <td>   7.67</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>75%</th>\n",
        "      <td>   0.00</td>\n",
        "      <td>   12.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td> 10633968.00</td>\n",
        "      <td>  16.18</td>\n",
        "      <td>  3481.00</td>\n",
        "      <td>   8.16</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>max</th>\n",
        "      <td>  10.00</td>\n",
        "      <td>11004.00</td>\n",
        "      <td>  59.00</td>\n",
        "      <td> 219.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>209986539.00</td>\n",
        "      <td>  19.16</td>\n",
        "      <td>721419.00</td>\n",
        "      <td>  13.49</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>8 rows \u00d7 11 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 86,
       "text": [
        "       hashtag_count  like_count_14days  comment_count_14days  \\\n",
        "count        1500.00            1500.00               1500.00   \n",
        "mean            0.48              30.59                  0.59   \n",
        "std             1.12             355.34                  2.62   \n",
        "min             0.00               0.00                  0.00   \n",
        "25%             0.00               2.00                  0.00   \n",
        "50%             0.00               5.00                  0.00   \n",
        "75%             0.00              12.00                  0.00   \n",
        "max            10.00           11004.00                 59.00   \n",
        "\n",
        "       share_count_14days  source_External  video_dummy  picture_dummy  \\\n",
        "count             1500.00          1500.00      1500.00        1500.00   \n",
        "mean                 3.11             0.42         0.03           0.88   \n",
        "std                 14.95             0.49         0.18           0.33   \n",
        "min                  0.00             0.00         0.00           0.00   \n",
        "25%                  0.00             0.00         0.00           1.00   \n",
        "50%                  0.00             0.00         0.00           1.00   \n",
        "75%                  1.00             1.00         0.00           1.00   \n",
        "max                219.00             1.00         1.00           1.00   \n",
        "\n",
        "       Total_Revenue  Log_of_Total_Revenue  followers_count  Log_of_Followers  \n",
        "count        1500.00               1500.00          1500.00           1500.00  \n",
        "mean     11486117.19                 15.36         20150.46              7.79  \n",
        "std      31061197.86                  1.12        107022.51              1.34  \n",
        "min       1015497.00                 13.83            45.00              3.81  \n",
        "25%       1954081.00                 14.49          1089.50              6.99  \n",
        "50%       4179577.00                 15.25          2134.00              7.67  \n",
        "75%      10633968.00                 16.18          3481.00              8.16  \n",
        "max     209986539.00                 19.16        721419.00             13.49  \n",
        "\n",
        "[8 rows x 11 columns]"
       ]
      }
     ],
     "prompt_number": 86
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br> \n",
      "\n",
      "If you'd like to see the help for the describe function"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "DataFrame.describe?"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "Use the <i>dir</i> function to get an alphabetical listing of valid names (attributes) in an object."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dir(df.describe())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 93
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "CHANGE TO TWO DECIMALS (n.b. - This step is not necessary if you have run the display.float_format command earlier)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.round(df.describe(), 2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>hashtag_count</th>\n",
        "      <th>like_count_14days</th>\n",
        "      <th>comment_count_14days</th>\n",
        "      <th>share_count_14days</th>\n",
        "      <th>source_External</th>\n",
        "      <th>video_dummy</th>\n",
        "      <th>picture_dummy</th>\n",
        "      <th>Total_Revenue</th>\n",
        "      <th>Log_of_Total_Revenue</th>\n",
        "      <th>followers_count</th>\n",
        "      <th>Log_of_Followers</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>count</th>\n",
        "      <td>1500.00</td>\n",
        "      <td> 1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>     1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "      <td>  1500.00</td>\n",
        "      <td>1500.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>mean</th>\n",
        "      <td>   0.48</td>\n",
        "      <td>   30.59</td>\n",
        "      <td>   0.59</td>\n",
        "      <td>   3.11</td>\n",
        "      <td>   0.42</td>\n",
        "      <td>   0.03</td>\n",
        "      <td>   0.88</td>\n",
        "      <td> 11486117.19</td>\n",
        "      <td>  15.36</td>\n",
        "      <td> 20150.46</td>\n",
        "      <td>   7.79</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>std</th>\n",
        "      <td>   1.12</td>\n",
        "      <td>  355.34</td>\n",
        "      <td>   2.62</td>\n",
        "      <td>  14.95</td>\n",
        "      <td>   0.49</td>\n",
        "      <td>   0.18</td>\n",
        "      <td>   0.33</td>\n",
        "      <td> 31061197.86</td>\n",
        "      <td>   1.12</td>\n",
        "      <td>107022.51</td>\n",
        "      <td>   1.34</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>min</th>\n",
        "      <td>   0.00</td>\n",
        "      <td>    0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>  1015497.00</td>\n",
        "      <td>  13.83</td>\n",
        "      <td>    45.00</td>\n",
        "      <td>   3.81</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>25%</th>\n",
        "      <td>   0.00</td>\n",
        "      <td>    2.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>  1954081.00</td>\n",
        "      <td>  14.49</td>\n",
        "      <td>  1089.50</td>\n",
        "      <td>   6.99</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>50%</th>\n",
        "      <td>   0.00</td>\n",
        "      <td>    5.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>  4179577.00</td>\n",
        "      <td>  15.25</td>\n",
        "      <td>  2134.00</td>\n",
        "      <td>   7.67</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>75%</th>\n",
        "      <td>   0.00</td>\n",
        "      <td>   12.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>   0.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td> 10633968.00</td>\n",
        "      <td>  16.18</td>\n",
        "      <td>  3481.00</td>\n",
        "      <td>   8.16</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>max</th>\n",
        "      <td>  10.00</td>\n",
        "      <td>11004.00</td>\n",
        "      <td>  59.00</td>\n",
        "      <td> 219.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>   1.00</td>\n",
        "      <td>209986539.00</td>\n",
        "      <td>  19.16</td>\n",
        "      <td>721419.00</td>\n",
        "      <td>  13.49</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>8 rows \u00d7 11 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 88,
       "text": [
        "       hashtag_count  like_count_14days  comment_count_14days  \\\n",
        "count        1500.00            1500.00               1500.00   \n",
        "mean            0.48              30.59                  0.59   \n",
        "std             1.12             355.34                  2.62   \n",
        "min             0.00               0.00                  0.00   \n",
        "25%             0.00               2.00                  0.00   \n",
        "50%             0.00               5.00                  0.00   \n",
        "75%             0.00              12.00                  0.00   \n",
        "max            10.00           11004.00                 59.00   \n",
        "\n",
        "       share_count_14days  source_External  video_dummy  picture_dummy  \\\n",
        "count             1500.00          1500.00      1500.00        1500.00   \n",
        "mean                 3.11             0.42         0.03           0.88   \n",
        "std                 14.95             0.49         0.18           0.33   \n",
        "min                  0.00             0.00         0.00           0.00   \n",
        "25%                  0.00             0.00         0.00           1.00   \n",
        "50%                  0.00             0.00         0.00           1.00   \n",
        "75%                  1.00             1.00         0.00           1.00   \n",
        "max                219.00             1.00         1.00           1.00   \n",
        "\n",
        "       Total_Revenue  Log_of_Total_Revenue  followers_count  Log_of_Followers  \n",
        "count        1500.00               1500.00          1500.00           1500.00  \n",
        "mean     11486117.19                 15.36         20150.46              7.79  \n",
        "std      31061197.86                  1.12        107022.51              1.34  \n",
        "min       1015497.00                 13.83            45.00              3.81  \n",
        "25%       1954081.00                 14.49          1089.50              6.99  \n",
        "50%       4179577.00                 15.25          2134.00              7.67  \n",
        "75%      10633968.00                 16.18          3481.00              8.16  \n",
        "max     209986539.00                 19.16        721419.00             13.49  \n",
        "\n",
        "[8 rows x 11 columns]"
       ]
      }
     ],
     "prompt_number": 88
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "NOW LET'S TRANSPOSE THE OUTPUT -- necessary for a more typical social scientific presentation of the data"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.round(df.describe(), 2).T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "      <th>mean</th>\n",
        "      <th>std</th>\n",
        "      <th>min</th>\n",
        "      <th>25%</th>\n",
        "      <th>50%</th>\n",
        "      <th>75%</th>\n",
        "      <th>max</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>hashtag_count</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.48</td>\n",
        "      <td>       1.12</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       0.00</td>\n",
        "      <td>       10.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>like_count_14days</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>      30.59</td>\n",
        "      <td>     355.34</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      2.00</td>\n",
        "      <td>      5.00</td>\n",
        "      <td>      12.00</td>\n",
        "      <td>    11004.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>comment_count_14days</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.59</td>\n",
        "      <td>       2.62</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       0.00</td>\n",
        "      <td>       59.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>share_count_14days</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       3.11</td>\n",
        "      <td>      14.95</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       1.00</td>\n",
        "      <td>      219.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>source_External</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.42</td>\n",
        "      <td>       0.49</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       1.00</td>\n",
        "      <td>        1.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>video_dummy</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.03</td>\n",
        "      <td>       0.18</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       0.00</td>\n",
        "      <td>        1.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>picture_dummy</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.88</td>\n",
        "      <td>       0.33</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      1.00</td>\n",
        "      <td>      1.00</td>\n",
        "      <td>       1.00</td>\n",
        "      <td>        1.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Total_Revenue</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>11486117.19</td>\n",
        "      <td>31061197.86</td>\n",
        "      <td>1015497.00</td>\n",
        "      <td>1954081.00</td>\n",
        "      <td>4179577.00</td>\n",
        "      <td>10633968.00</td>\n",
        "      <td>209986539.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Log_of_Total_Revenue</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>      15.36</td>\n",
        "      <td>       1.12</td>\n",
        "      <td>     13.83</td>\n",
        "      <td>     14.49</td>\n",
        "      <td>     15.25</td>\n",
        "      <td>      16.18</td>\n",
        "      <td>       19.16</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>followers_count</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>   20150.46</td>\n",
        "      <td>  107022.51</td>\n",
        "      <td>     45.00</td>\n",
        "      <td>   1089.50</td>\n",
        "      <td>   2134.00</td>\n",
        "      <td>    3481.00</td>\n",
        "      <td>   721419.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Log_of_Followers</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       7.79</td>\n",
        "      <td>       1.34</td>\n",
        "      <td>      3.81</td>\n",
        "      <td>      6.99</td>\n",
        "      <td>      7.67</td>\n",
        "      <td>       8.16</td>\n",
        "      <td>       13.49</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>11 rows \u00d7 8 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 89,
       "text": [
        "                       count        mean         std        min        25%  \\\n",
        "hashtag_count        1500.00        0.48        1.12       0.00       0.00   \n",
        "like_count_14days    1500.00       30.59      355.34       0.00       2.00   \n",
        "comment_count_14days 1500.00        0.59        2.62       0.00       0.00   \n",
        "share_count_14days   1500.00        3.11       14.95       0.00       0.00   \n",
        "source_External      1500.00        0.42        0.49       0.00       0.00   \n",
        "video_dummy          1500.00        0.03        0.18       0.00       0.00   \n",
        "picture_dummy        1500.00        0.88        0.33       0.00       1.00   \n",
        "Total_Revenue        1500.00 11486117.19 31061197.86 1015497.00 1954081.00   \n",
        "Log_of_Total_Revenue 1500.00       15.36        1.12      13.83      14.49   \n",
        "followers_count      1500.00    20150.46   107022.51      45.00    1089.50   \n",
        "Log_of_Followers     1500.00        7.79        1.34       3.81       6.99   \n",
        "\n",
        "                            50%         75%          max  \n",
        "hashtag_count              0.00        0.00        10.00  \n",
        "like_count_14days          5.00       12.00     11004.00  \n",
        "comment_count_14days       0.00        0.00        59.00  \n",
        "share_count_14days         0.00        1.00       219.00  \n",
        "source_External            0.00        1.00         1.00  \n",
        "video_dummy                0.00        0.00         1.00  \n",
        "picture_dummy              1.00        1.00         1.00  \n",
        "Total_Revenue        4179577.00 10633968.00 209986539.00  \n",
        "Log_of_Total_Revenue      15.25       16.18        19.16  \n",
        "followers_count         2134.00     3481.00    721419.00  \n",
        "Log_of_Followers           7.67        8.16        13.49  \n",
        "\n",
        "[11 rows x 8 columns]"
       ]
      }
     ],
     "prompt_number": 89
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "We won't typically want the percentile columns in a social scientific publication. In version 0.16 of PANDAS, you can use  'percentiles=None' with the describe command to omit the percentiles. In PANDAS 0.13 we can instead select only those columns we want, then output to CSV"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.round(df.describe(), 2).T[['count','mean', 'std', 'min', 'max']]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "      <th>mean</th>\n",
        "      <th>std</th>\n",
        "      <th>min</th>\n",
        "      <th>max</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>hashtag_count</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.48</td>\n",
        "      <td>       1.12</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       10.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>like_count_14days</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>      30.59</td>\n",
        "      <td>     355.34</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>    11004.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>comment_count_14days</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.59</td>\n",
        "      <td>       2.62</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       59.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>share_count_14days</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       3.11</td>\n",
        "      <td>      14.95</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      219.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>source_External</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.42</td>\n",
        "      <td>       0.49</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>        1.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>video_dummy</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.03</td>\n",
        "      <td>       0.18</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>        1.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>picture_dummy</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.88</td>\n",
        "      <td>       0.33</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>        1.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Total_Revenue</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>11486117.19</td>\n",
        "      <td>31061197.86</td>\n",
        "      <td>1015497.00</td>\n",
        "      <td>209986539.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Log_of_Total_Revenue</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>      15.36</td>\n",
        "      <td>       1.12</td>\n",
        "      <td>     13.83</td>\n",
        "      <td>       19.16</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>followers_count</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>   20150.46</td>\n",
        "      <td>  107022.51</td>\n",
        "      <td>     45.00</td>\n",
        "      <td>   721419.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Log_of_Followers</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       7.79</td>\n",
        "      <td>       1.34</td>\n",
        "      <td>      3.81</td>\n",
        "      <td>       13.49</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>11 rows \u00d7 5 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 90,
       "text": [
        "                       count        mean         std        min          max\n",
        "hashtag_count        1500.00        0.48        1.12       0.00        10.00\n",
        "like_count_14days    1500.00       30.59      355.34       0.00     11004.00\n",
        "comment_count_14days 1500.00        0.59        2.62       0.00        59.00\n",
        "share_count_14days   1500.00        3.11       14.95       0.00       219.00\n",
        "source_External      1500.00        0.42        0.49       0.00         1.00\n",
        "video_dummy          1500.00        0.03        0.18       0.00         1.00\n",
        "picture_dummy        1500.00        0.88        0.33       0.00         1.00\n",
        "Total_Revenue        1500.00 11486117.19 31061197.86 1015497.00 209986539.00\n",
        "Log_of_Total_Revenue 1500.00       15.36        1.12      13.83        19.16\n",
        "followers_count      1500.00    20150.46   107022.51      45.00    721419.00\n",
        "Log_of_Followers     1500.00        7.79        1.34       3.81        13.49\n",
        "\n",
        "[11 rows x 5 columns]"
       ]
      }
     ],
     "prompt_number": 90
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#ALTERNATIVE WAY OF WRITING\n",
      "np.round(df.describe(), 2).transpose()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "      <th>mean</th>\n",
        "      <th>std</th>\n",
        "      <th>min</th>\n",
        "      <th>25%</th>\n",
        "      <th>50%</th>\n",
        "      <th>75%</th>\n",
        "      <th>max</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>hashtag_count</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.48</td>\n",
        "      <td>       1.12</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       0.00</td>\n",
        "      <td>       10.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>like_count_14days</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>      30.59</td>\n",
        "      <td>     355.34</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      2.00</td>\n",
        "      <td>      5.00</td>\n",
        "      <td>      12.00</td>\n",
        "      <td>    11004.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>comment_count_14days</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.59</td>\n",
        "      <td>       2.62</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       0.00</td>\n",
        "      <td>       59.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>share_count_14days</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       3.11</td>\n",
        "      <td>      14.95</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       1.00</td>\n",
        "      <td>      219.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>source_External</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.42</td>\n",
        "      <td>       0.49</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       1.00</td>\n",
        "      <td>        1.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>video_dummy</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.03</td>\n",
        "      <td>       0.18</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>       0.00</td>\n",
        "      <td>        1.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>picture_dummy</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       0.88</td>\n",
        "      <td>       0.33</td>\n",
        "      <td>      0.00</td>\n",
        "      <td>      1.00</td>\n",
        "      <td>      1.00</td>\n",
        "      <td>       1.00</td>\n",
        "      <td>        1.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Total_Revenue</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>11486117.19</td>\n",
        "      <td>31061197.86</td>\n",
        "      <td>1015497.00</td>\n",
        "      <td>1954081.00</td>\n",
        "      <td>4179577.00</td>\n",
        "      <td>10633968.00</td>\n",
        "      <td>209986539.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Log_of_Total_Revenue</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>      15.36</td>\n",
        "      <td>       1.12</td>\n",
        "      <td>     13.83</td>\n",
        "      <td>     14.49</td>\n",
        "      <td>     15.25</td>\n",
        "      <td>      16.18</td>\n",
        "      <td>       19.16</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>followers_count</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>   20150.46</td>\n",
        "      <td>  107022.51</td>\n",
        "      <td>     45.00</td>\n",
        "      <td>   1089.50</td>\n",
        "      <td>   2134.00</td>\n",
        "      <td>    3481.00</td>\n",
        "      <td>   721419.00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Log_of_Followers</th>\n",
        "      <td>1500.00</td>\n",
        "      <td>       7.79</td>\n",
        "      <td>       1.34</td>\n",
        "      <td>      3.81</td>\n",
        "      <td>      6.99</td>\n",
        "      <td>      7.67</td>\n",
        "      <td>       8.16</td>\n",
        "      <td>       13.49</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>11 rows \u00d7 8 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 92,
       "text": [
        "                       count        mean         std        min        25%  \\\n",
        "hashtag_count        1500.00        0.48        1.12       0.00       0.00   \n",
        "like_count_14days    1500.00       30.59      355.34       0.00       2.00   \n",
        "comment_count_14days 1500.00        0.59        2.62       0.00       0.00   \n",
        "share_count_14days   1500.00        3.11       14.95       0.00       0.00   \n",
        "source_External      1500.00        0.42        0.49       0.00       0.00   \n",
        "video_dummy          1500.00        0.03        0.18       0.00       0.00   \n",
        "picture_dummy        1500.00        0.88        0.33       0.00       1.00   \n",
        "Total_Revenue        1500.00 11486117.19 31061197.86 1015497.00 1954081.00   \n",
        "Log_of_Total_Revenue 1500.00       15.36        1.12      13.83      14.49   \n",
        "followers_count      1500.00    20150.46   107022.51      45.00    1089.50   \n",
        "Log_of_Followers     1500.00        7.79        1.34       3.81       6.99   \n",
        "\n",
        "                            50%         75%          max  \n",
        "hashtag_count              0.00        0.00        10.00  \n",
        "like_count_14days          5.00       12.00     11004.00  \n",
        "comment_count_14days       0.00        0.00        59.00  \n",
        "share_count_14days         0.00        1.00       219.00  \n",
        "source_External            0.00        1.00         1.00  \n",
        "video_dummy                0.00        0.00         1.00  \n",
        "picture_dummy              1.00        1.00         1.00  \n",
        "Total_Revenue        4179577.00 10633968.00 209986539.00  \n",
        "Log_of_Total_Revenue      15.25       16.18        19.16  \n",
        "followers_count         2134.00     3481.00    721419.00  \n",
        "Log_of_Followers           7.67        8.16        13.49  \n",
        "\n",
        "[11 rows x 8 columns]"
       ]
      }
     ],
     "prompt_number": 92
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "SAVE THE OUTPUT OF THE TABLE AS A CSV FILE"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "We can use the above commands and output to CSV"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#WITH FOUR DECIMAL PLACES (DEFAULT)\n",
      "df.describe().transpose().to_csv('summary stats.csv', sep=',')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 41
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "For a typical social scientific publication, we would not need the percentile columns. In version 0.16 of PANDAS, you can use  'percentiles=None' with the describe command to omit the percentiles. In PANDAS 0.13 we can instead select only those columns we want, then output to CSV"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df.describe().transpose()[['count','mean', 'std', 'min', 'max']].to_csv('summary stats.csv', sep=',')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 42
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "The problem with the above output is that more than 2 decimal places are showing. If you want only two, then run the following version."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#WITH TWO DECIMAL PLACES\n",
      "np.round(df.describe(), 2).T[['count','mean', 'std', 'min', 'max']].to_csv('summary stats.csv', sep=',')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 43
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "Now you have a CSV file containing the columns you'll need for a typical Summary Statistics or Descriptive Statistics table for a submission to a social science journal. You likely won't want all of the columns in the final table, so I would probably open up the CSV file in Excel, delete unwanted variables, then copy and paste into Word. At that point you just need some formatting for aesthetics. If you do want to select which specific variables to include, you can specify the columns like this."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cols = ['hashtag_count','like_count_14days']\n",
      "np.round(df[cols].describe(), 2).T[['count','mean', 'std', 'min', 'max']].to_csv('summary stats (partial).csv', sep=',')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 55
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "\n",
      "For more Notebooks as well as additional Python and Big Data tutorials, please visit http://social-metrics.org or follow me on Twitter <a href='https://twitter.com/gregorysaxton'>@gregorysaxton</a>"
     ]
    }
   ],
   "metadata": {}
  }
 ]
}