NithyaSahadevan/4.3_Loading_Data_and_Viewing_Datajupyter_labs_v1.ipynb

## 4.3_Loading_Data_and_Viewing_Datajupyter_labs_v1.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
    " <a href=\"http://cocl.us/NotebooksPython101\"><img src = \"https://ibm.box.com/shared/static/yfe6h4az47ktg2mm9h05wby2n7e8kei3.png\" width = 750, align = \"center\"></a>\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <a href=\"https://www.bigdatauniversity.com\"><img src = \"https://ibm.box.com/shared/static/ugcqz6ohbvff804xp84y4kqnvvk3bq1g.png\" width = 300, align = \"center\"></a>\n",
    "\n",
    "<h1 align=center><font size = 5>Introduction to Pandas  Python</font></h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table of Contents\n",
    "\n",
    "\n",
    "<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
    "<li><a href=\"#ref0\">About the Dataset</a></li>\n",
    "\n",
    "<li><a href=\"#ref2\">Viewing Data and Accessing Data </a></p></li>\n",
    "<br>\n",
    "<p></p>\n",
    "Estimated Time Needed: <strong>15 min</strong>\n",
    "</div>\n",
    "\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"ref0\"></a>\n",
    "<h2 align=center>About the Dataset</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "The table has one row for each album and several columns\n",
    "\n",
    "- **artist** - Name of the artist\n",
    "- **album** - Name of the album\n",
    "- **released_year** - Year the album was released\n",
    "- **length_min_sec** - Length of the album (hours,minutes,seconds)\n",
    "- **genre** - Genre of the album\n",
    "- **music_recording_sales_millions** - Music recording sales (millions in USD) on [SONG://DATABASE](http://www.song-database.com/)\n",
    "- **claimed_sales_millions** - Album's claimed sales (millions in USD) on [SONG://DATABASE](http://www.song-database.com/)\n",
    "- **date_released** - Date on which the album was released\n",
    "- **soundtrack** - Indicates if the album is the movie soundtrack (Y) or (N)\n",
    "- **rating_of_friends** - Indicates the rating from your friends from 1 to 10\n",
    "<br>\n",
    "\n",
    "You can see the dataset here:\n",
    "\n",
    "<font size=\"1\">\n",
    "<table font-size:xx-small style=\"width:25%\">\n",
    "  <tr>\n",
    "    <th>Artist</th>\n",
    "    <th>Album</th> \n",
    "    <th>Released</th>\n",
    "    <th>Length</th>\n",
    "    <th>Genre</th> \n",
    "    <th>Music recording sales (millions)</th>\n",
    "    <th>Claimed sales (millions)</th>\n",
    "    <th>Released</th>\n",
    "    <th>Soundtrack</th>\n",
    "    <th>Rating (friends)</th>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>Michael Jackson</td>\n",
    "    <td>Thriller</td> \n",
    "    <td>1982</td>\n",
    "    <td>00:42:19</td>\n",
    "    <td>Pop, rock, R&B</td>\n",
    "    <td>46</td>\n",
    "    <td>65</td>\n",
    "    <td>30-Nov-82</td>\n",
    "    <td></td>\n",
    "    <td>10.0</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>AC/DC</td>\n",
    "    <td>Back in Black</td> \n",
    "    <td>1980</td>\n",
    "    <td>00:42:11</td>\n",
    "    <td>Hard rock</td>\n",
    "    <td>26.1</td>\n",
    "    <td>50</td>\n",
    "    <td>25-Jul-80</td>\n",
    "    <td></td>\n",
    "    <td>8.5</td>\n",
    "  </tr>\n",
    "    <tr>\n",
    "    <td>Pink Floyd</td>\n",
    "    <td>The Dark Side of the Moon</td> \n",
    "    <td>1973</td>\n",
    "    <td>00:42:49</td>\n",
    "    <td>Progressive rock</td>\n",
    "    <td>24.2</td>\n",
    "    <td>45</td>\n",
    "    <td>01-Mar-73</td>\n",
    "    <td></td>\n",
    "    <td>9.5</td>\n",
    "  </tr>\n",
    "    <tr>\n",
    "    <td>Whitney Houston</td>\n",
    "    <td>The Bodyguard</td> \n",
    "    <td>1992</td>\n",
    "    <td>00:57:44</td>\n",
    "    <td>Soundtrack/R&B, soul, pop</td>\n",
    "    <td>26.1</td>\n",
    "    <td>50</td>\n",
    "    <td>25-Jul-80</td>\n",
    "    <td>Y</td>\n",
    "    <td>7.0</td>\n",
    "  </tr>\n",
    "    <tr>\n",
    "    <td>Meat Loaf</td>\n",
    "    <td>Bat Out of Hell</td> \n",
    "    <td>1977</td>\n",
    "    <td>00:46:33</td>\n",
    "    <td>Hard rock, progressive rock</td>\n",
    "    <td>20.6</td>\n",
    "    <td>43</td>\n",
    "    <td>21-Oct-77</td>\n",
    "    <td></td>\n",
    "    <td>7.0</td>\n",
    "  </tr>\n",
    "    <tr>\n",
    "    <td>Eagles</td>\n",
    "    <td>Their Greatest Hits (1971-1975)</td> \n",
    "    <td>1976</td>\n",
    "    <td>00:43:08</td>\n",
    "    <td>Rock, soft rock, folk rock</td>\n",
    "    <td>32.2</td>\n",
    "    <td>42</td>\n",
    "    <td>17-Feb-76</td>\n",
    "    <td></td>\n",
    "    <td>9.5</td>\n",
    "  </tr>\n",
    "    <tr>\n",
    "    <td>Bee Gees</td>\n",
    "    <td>Saturday Night Fever</td> \n",
    "    <td>1977</td>\n",
    "    <td>1:15:54</td>\n",
    "    <td>Disco</td>\n",
    "    <td>20.6</td>\n",
    "    <td>40</td>\n",
    "    <td>15-Nov-77</td>\n",
    "    <td>Y</td>\n",
    "    <td>9.0</td>\n",
    "  </tr>\n",
    "    <tr>\n",
    "    <td>Fleetwood Mac</td>\n",
    "    <td>Rumours</td> \n",
    "    <td>1977</td>\n",
    "    <td>00:40:01</td>\n",
    "    <td>Soft rock</td>\n",
    "    <td>27.9</td>\n",
    "    <td>40</td>\n",
    "    <td>04-Feb-77</td>\n",
    "    <td></td>\n",
    "    <td>9.5</td>\n",
    "  </tr>\n",
    "</table></font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the import command, we now have access to a large number of pre-built classes and functions. This assumes the library is installed; in our lab environment all the necessary libraries are installed. One way pandas allows you to work with data is a dataframe. Let's go through the process to go from a comma separated values (**.csv** ) file to a dataframe. This variable **csv_path** stores the path of the  **.csv** ,that is  used as an argument to the **read_csv** function. The result is stored in the object ** df**, this is a common short form used for a variable referring to a Pandas dataframe. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "csv_path='https://ibm.box.com/shared/static/keo2qz0bvh4iu6gf5qjq4vdrkt67bvvb.csv'\n",
    "df = pd.read_csv(csv_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the method **head()** to examine the first five rows of a dataframe: \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Album</th>\n",
       "      <th>Released</th>\n",
       "      <th>Length</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Music Recording Sales (millions)</th>\n",
       "      <th>Claimed Sales (millions)</th>\n",
       "      <th>Released.1</th>\n",
       "      <th>Soundtrack</th>\n",
       "      <th>Rating</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Michael Jackson</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>1982</td>\n",
       "      <td>0:42:19</td>\n",
       "      <td>pop, rock, R&amp;B</td>\n",
       "      <td>46.0</td>\n",
       "      <td>65</td>\n",
       "      <td>30-Nov-82</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AC/DC</td>\n",
       "      <td>Back in Black</td>\n",
       "      <td>1980</td>\n",
       "      <td>0:42:11</td>\n",
       "      <td>hard rock</td>\n",
       "      <td>26.1</td>\n",
       "      <td>50</td>\n",
       "      <td>25-Jul-80</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Pink Floyd</td>\n",
       "      <td>The Dark Side of the Moon</td>\n",
       "      <td>1973</td>\n",
       "      <td>0:42:49</td>\n",
       "      <td>progressive rock</td>\n",
       "      <td>24.2</td>\n",
       "      <td>45</td>\n",
       "      <td>01-Mar-73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Whitney Houston</td>\n",
       "      <td>The Bodyguard</td>\n",
       "      <td>1992</td>\n",
       "      <td>0:57:44</td>\n",
       "      <td>R&amp;B, soul, pop</td>\n",
       "      <td>27.4</td>\n",
       "      <td>44</td>\n",
       "      <td>17-Nov-92</td>\n",
       "      <td>Y</td>\n",
       "      <td>8.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Meat Loaf</td>\n",
       "      <td>Bat Out of Hell</td>\n",
       "      <td>1977</td>\n",
       "      <td>0:46:33</td>\n",
       "      <td>hard rock, progressive rock</td>\n",
       "      <td>20.6</td>\n",
       "      <td>43</td>\n",
       "      <td>21-Oct-77</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Artist                      Album  Released   Length  \\\n",
       "0  Michael Jackson                   Thriller      1982  0:42:19   \n",
       "1            AC/DC              Back in Black      1980  0:42:11   \n",
       "2       Pink Floyd  The Dark Side of the Moon      1973  0:42:49   \n",
       "3  Whitney Houston              The Bodyguard      1992  0:57:44   \n",
       "4        Meat Loaf            Bat Out of Hell      1977  0:46:33   \n",
       "\n",
       "                         Genre  Music Recording Sales (millions)  \\\n",
       "0               pop, rock, R&B                              46.0   \n",
       "1                    hard rock                              26.1   \n",
       "2             progressive rock                              24.2   \n",
       "3               R&B, soul, pop                              27.4   \n",
       "4  hard rock, progressive rock                              20.6   \n",
       "\n",
       "   Claimed Sales (millions) Released.1 Soundtrack  Rating  \n",
       "0                        65  30-Nov-82        NaN    10.0  \n",
       "1                        50  25-Jul-80        NaN     9.5  \n",
       "2                        45  01-Mar-73        NaN     9.0  \n",
       "3                        44  17-Nov-92          Y     8.5  \n",
       "4                        43  21-Oct-77        NaN     8.0  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: xlrd in /home/jupyterlab/conda/lib/python3.6/site-packages (1.1.0)\n"
     ]
    }
   ],
   "source": [
    "#dependency  needed to install file \n",
    "!pip install xlrd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " We use the path of the excel file and the function **read_excel**. The result is a data frame as before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Album</th>\n",
       "      <th>Released</th>\n",
       "      <th>Length</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Music Recording Sales (millions)</th>\n",
       "      <th>Claimed Sales (millions)</th>\n",
       "      <th>Released.1</th>\n",
       "      <th>Soundtrack</th>\n",
       "      <th>Rating</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Michael Jackson</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>1982</td>\n",
       "      <td>00:42:19</td>\n",
       "      <td>pop, rock, R&amp;B</td>\n",
       "      <td>46.0</td>\n",
       "      <td>65</td>\n",
       "      <td>1982-11-30</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AC/DC</td>\n",
       "      <td>Back in Black</td>\n",
       "      <td>1980</td>\n",
       "      <td>00:42:11</td>\n",
       "      <td>hard rock</td>\n",
       "      <td>26.1</td>\n",
       "      <td>50</td>\n",
       "      <td>1980-07-25</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Pink Floyd</td>\n",
       "      <td>The Dark Side of the Moon</td>\n",
       "      <td>1973</td>\n",
       "      <td>00:42:49</td>\n",
       "      <td>progressive rock</td>\n",
       "      <td>24.2</td>\n",
       "      <td>45</td>\n",
       "      <td>1973-03-01</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Whitney Houston</td>\n",
       "      <td>The Bodyguard</td>\n",
       "      <td>1992</td>\n",
       "      <td>00:57:44</td>\n",
       "      <td>R&amp;B, soul, pop</td>\n",
       "      <td>27.4</td>\n",
       "      <td>44</td>\n",
       "      <td>1992-11-17</td>\n",
       "      <td>Y</td>\n",
       "      <td>8.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Meat Loaf</td>\n",
       "      <td>Bat Out of Hell</td>\n",
       "      <td>1977</td>\n",
       "      <td>00:46:33</td>\n",
       "      <td>hard rock, progressive rock</td>\n",
       "      <td>20.6</td>\n",
       "      <td>43</td>\n",
       "      <td>1977-10-21</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Artist                      Album  Released    Length  \\\n",
       "0  Michael Jackson                   Thriller      1982  00:42:19   \n",
       "1            AC/DC              Back in Black      1980  00:42:11   \n",
       "2       Pink Floyd  The Dark Side of the Moon      1973  00:42:49   \n",
       "3  Whitney Houston              The Bodyguard      1992  00:57:44   \n",
       "4        Meat Loaf            Bat Out of Hell      1977  00:46:33   \n",
       "\n",
       "                         Genre  Music Recording Sales (millions)  \\\n",
       "0               pop, rock, R&B                              46.0   \n",
       "1                    hard rock                              26.1   \n",
       "2             progressive rock                              24.2   \n",
       "3               R&B, soul, pop                              27.4   \n",
       "4  hard rock, progressive rock                              20.6   \n",
       "\n",
       "   Claimed Sales (millions) Released.1 Soundtrack  Rating  \n",
       "0                        65 1982-11-30        NaN    10.0  \n",
       "1                        50 1980-07-25        NaN     9.5  \n",
       "2                        45 1973-03-01        NaN     9.0  \n",
       "3                        44 1992-11-17          Y     8.5  \n",
       "4                        43 1977-10-21        NaN     8.0  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xlsx_path='https://ibm.box.com/shared/static/mzd4exo31la6m7neva2w45dstxfg5s86.xlsx'\n",
    "\n",
    "df = pd.read_excel(xlsx_path)\n",
    "df.head()\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can access the column \"Length\" and assign it a new dataframe 'x':\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>00:42:19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>00:42:11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>00:42:49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>00:57:44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>00:46:33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>00:43:08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>01:15:54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>00:40:01</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Length\n",
       "0  00:42:19\n",
       "1  00:42:11\n",
       "2  00:42:49\n",
       "3  00:57:44\n",
       "4  00:46:33\n",
       "5  00:43:08\n",
       "6  01:15:54\n",
       "7  00:40:01"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x=df[['Length']]\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " The process is shown in the figure: "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <img src = \"https://ibm.box.com/shared/static/bz800py5ui4w0kpb0k09lq3k5oegop5v.png\" width = 750, align = \"center\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a id=\"ref2\"></a>\n",
    "<h2 align=center> Viewing Data and Accessing Data </h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also assign the value to a series, you can think of a Pandas series as a 1-D dataframe. Just use one bracket: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    00:42:19\n",
       "1    00:42:11\n",
       "2    00:42:49\n",
       "3    00:57:44\n",
       "4    00:46:33\n",
       "5    00:43:08\n",
       "6    01:15:54\n",
       "7    00:40:01\n",
       "Name: Length, dtype: object"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x=df['Length']\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also assign different columns, for example, we can assign the column 'Artist':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Michael Jackson</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AC/DC</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Pink Floyd</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Whitney Houston</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Meat Loaf</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Eagles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Bee Gees</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Fleetwood Mac</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Artist\n",
       "0  Michael Jackson\n",
       "1            AC/DC\n",
       "2       Pink Floyd\n",
       "3  Whitney Houston\n",
       "4        Meat Loaf\n",
       "5           Eagles\n",
       "6         Bee Gees\n",
       "7    Fleetwood Mac"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x=df[['Artist']]\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Assign the variable 'q' to the dataframe that is made up of the column 'Rating':\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Rating</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>9.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>7.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>6.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Rating\n",
       "0    10.0\n",
       "1     9.5\n",
       "2     9.0\n",
       "3     8.5\n",
       "4     8.0\n",
       "5     7.5\n",
       "6     7.0\n",
       "7     6.5"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q=df[['Rating']]\n",
    "q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <div align=\"right\">\n",
    "<a href=\"#q1\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
    "\n",
    "</div>\n",
    "<div id=\"q1\" class=\"collapse\">\n",
    "```\n",
    "q=df[['Rating']]\n",
    "q\n",
    "```\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can do the same thing for multiple columns; we just put the dataframe name, in this case, **df**, and the name of the multiple column headers enclosed in double brackets. The result is a new dataframe comprised of the specified columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Length</th>\n",
       "      <th>Genre</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Michael Jackson</td>\n",
       "      <td>00:42:19</td>\n",
       "      <td>pop, rock, R&amp;B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AC/DC</td>\n",
       "      <td>00:42:11</td>\n",
       "      <td>hard rock</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Pink Floyd</td>\n",
       "      <td>00:42:49</td>\n",
       "      <td>progressive rock</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Whitney Houston</td>\n",
       "      <td>00:57:44</td>\n",
       "      <td>R&amp;B, soul, pop</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Meat Loaf</td>\n",
       "      <td>00:46:33</td>\n",
       "      <td>hard rock, progressive rock</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Eagles</td>\n",
       "      <td>00:43:08</td>\n",
       "      <td>rock, soft rock, folk rock</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Bee Gees</td>\n",
       "      <td>01:15:54</td>\n",
       "      <td>disco</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Fleetwood Mac</td>\n",
       "      <td>00:40:01</td>\n",
       "      <td>soft rock</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Artist    Length                        Genre\n",
       "0  Michael Jackson  00:42:19               pop, rock, R&B\n",
       "1            AC/DC  00:42:11                    hard rock\n",
       "2       Pink Floyd  00:42:49             progressive rock\n",
       "3  Whitney Houston  00:57:44               R&B, soul, pop\n",
       "4        Meat Loaf  00:46:33  hard rock, progressive rock\n",
       "5           Eagles  00:43:08   rock, soft rock, folk rock\n",
       "6         Bee Gees  01:15:54                        disco\n",
       "7    Fleetwood Mac  00:40:01                    soft rock"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y=df[['Artist','Length','Genre']]\n",
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The process is shown in the figure:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <img src = \"https://ibm.box.com/shared/static/dh9duk3ucuhmmmbixa6ugac6g384m5sq.png\" width = 1100, align = \"center\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Album</th>\n",
       "      <th>Released</th>\n",
       "      <th>Length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Thriller</td>\n",
       "      <td>1982</td>\n",
       "      <td>00:42:19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Back in Black</td>\n",
       "      <td>1980</td>\n",
       "      <td>00:42:11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>The Dark Side of the Moon</td>\n",
       "      <td>1973</td>\n",
       "      <td>00:42:49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>The Bodyguard</td>\n",
       "      <td>1992</td>\n",
       "      <td>00:57:44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Bat Out of Hell</td>\n",
       "      <td>1977</td>\n",
       "      <td>00:46:33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Their Greatest Hits (1971-1975)</td>\n",
       "      <td>1976</td>\n",
       "      <td>00:43:08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Saturday Night Fever</td>\n",
       "      <td>1977</td>\n",
       "      <td>01:15:54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Rumours</td>\n",
       "      <td>1977</td>\n",
       "      <td>00:40:01</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             Album  Released    Length\n",
       "0                         Thriller      1982  00:42:19\n",
       "1                    Back in Black      1980  00:42:11\n",
       "2        The Dark Side of the Moon      1973  00:42:49\n",
       "3                    The Bodyguard      1992  00:57:44\n",
       "4                  Bat Out of Hell      1977  00:46:33\n",
       "5  Their Greatest Hits (1971-1975)      1976  00:43:08\n",
       "6             Saturday Night Fever      1977  01:15:54\n",
       "7                          Rumours      1977  00:40:01"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Album','Released','Length']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Assign the variable 'q' to the dataframe that is made up of the column 'Released' and 'Artist':\n",
    "q=df[['Released','Artist']]\n",
    "q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <div align=\"right\">\n",
    "<a href=\"#q2\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
    "\n",
    "</div>\n",
    "<div id=\"q2\" class=\"collapse\">\n",
    "```\n",
    "q=df[['Released','Artist']]\n",
    "q\n",
    "```\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to access unique elements is the 'ix' method, where you can access the 1st row and first column as follows :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Michael Jackson'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#**ix** will be deprecated, use **iloc** for integer indexes \n",
    "#df.ix[0,0]\n",
    "df.iloc[0,0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can access the 2nd  row and first column as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'AC/DC'"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#**ix** will be deprecated, use **iloc** for integer indexes\n",
    "#df.ix[1,0]\n",
    "df.iloc[1,0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " You can access the 1st row 3rd column as follows: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1982"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#**ix** will be deprecated, use **iloc** for integer indexes\n",
    "#df.ix[0,2]\n",
    "df.iloc[0,2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Access the 2nd row  3rd column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1982"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[0,2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <div align=\"right\">\n",
    "<a href=\"#q3\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
    "\n",
    "</div>\n",
    "<div id=\"q3\" class=\"collapse\">\n",
    "```\n",
    "#**ix** will be deprecated use **iloc** for integer indexes\n",
    "#df.ix[1,2]\n",
    "df.iloc[0,2]\n",
    "```\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can access the column using the name as well, the following are the same as above: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Michael Jackson'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#**ix** will be deprecated, use **loc** for label-location based indexer\n",
    "#df.ix[0,'Artist']\n",
    "df.loc[0,'Artist']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'AC/DC'"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#**ix** will be deprecated, use **loc** for label-location based indexer\n",
    "#df.ix[1,'Artist']\n",
    "df.loc[1,'Artist']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1982"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#**ix** will be deprecated, use **loc** for label-location based indexer\n",
    "#df.ix[0,'Released']\n",
    "df.loc[0,'Released']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1980"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#**ix** will be deprecated, use **loc** for label-location based indexer\n",
    "#df.ix[1,'Released']\n",
    "df.loc[1,'Released']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can perform slicing using both the index and the name of the column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Album</th>\n",
       "      <th>Released</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Michael Jackson</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>1982</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AC/DC</td>\n",
       "      <td>Back in Black</td>\n",
       "      <td>1980</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Artist          Album  Released\n",
       "0  Michael Jackson       Thriller      1982\n",
       "1            AC/DC  Back in Black      1980"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#**ix** will be deprecated, use **loc** for label-location based indexer\n",
    "#df.ix[0:2, 0:3]\n",
    "df.iloc[0:2, 0:3]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#**ix** will be deprecated, use **loc** for label-location based indexer\n",
    "#df.ix[0:2, 'Artist':'Released']\n",
    "df.loc[0:2, 'Artist':'Released']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a href=\"http://cocl.us/NotebooksPython101bottom\"><img src = \"https://ibm.box.com/shared/static/irypdxea2q4th88zu1o1tsd06dya10go.png\" width = 750, align = \"center\"></a>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### About the Authors:  \n",
    "\n",
    " [Joseph Santarcangelo]( https://www.linkedin.com/in/joseph-s-50398b136/) has a PhD in Electrical Engineering, his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright &copy; 2017 [cognitiveclass.ai](https:cognitiveclass.ai). This notebook and its source code are released under the terms of the [MIT License](cognitiveclass.ai)."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}