SasikiranJ/2.4_Sets.ipynb

## 2.4_Sets.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a href=\"http://cocl.us/topNotebooksPython101Coursera\"><img src = \"https://ibm.box.com/shared/static/yfe6h4az47ktg2mm9h05wby2n7e8kei3.png\" width = 750, align = \"center\"></a>\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a href=\"https://www.bigdatauniversity.com\"><img src = \"https://ibm.box.com/shared/static/ugcqz6ohbvff804xp84y4kqnvvk3bq1g.png\" width = 300, align = \"center\"></a>\n",
    "\n",
    "<h1 align=center><font size = 5>Sets and Dictionaries</font></h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## Table of Contents\n",
    "\n",
    "\n",
    "<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
    "<li><a href=\"#ref1\">Sets</a></li>\n",
    "\n",
    "<br>\n",
    "<p></p>\n",
    "Estimated Time Needed: <strong>20 min</strong>\n",
    "</div>\n",
    "\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"ref1\"></a>\n",
    "<center><h2>Sets</h2></center>\n",
    "\n",
    "In this lab, we are going to take a look at sets in Python. A set is a unique collection of objects in Python. You can denote a set with a curly bracket **{}**. Python will remove duplicate items:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'R&B', 'disco', 'hard rock', 'pop', 'rock', 'soul'}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set1={\"pop\", \"rock\", \"soul\", \"hard rock\", \"rock\", \"R&B\", \"rock\", \"disco\"}\n",
    "set1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The process of mapping is illustrated in the figure:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a ><img src = https://ibm.box.com/shared/static/i0xb9qbetek7kbh17krx05i4lqmywahm.png width = 1100, align = \"center\"></a>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " You can also  create a set from a list as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{65,\n",
       " None,\n",
       " 'Thriller',\n",
       " 'Pop, Rock, R&B',\n",
       " 10.0,\n",
       " '00:42:19',\n",
       " 46.0,\n",
       " '30-Nov-82',\n",
       " 'Michael Jackson',\n",
       " 1982}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "album_list =[ \"Michael Jackson\", \"Thriller\", 1982, \"00:42:19\", \\\n",
    "              \"Pop, Rock, R&B\", 46.0, 65, \"30-Nov-82\", None, 10.0]\n",
    "\n",
    "album_set = set(album_list)             \n",
    "album_set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let us create a set of  genres:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'R&B',\n",
       " 'disco',\n",
       " 'folk rock',\n",
       " 'hard rock',\n",
       " 'pop',\n",
       " 'progressive rock',\n",
       " 'rock',\n",
       " 'soft rock',\n",
       " 'soul'}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "music_genres = set([\"pop\", \"pop\", \"rock\", \"folk rock\", \"hard rock\", \"soul\", \\\n",
    "                    \"progressive rock\", \"soft rock\", \"R&B\", \"disco\"])\n",
    "music_genres"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Convert the following list to a set ['rap','house','electronic music', 'rap']:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'electronic music', 'house', 'rap'}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set(['rap','house','electronic music','rap'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "  <div align=\"right\">\n",
    "<a href=\"#q10\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
    "\n",
    "</div>\n",
    "<div id=\"q10\" class=\"collapse\">\n",
    "```\n",
    "set(['rap','house','electronic music','rap'])\n",
    "```\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the duplicates are removed and the output is sorted."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us get the sum of the claimed sales:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Consider the list A=[1,2,2,1] and set B=set([1,2,2,1]), does sum(A)=sum(B) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#No, sum(A) != sum(B) Because if we do set operation duplicates will be removed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "the sum of A is: 6\n",
      "the sum of B is: 3\n"
     ]
    }
   ],
   "source": [
    "A=[1,2,2,1]  \n",
    "B=set([1,2,2,1])\n",
    "print(\"the sum of A is:\",sum(A))\n",
    "print(\"the sum of B is:\",sum(B))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <div align=\"right\">\n",
    "<a href=\"#2\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
    "\n",
    "</div>\n",
    "<div id=\"2\" class=\"collapse\">\n",
    "```\n",
    "No, when casting a list to a set, the new set has no repeat elements. Run the following code to verify:\n",
    "A=[1,2,2,1]  \n",
    "B=set([1,2,2,1])\n",
    "print(\"the sum of A is:\",sum(A))\n",
    "print(\"the sum of B is:\",sum(B))\n",
    "```\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's determine the average rating:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set Operations "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Let us go over Set Operations, as these can be used to change the set. Consider the set **A**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AC/DC', 'Back in Black', 'Thriller'}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A = set([\"Thriller\",\"Back in Black\", \"AC/DC\"] )\n",
    "A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " We can add an element to a set using the **add()** method: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AC/DC', 'Back in Black', 'NSYNC', 'Thriller'}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.add(\"NSYNC\")\n",
    "A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " If we add the same element twice, nothing will happen as there can be no duplicates in a set:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AC/DC', 'Back in Black', 'NSYNC', 'Thriller'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.add(\"NSYNC\")\n",
    "A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " We can remove an item from a set using the remove method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AC/DC', 'Back in Black', 'Thriller'}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.remove(\"NSYNC\")\n",
    "A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " We can verify if an element is in the set using the **in** command :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"AC/DC\"  in A\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Working with sets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remember that with sets you can check the difference between sets, as well as the symmetric difference, intersection, and union:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Consider the following two sets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "album_set1 = set([\"Thriller\",'AC/DC', 'Back in Black'] )\n",
    "album_set2 = set([ \"AC/DC\",\"Back in Black\", \"The Dark Side of the Moon\"] )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a ><img src = \"https://ibm.box.com/shared/static/bl6ijga6g8r7bdfkl17qw7zh62czte47.png\" width = 850, align = \"center\"></a>\n",
    "  <h4 align=center>  Visualizing the sets as two circles \n",
    " \n",
    "  </h4> "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "({'AC/DC', 'Back in Black', 'Thriller'},\n",
       " {'AC/DC', 'Back in Black', 'The Dark Side of the Moon'})"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "album_set1, album_set2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " As both sets contain 'AC/DC' and 'Back in Black' we represent these common elements with the intersection of two circles.    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a ><img src = \"https://ibm.box.com/shared/static/7ttuf8otui4s6axm23csmb4s3pxz16y2.png\" width = 650, align = \"center\"></a>\n",
    "  <h4 align=center>  Visualizing common elements with the intersection of two circles.\n",
    " \n",
    "  </h4> "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can find the common elements of the sets as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AC/DC', 'Back in Black'}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "album_set_3=album_set1 & album_set2\n",
    "album_set_3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can find all the elements that are only contained in **album_set1** using the **difference** method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Thriller'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "album_set1.difference(album_set2)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We only consider elements in **album_set1**; all the elements in **album_set2**, including the intersection, are not included.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a ><img src = \"https://ibm.box.com/shared/static/osmxw1qnb5t9odon2cx94wxhfzlkn1n8.png\" width = 650, align = \"center\"></a>\n",
    "  <h4 align=center>  The difference of “album_set1” and   “album_set2\n",
    " \n",
    "  </h4> "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference between **album_set2** and **album_set1** is given by:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'The Dark Side of the Moon'}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "album_set2.difference(album_set1)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a ><img src = \"https://ibm.box.com/shared/static/klgc09bgpsjudr9v3wtl8yk9s2lya3hl.png\" width = 650, align = \"center\"></a>\n",
    "  <h4 align=center>  The difference of **album_set2** and   **album_set1**\n",
    " \n",
    "  </h4> "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also find the intersection, i.e in both **album_list2** and **album_list1**, using the intersection command :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AC/DC', 'Back in Black'}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "album_set1.intersection(album_set2)   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " This corresponds to the intersection of the two circles:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a ><img src = \"https://ibm.box.com/shared/static/s2xfytq43twp6jsvbvr4o2fir7wdablo.png\" width = 650, align = \"center\"></a>\n",
    "  <h4 align=center>  Intersection of set\n",
    " \n",
    "  </h4> "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " The union corresponds to all the elements in both sets, which is represented by colouring  both circles:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a ><img src = \"https://ibm.box.com/shared/static/vkczce5jh50g0oh53xn0ilgriflcrog0.png\" width = 650, align = \"center\"></a>\n",
    "  <h4 align=center> Figure 7:  Union of set\n",
    " \n",
    "  </h4> "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " The union is given by:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AC/DC', 'Back in Black', 'The Dark Side of the Moon', 'Thriller'}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "album_set1.union(album_set2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And you can check if a set is a superset or subset of another set, respectively, like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set(album_set1).issuperset(album_set2)   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set(album_set2).issubset(album_set1)     "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is an example where **issubset()** is **issuperset()** is true:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set({\"Back in Black\", \"AC/DC\"}).issubset(album_set1) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "album_set1.issuperset({\"Back in Black\", \"AC/DC\"})   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create a new set “album_set3” that is the union of “album_set1” and “album_set2”:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AC/DC', 'Back in Black', 'The Dark Side of the Moon', 'Thriller'}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "album_set3=album_set1.union(album_set2) \n",
    "album_set3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div align=\"right\">\n",
    "<a href=\"#4\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
    "\n",
    "</div>\n",
    "<div id=\"4\" class=\"collapse\">\n",
    "```\n",
    "album_set3=album_set1.union(album_set2)\n",
    "album_set3\n",
    "```\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Find out if  \"album_set1\" is a subset of \"album_set3\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <div align=\"right\">\n",
    "<a href=\"#5\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
    "\n",
    "</div>\n",
    "<div id=\"5\" class=\"collapse\">\n",
    "```\n",
    "album_set1.issubset(album_set3)  \n",
    "\n",
    "```\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    " <a href=\"http://cocl.us/bottemNotebooksPython101Coursera\"><img src = \"https://ibm.box.com/shared/static/irypdxea2q4th88zu1o1tsd06dya10go.png\" width = 750, align = \"center\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "# About the Authors:  \n",
    "\n",
    " [Joseph Santarcangelo]( https://www.linkedin.com/in/joseph-s-50398b136/) has a PhD in Electrical Engineering, his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <hr>\n",
    "Copyright &copy; 2017 [cognitiveclass.ai](cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "widgets": {
   "state": {},
   "version": "1.1.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <a href=\"http://cocl.us/topNotebooksPython101Coursera\"><img src = \"https://ibm.box.com/shared/static/yfe6h4az47ktg2mm9h05wby2n7e8kei3.png\" width = 750, align = \"center\"></a>\n",
	"\n",
	"\n",
	"\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <a href=\"https://www.bigdatauniversity.com\"><img src = \"https://ibm.box.com/shared/static/ugcqz6ohbvff804xp84y4kqnvvk3bq1g.png\" width = 300, align = \"center\"></a>\n",
	"\n",
	"<h1 align=center><font size = 5>Sets and Dictionaries</font></h1>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"\n",
	"## Table of Contents\n",
	"\n",
	"\n",
	"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
	"<li><a href=\"#ref1\">Sets</a></li>\n",
	"\n",
	"<br>\n",
	"<p></p>\n",
	"Estimated Time Needed: <strong>20 min</strong>\n",
	"</div>\n",
	"\n",
	"<hr>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"<a id=\"ref1\"></a>\n",
	"<center><h2>Sets</h2></center>\n",
	"\n",
	"In this lab, we are going to take a look at sets in Python. A set is a unique collection of objects in Python. You can denote a set with a curly bracket {}. Python will remove duplicate items:\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'R&B', 'disco', 'hard rock', 'pop', 'rock', 'soul'}"
	]
	},
	"execution_count": 1,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"set1={\"pop\", \"rock\", \"soul\", \"hard rock\", \"rock\", \"R&B\", \"rock\", \"disco\"}\n",
	"set1"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The process of mapping is illustrated in the figure:\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"<a ><img src = https://ibm.box.com/shared/static/i0xb9qbetek7kbh17krx05i4lqmywahm.png width = 1100, align = \"center\"></a>\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" You can also create a set from a list as follows:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {
	"collapsed": false,
	"scrolled": true
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{65,\n",
	" None,\n",
	" 'Thriller',\n",
	" 'Pop, Rock, R&B',\n",
	" 10.0,\n",
	" '00:42:19',\n",
	" 46.0,\n",
	" '30-Nov-82',\n",
	" 'Michael Jackson',\n",
	" 1982}"
	]
	},
	"execution_count": 2,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"album_list =[ \"Michael Jackson\", \"Thriller\", 1982, \"00:42:19\", \\\n",
	" \"Pop, Rock, R&B\", 46.0, 65, \"30-Nov-82\", None, 10.0]\n",
	"\n",
	"album_set = set(album_list) \n",
	"album_set"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Now let us create a set of genres:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'R&B',\n",
	" 'disco',\n",
	" 'folk rock',\n",
	" 'hard rock',\n",
	" 'pop',\n",
	" 'progressive rock',\n",
	" 'rock',\n",
	" 'soft rock',\n",
	" 'soul'}"
	]
	},
	"execution_count": 3,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"music_genres = set([\"pop\", \"pop\", \"rock\", \"folk rock\", \"hard rock\", \"soul\", \\\n",
	" \"progressive rock\", \"soft rock\", \"R&B\", \"disco\"])\n",
	"music_genres"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Convert the following list to a set ['rap','house','electronic music', 'rap']:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'electronic music', 'house', 'rap'}"
	]
	},
	"execution_count": 4,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"set(['rap','house','electronic music','rap'])"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <div align=\"right\">\n",
	"<a href=\"#q10\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
	"\n",
	"</div>\n",
	"<div id=\"q10\" class=\"collapse\">\n",
	"```\n",
	"set(['rap','house','electronic music','rap'])\n",
	"```\n",
	"</div>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Notice that the duplicates are removed and the output is sorted."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Let us get the sum of the claimed sales:"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Consider the list A=[1,2,2,1] and set B=set([1,2,2,1]), does sum(A)=sum(B) "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#No, sum(A) != sum(B) Because if we do set operation duplicates will be removed."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {
	"collapsed": false,
	"scrolled": true
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"the sum of A is: 6\n",
	"the sum of B is: 3\n"
	]
	}
	],
	"source": [
	"A=[1,2,2,1] \n",
	"B=set([1,2,2,1])\n",
	"print(\"the sum of A is:\",sum(A))\n",
	"print(\"the sum of B is:\",sum(B))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <div align=\"right\">\n",
	"<a href=\"#2\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
	"\n",
	"</div>\n",
	"<div id=\"2\" class=\"collapse\">\n",
	"```\n",
	"No, when casting a list to a set, the new set has no repeat elements. Run the following code to verify:\n",
	"A=[1,2,2,1] \n",
	"B=set([1,2,2,1])\n",
	"print(\"the sum of A is:\",sum(A))\n",
	"print(\"the sum of B is:\",sum(B))\n",
	"```\n",
	"</div>\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Now let's determine the average rating:"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Set Operations "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" Let us go over Set Operations, as these can be used to change the set. Consider the set A:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'AC/DC', 'Back in Black', 'Thriller'}"
	]
	},
	"execution_count": 6,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"A = set([\"Thriller\",\"Back in Black\", \"AC/DC\"] )\n",
	"A"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" We can add an element to a set using the add() method: "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'AC/DC', 'Back in Black', 'NSYNC', 'Thriller'}"
	]
	},
	"execution_count": 7,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"A.add(\"NSYNC\")\n",
	"A"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" If we add the same element twice, nothing will happen as there can be no duplicates in a set:\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 8,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'AC/DC', 'Back in Black', 'NSYNC', 'Thriller'}"
	]
	},
	"execution_count": 8,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"A.add(\"NSYNC\")\n",
	"A"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" We can remove an item from a set using the remove method:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 9,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'AC/DC', 'Back in Black', 'Thriller'}"
	]
	},
	"execution_count": 9,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"A.remove(\"NSYNC\")\n",
	"A"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" We can verify if an element is in the set using the in command :"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"True"
	]
	},
	"execution_count": 10,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"\"AC/DC\" in A\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Working with sets"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Remember that with sets you can check the difference between sets, as well as the symmetric difference, intersection, and union:"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" Consider the following two sets:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"album_set1 = set([\"Thriller\",'AC/DC', 'Back in Black'] )\n",
	"album_set2 = set([ \"AC/DC\",\"Back in Black\", \"The Dark Side of the Moon\"] )"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <a ><img src = \"https://ibm.box.com/shared/static/bl6ijga6g8r7bdfkl17qw7zh62czte47.png\" width = 850, align = \"center\"></a>\n",
	" <h4 align=center> Visualizing the sets as two circles \n",
	" \n",
	" </h4> "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {
	"collapsed": false,
	"scrolled": true
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"({'AC/DC', 'Back in Black', 'Thriller'},\n",
	" {'AC/DC', 'Back in Black', 'The Dark Side of the Moon'})"
	]
	},
	"execution_count": 6,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"album_set1, album_set2"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" As both sets contain 'AC/DC' and 'Back in Black' we represent these common elements with the intersection of two circles. \n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <a ><img src = \"https://ibm.box.com/shared/static/7ttuf8otui4s6axm23csmb4s3pxz16y2.png\" width = 650, align = \"center\"></a>\n",
	" <h4 align=center> Visualizing common elements with the intersection of two circles.\n",
	" \n",
	" </h4> "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We can find the common elements of the sets as follows:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'AC/DC', 'Back in Black'}"
	]
	},
	"execution_count": 7,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"album_set_3=album_set1 & album_set2\n",
	"album_set_3"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We can find all the elements that are only contained in album_set1 using the difference method:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 8,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'Thriller'}"
	]
	},
	"execution_count": 8,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"album_set1.difference(album_set2) "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We only consider elements in album_set1; all the elements in album_set2, including the intersection, are not included.\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <a ><img src = \"https://ibm.box.com/shared/static/osmxw1qnb5t9odon2cx94wxhfzlkn1n8.png\" width = 650, align = \"center\"></a>\n",
	" <h4 align=center> The difference of “album_set1” and “album_set2\n",
	" \n",
	" </h4> "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The difference between album_set2 and album_set1 is given by:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 9,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'The Dark Side of the Moon'}"
	]
	},
	"execution_count": 9,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"album_set2.difference(album_set1) "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"<a ><img src = \"https://ibm.box.com/shared/static/klgc09bgpsjudr9v3wtl8yk9s2lya3hl.png\" width = 650, align = \"center\"></a>\n",
	" <h4 align=center> The difference of album_set2 and album_set1\n",
	" \n",
	" </h4> "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We can also find the intersection, i.e in both album_list2 and album_list1, using the intersection command :"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'AC/DC', 'Back in Black'}"
	]
	},
	"execution_count": 10,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"album_set1.intersection(album_set2) "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" This corresponds to the intersection of the two circles:"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <a ><img src = \"https://ibm.box.com/shared/static/s2xfytq43twp6jsvbvr4o2fir7wdablo.png\" width = 650, align = \"center\"></a>\n",
	" <h4 align=center> Intersection of set\n",
	" \n",
	" </h4> "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" The union corresponds to all the elements in both sets, which is represented by colouring both circles:\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <a ><img src = \"https://ibm.box.com/shared/static/vkczce5jh50g0oh53xn0ilgriflcrog0.png\" width = 650, align = \"center\"></a>\n",
	" <h4 align=center> Figure 7: Union of set\n",
	" \n",
	" </h4> "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" The union is given by:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 11,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'AC/DC', 'Back in Black', 'The Dark Side of the Moon', 'Thriller'}"
	]
	},
	"execution_count": 11,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"album_set1.union(album_set2)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"And you can check if a set is a superset or subset of another set, respectively, like this:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 12,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"False"
	]
	},
	"execution_count": 12,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"set(album_set1).issuperset(album_set2) "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 13,
	"metadata": {
	"collapsed": true
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"False"
	]
	},
	"execution_count": 13,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"set(album_set2).issubset(album_set1) "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Here is an example where issubset() is issuperset() is true:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 14,
	"metadata": {
	"collapsed": true
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"True"
	]
	},
	"execution_count": 14,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"set({\"Back in Black\", \"AC/DC\"}).issubset(album_set1) "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 15,
	"metadata": {
	"collapsed": true
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"True"
	]
	},
	"execution_count": 15,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"album_set1.issuperset({\"Back in Black\", \"AC/DC\"}) "
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Create a new set “album_set3” that is the union of “album_set1” and “album_set2”:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 16,
	"metadata": {
	"collapsed": true
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"{'AC/DC', 'Back in Black', 'The Dark Side of the Moon', 'Thriller'}"
	]
	},
	"execution_count": 16,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"album_set3=album_set1.union(album_set2) \n",
	"album_set3"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"<div align=\"right\">\n",
	"<a href=\"#4\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
	"\n",
	"</div>\n",
	"<div id=\"4\" class=\"collapse\">\n",
	"```\n",
	"album_set3=album_set1.union(album_set2)\n",
	"album_set3\n",
	"```\n",
	"</div>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Find out if \"album_set1\" is a subset of \"album_set3\":"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <div align=\"right\">\n",
	"<a href=\"#5\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
	"\n",
	"</div>\n",
	"<div id=\"5\" class=\"collapse\">\n",
	"```\n",
	"album_set1.issubset(album_set3) \n",
	"\n",
	"```\n",
	"</div>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"\n",
	" <a href=\"http://cocl.us/bottemNotebooksPython101Coursera\"><img src = \"https://ibm.box.com/shared/static/irypdxea2q4th88zu1o1tsd06dya10go.png\" width = 750, align = \"center\"></a>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"\n",
	"\n",
	"# About the Authors: \n",
	"\n",
	" [Joseph Santarcangelo]( https://www.linkedin.com/in/joseph-s-50398b136/) has a PhD in Electrical Engineering, his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.\n",
	"\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	" <hr>\n",
	"Copyright © 2017 [cognitiveclass.ai](cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.6"
	},
	"widgets": {
	"state": {},
	"version": "1.1.2"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}