@SasikiranJ
Created January 8, 2019 16:34
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a href=\"http://cocl.us/topNotebooksPython101Coursera\"><img src = \"https://ibm.box.com/shared/static/yfe6h4az47ktg2mm9h05wby2n7e8kei3.png\" width = 750, align = \"center\"></a>\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a href=\"https://www.bigdatauniversity.com\"><img src = \"https://ibm.box.com/shared/static/ugcqz6ohbvff804xp84y4kqnvvk3bq1g.png\" width = 300, align = \"center\"></a>\n",
"\n",
"<h1 align=center><font size = 5>Sets and Dictionaries</font></h1>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Table of Contents\n",
"\n",
"\n",
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
"<li><a href=\"#ref1\">Sets</a></li>\n",
"\n",
"<br>\n",
"<p></p>\n",
"Estimated Time Needed: <strong>20 min</strong>\n",
"</div>\n",
"\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref1\"></a>\n",
"<center><h2>Sets</h2></center>\n",
"\n",
"In this lab, we are going to take a look at sets in Python. A set is an unordered collection of unique objects. You can create a set by placing its elements inside curly brackets **{}**. Python will remove duplicate items:\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'R&B', 'disco', 'hard rock', 'pop', 'rock', 'soul'}"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set1={\"pop\", \"rock\", \"soul\", \"hard rock\", \"rock\", \"R&B\", \"rock\", \"disco\"}\n",
"set1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The process of mapping is illustrated in the figure:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a ><img src = https://ibm.box.com/shared/static/i0xb9qbetek7kbh17krx05i4lqmywahm.png width = 1100, align = \"center\"></a>\n"
]
},
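{
"cell_type": "markdown",
"metadata": {},
"source": [
"One caveat about the curly-bracket notation: empty curly brackets do not create an empty set. The cell below is a small sketch of this; the variable names `empty_braces` and `empty_set` are illustrative only and are not part of the original lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative names only; not part of the original lab\n",
"empty_braces = {}    # empty curly brackets create a dictionary, not a set\n",
"empty_set = set()    # set() with no arguments creates an empty set\n",
"print(type(empty_braces))\n",
"print(type(empty_set))"
]
},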
{
"cell_type": "markdown",
"metadata": {},
"source": [
" You can also create a set from a list as follows:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"{65,\n",
" None,\n",
" 'Thriller',\n",
" 'Pop, Rock, R&B',\n",
" 10.0,\n",
" '00:42:19',\n",
" 46.0,\n",
" '30-Nov-82',\n",
" 'Michael Jackson',\n",
" 1982}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"album_list =[ \"Michael Jackson\", \"Thriller\", 1982, \"00:42:19\", \\\n",
" \"Pop, Rock, R&B\", 46.0, 65, \"30-Nov-82\", None, 10.0]\n",
"\n",
"album_set = set(album_list) \n",
"album_set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let us create a set of genres:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'R&B',\n",
" 'disco',\n",
" 'folk rock',\n",
" 'hard rock',\n",
" 'pop',\n",
" 'progressive rock',\n",
" 'rock',\n",
" 'soft rock',\n",
" 'soul'}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"music_genres = set([\"pop\", \"pop\", \"rock\", \"folk rock\", \"hard rock\", \"soul\", \\\n",
" \"progressive rock\", \"soft rock\", \"R&B\", \"disco\"])\n",
"music_genres"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Convert the following list to a set ['rap','house','electronic music', 'rap']:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'electronic music', 'house', 'rap'}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(['rap','house','electronic music','rap'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <div align=\"right\">\n",
"<a href=\"#q10\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
"\n",
"</div>\n",
"<div id=\"q10\" class=\"collapse\">\n",
"```\n",
"set(['rap','house','electronic music','rap'])\n",
"```\n",
"</div>"
]
},
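{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that the duplicates are removed. The output may appear sorted, but a set is unordered, so you should not rely on the order in which its elements are displayed."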
]
},
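{
"cell_type": "markdown",
"metadata": {},
"source": [
"A set itself has no guaranteed order. If you need the elements in a particular order, one option is to build a sorted list from the set. The cell below is a minimal sketch; the names `genres` and `ordered_genres` are illustrative and are not part of the original lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative names only; not part of the original lab\n",
"genres = set(['rap', 'house', 'electronic music', 'rap'])\n",
"ordered_genres = sorted(genres)    # sorted() returns a new list in ascending order\n",
"print(ordered_genres)"
]
},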
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us compare the sum of a list with the sum of a set created from it:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Consider the list A=[1,2,2,1] and the set B=set([1,2,2,1]). Does sum(A)=sum(B)?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No, sum(A) != sum(B). Converting the list to a set removes the duplicates, so B contains only 1 and 2."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"the sum of A is: 6\n",
"the sum of B is: 3\n"
]
}
],
"source": [
"A=[1,2,2,1] \n",
"B=set([1,2,2,1])\n",
"print(\"the sum of A is:\",sum(A))\n",
"print(\"the sum of B is:\",sum(B))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <div align=\"right\">\n",
"<a href=\"#2\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
"\n",
"</div>\n",
"<div id=\"2\" class=\"collapse\">\n",
"```\n",
"No, when casting a list to a set, the new set has no repeat elements. Run the following code to verify:\n",
"A=[1,2,2,1] \n",
"B=set([1,2,2,1])\n",
"print(\"the sum of A is:\",sum(A))\n",
"print(\"the sum of B is:\",sum(B))\n",
"```\n",
"</div>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's look at the operations we can perform on sets:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set Operations "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Let us go over set operations, which can be used to change a set. Consider the set **A**:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'AC/DC', 'Back in Black', 'Thriller'}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A = set([\"Thriller\",\"Back in Black\", \"AC/DC\"] )\n",
"A"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can add an element to a set using the **add()** method: "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'AC/DC', 'Back in Black', 'NSYNC', 'Thriller'}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A.add(\"NSYNC\")\n",
"A"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" If we add the same element twice, nothing will happen as there can be no duplicates in a set:\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'AC/DC', 'Back in Black', 'NSYNC', 'Thriller'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A.add(\"NSYNC\")\n",
"A"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can remove an item from a set using the **remove()** method:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'AC/DC', 'Back in Black', 'Thriller'}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A.remove(\"NSYNC\")\n",
"A"
]
},
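{
"cell_type": "markdown",
"metadata": {},
"source": [
"Removing an element that is not in the set behaves differently depending on the method: `remove()` raises a `KeyError`, while `discard()` does nothing. The cell below is a hedged sketch of that difference; the set name `bands` is illustrative and is not part of the original lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative name only; not part of the original lab\n",
"bands = set(['Thriller', 'Back in Black', 'AC/DC'])\n",
"bands.discard('NSYNC')       # discard() does nothing if the element is absent\n",
"try:\n",
"    bands.remove('NSYNC')    # remove() raises KeyError if the element is absent\n",
"except KeyError:\n",
"    print('NSYNC is not in the set')\n",
"print(len(bands))"
]
},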
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can check whether an element is in the set using the **in** operator:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\"AC/DC\" in A\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Working with sets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember that with sets you can check the difference between sets, as well as the symmetric difference, intersection, and union:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Consider the following two sets:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"album_set1 = set([\"Thriller\",'AC/DC', 'Back in Black'] )\n",
"album_set2 = set([ \"AC/DC\",\"Back in Black\", \"The Dark Side of the Moon\"] )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a ><img src = \"https://ibm.box.com/shared/static/bl6ijga6g8r7bdfkl17qw7zh62czte47.png\" width = 850, align = \"center\"></a>\n",
" <h4 align=center> Visualizing the sets as two circles \n",
" \n",
" </h4> "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"({'AC/DC', 'Back in Black', 'Thriller'},\n",
" {'AC/DC', 'Back in Black', 'The Dark Side of the Moon'})"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"album_set1, album_set2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" As both sets contain 'AC/DC' and 'Back in Black' we represent these common elements with the intersection of two circles. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a ><img src = \"https://ibm.box.com/shared/static/7ttuf8otui4s6axm23csmb4s3pxz16y2.png\" width = 650, align = \"center\"></a>\n",
" <h4 align=center> Visualizing common elements with the intersection of two circles.\n",
" \n",
" </h4> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can find the common elements of the sets as follows:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'AC/DC', 'Back in Black'}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"album_set_3=album_set1 & album_set2\n",
"album_set_3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can find all the elements that are only contained in **album_set1** using the **difference** method:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'Thriller'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"album_set1.difference(album_set2) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We only consider elements in **album_set1**; all the elements in **album_set2**, including the intersection, are not included.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a ><img src = \"https://ibm.box.com/shared/static/osmxw1qnb5t9odon2cx94wxhfzlkn1n8.png\" width = 650, align = \"center\"></a>\n",
" <h4 align=center> The difference of “album_set1” and “album_set2”\n",
" \n",
" </h4> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between **album_set2** and **album_set1** is given by:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'The Dark Side of the Moon'}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"album_set2.difference(album_set1) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a ><img src = \"https://ibm.box.com/shared/static/klgc09bgpsjudr9v3wtl8yk9s2lya3hl.png\" width = 650, align = \"center\"></a>\n",
" <h4 align=center> The difference of “album_set2” and “album_set1”\n",
" \n",
" </h4> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also find the intersection, i.e. the elements in both **album_set1** and **album_set2**, using the **intersection** method:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'AC/DC', 'Back in Black'}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"album_set1.intersection(album_set2) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" This corresponds to the intersection of the two circles:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a ><img src = \"https://ibm.box.com/shared/static/s2xfytq43twp6jsvbvr4o2fir7wdablo.png\" width = 650, align = \"center\"></a>\n",
" <h4 align=center> Intersection of the two sets\n",
" \n",
" </h4> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The union corresponds to all the elements that are in either set, which is represented by colouring both circles:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a ><img src = \"https://ibm.box.com/shared/static/vkczce5jh50g0oh53xn0ilgriflcrog0.png\" width = 650, align = \"center\"></a>\n",
" <h4 align=center> Union of the two sets\n",
" \n",
" </h4> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The union is given by:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'AC/DC', 'Back in Black', 'The Dark Side of the Moon', 'Thriller'}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"album_set1.union(album_set2)"
]
},
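{
"cell_type": "markdown",
"metadata": {},
"source": [
"The symmetric difference mentioned earlier contains the elements that are in exactly one of the two sets. The cell below is a minimal sketch of it, together with the operator forms of the methods used above. The names `set_a`, `set_b`, and `only_in_one` are illustrative and are not part of the original lab; the two sets simply repeat the album data so the cell runs on its own."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative names only; the data repeats album_set1 and album_set2\n",
"set_a = set(['Thriller', 'AC/DC', 'Back in Black'])\n",
"set_b = set(['AC/DC', 'Back in Black', 'The Dark Side of the Moon'])\n",
"\n",
"# Elements that are in exactly one of the two sets\n",
"only_in_one = set_a.symmetric_difference(set_b)\n",
"print(sorted(only_in_one))\n",
"\n",
"# Each operator gives the same result as the corresponding method\n",
"print((set_a & set_b) == set_a.intersection(set_b))\n",
"print((set_a | set_b) == set_a.union(set_b))\n",
"print((set_a - set_b) == set_a.difference(set_b))\n",
"print((set_a ^ set_b) == only_in_one)"
]
},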
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And you can check if a set is a superset or subset of another set, respectively, like this:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(album_set1).issuperset(album_set2) "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set(album_set2).issubset(album_set1) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is an example where **issubset()** and **issuperset()** are true:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"set({\"Back in Black\", \"AC/DC\"}).issubset(album_set1) "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"album_set1.issuperset({\"Back in Black\", \"AC/DC\"}) "
]
},
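{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same subset and superset checks can also be written with comparison operators. The cell below is a short sketch; the names `small` and `large` are illustrative and are not part of the original lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative names only; not part of the original lab\n",
"small = set(['Back in Black', 'AC/DC'])\n",
"large = set(['Thriller', 'AC/DC', 'Back in Black'])\n",
"print(small <= large)    # same as small.issubset(large)\n",
"print(large >= small)    # same as large.issuperset(small)\n",
"print(small < large)     # proper subset: a subset that is not equal to the other set\n",
"print(large <= large)    # every set is a subset of itself"
]
},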
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create a new set “album_set3” that is the union of “album_set1” and “album_set2”:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [
{
"data": {
"text/plain": [
"{'AC/DC', 'Back in Black', 'The Dark Side of the Moon', 'Thriller'}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"album_set3=album_set1.union(album_set2) \n",
"album_set3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div align=\"right\">\n",
"<a href=\"#4\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
"\n",
"</div>\n",
"<div id=\"4\" class=\"collapse\">\n",
"```\n",
"album_set3=album_set1.union(album_set2)\n",
"album_set3\n",
"```\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Find out if \"album_set1\" is a subset of \"album_set3\":"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <div align=\"right\">\n",
"<a href=\"#5\" class=\"btn btn-default\" data-toggle=\"collapse\">Click here for the solution</a>\n",
"\n",
"</div>\n",
"<div id=\"5\" class=\"collapse\">\n",
"```\n",
"album_set1.issubset(album_set3) \n",
"\n",
"```\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" <a href=\"http://cocl.us/bottemNotebooksPython101Coursera\"><img src = \"https://ibm.box.com/shared/static/irypdxea2q4th88zu1o1tsd06dya10go.png\" width = 750, align = \"center\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"# About the Authors: \n",
"\n",
" [Joseph Santarcangelo]( https://www.linkedin.com/in/joseph-s-50398b136/) has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <hr>\n",
"Copyright &copy; 2017 [cognitiveclass.ai](cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}