@manujeevanprakash
Created October 31, 2014 14:33
Python Essentials
{
"metadata": {
"name": "",
"signature": "sha256:0be9d6cb6d2b0c625f7ad7149ac2bf31839f2d6b0dc4d781284356478b52e7b1"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before going through this tutorial, work through the basic Python programming exercises on [Codecademy](http://www.codecademy.com/en/tracks/python)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python places an emphasis on readability, simplicity and explicitness.\n",
"\n",
"Everything is an object in Python. Every number, string, data structure and class is a Python object."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use comments to summarize code. See the example below. \n",
"To print a value, use the 'print' statement. Strings must be enclosed in single or double quotes.\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print \"Big data examiner\" #Big data examiner is a one stop place to learn datascience. "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Big data examiner\n"
]
}
],
"prompt_number": 39
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can get the type of an object using the type function. You can check whether an object is an instance of a particular type\n",
"using the <em><b>isinstance</b></em> function."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a= 'Big data'\n",
"print type(a)\n",
"\n",
"b= 'Examiner'\n",
"print type(b)\n",
"\n",
"c= 4.5 \n",
"\n",
"print isinstance(a, str)\n",
"print isinstance(a,int)\n",
"print isinstance(c, (int, float))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'str'>\n",
"<type 'str'>\n",
"True\n",
"False\n",
"True\n"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Attributes and methods of a Python object can be accessed using <em><b>object.attribute_name</b></em>."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a = 'Bill gates'\n",
"a.<tab> # remove <tab> and press tab button"
],
"language": "python",
"metadata": {},
"outputs": [
{
"ename": "SyntaxError",
"evalue": "invalid syntax (<ipython-input-15-94d2f58585b1>, line 2)",
"output_type": "pyerr",
"traceback": [
"\u001b[1;36m File \u001b[1;32m\"<ipython-input-15-94d2f58585b1>\"\u001b[1;36m, line \u001b[1;32m2\u001b[0m\n\u001b[1;33m a.<tab> # remove <tab> and press tab button\u001b[0m\n\u001b[1;37m ^\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m invalid syntax\n"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can import a Python [module](https://docs.python.org/2/tutorial/modules.html) using the import statement.\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np # importing numpy as np\n",
"data_new = [6, 7.5, 8, 0, 1]\n",
"data = np.array(data_new) # np is the alias for numpy; here a list is converted to an array\n",
"data"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": [
"array([ 6. , 7.5, 8. , 0. , 1. ])"
]
}
],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try these operators; most of them are self-explanatory."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"x= [1,2,3,4]\n",
"y = x \n",
"z=list(x)\n",
"print x is y # True: x and y refer to the same list object\n",
"print x is not z # True: list(x) creates a new list object\n",
"\n",
"# you can use the following operators:\n",
"# x // y -> floor division; it drops the fractional remainder\n",
"# x ** y -> raises x to the power y\n",
"# x <= y, x < y -> True if x is less than or equal to (strictly less than) y; >= and > work the same way\n",
"# other operators include the bitwise &, |, ^ and the comparisons ==, !="
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"True\n",
"True\n"
]
}
],
"prompt_number": 27
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"<b>Mutable and immutable objects </b> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Objects whose value can be changed after they are created are called mutable objects.\n",
"Objects whose value cannot be changed after they are created are called immutable objects."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# lists, dicts and arrays are mutable\n",
"programming = ['Python', 'R', 'Java', 'Php']\n",
"programming[2] ='c++'\n",
"print programming\n",
"\n",
"#Strings and tuples are immutable\n",
"z_tuple = (9, 10, 11, 23)\n",
"z_tuple[1] = 'twenty two' # raises TypeError: you can't modify a tuple in place\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"['Python', 'R', 'c++', 'Php']\n"
]
},
{
"ename": "TypeError",
"evalue": "'tuple' object does not support item assignment",
"output_type": "pyerr",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-33-1282c7c7a358>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 6\u001b[0m \u001b[1;31m#Strings and tuples are immutable\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 7\u001b[0m \u001b[0mz_tuple\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m(\u001b[0m\u001b[1;36m9\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m10\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m11\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m23\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 8\u001b[1;33m \u001b[0mz_tuple\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'twenty two'\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mTypeError\u001b[0m: 'tuple' object does not support item assignment"
]
}
],
"prompt_number": 33
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"<b> Strings</b>"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# you can write multiline strings using triple quotes ''' or \"\"\"\n",
"\"\"\"\n",
"Hi! learn Python it is fun \n",
"Data science and machine learning are amazing\n",
"\"\"\""
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"'\\nHi! learn Python it is fun \\nData science and machine learning are amazing\\n'"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# As I said before python strings are immutable.\n",
"x= ' This is big data examiner'\n",
"x[10] = 'f'"
],
"language": "python",
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "'str' object does not support item assignment",
"output_type": "pyerr",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-43-033ea51cd601>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# As I said before python strings are immutable.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[0mx\u001b[0m\u001b[1;33m=\u001b[0m \u001b[1;34m' This is big data examiner'\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0mx\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m10\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'f'\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mTypeError\u001b[0m: 'str' object does not support item assignment"
]
}
],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"x = 'Java is a powerful programming language'\n",
"y = x.replace('Java', 'Python')\n",
"y"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 46,
"text": [
"'Python is a powerful programming language'"
]
}
],
"prompt_number": 46
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# many python objects can be converted to a string using 'str' function\n",
"x = 56664\n",
"y = str(x)\n",
"print y\n",
"print type(y)\n",
"# strings act like other sequences, such as lists and tuples\n",
"a = 'Python'\n",
"print list(a)\n",
"print a[:3] # you can slice a python string \n",
"print a[3:]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"56664\n",
"<type 'str'>\n",
"['P', 'y', 't', 'h', 'o', 'n']\n",
"Pyt\n",
"hon\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# String concatenation with + is very common\n",
"p = \"P is the best programming language\"\n",
"q = \", I have ever seen\"\n",
"z = p+q\n",
"z"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": [
"'P is the best programming language, I have ever seen'"
]
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You have to do a lot of string formatting while doing data analysis. In a format string, %s formats an argument as a string, %d as an integer, and %.3f as a number with 3 decimal places."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print \"Hii space left is just %.3f gb, and the data base is %s\" %(0.987, 'mysql')\n",
"print \"Hii space left is just %f gb, and the data base is %s\" %(0.987, 'mysql')\n",
"print \"Hii space left is just %d gb, and the data base is %s\" %(0.987, 'mysql')\n",
"\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Hii space left is just 0.987 gb, and the data base is mysql\n",
"Hii space left is just 0.987000 gb, and the data base is mysql\n",
"Hii space left is just 0 gb, and the data base is mysql\n"
]
}
],
"prompt_number": 17
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Boolean and date-time "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# boolean values in python are written as True and False.\n",
"print True and True\n",
"print True or False\n",
"print True and False"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"True\n",
"True\n",
"False\n"
]
}
],
"prompt_number": 25
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Empty containers (lists, dicts, strings, tuples, etc.) and zero are treated as False when used as a condition (if, while, etc.)\n",
"print bool([]), bool([1,2,3])\n",
"print bool('Hello Python!'), bool('')\n",
"bool(0), bool(1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"False True\n",
"True False\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"(False, True)"
]
}
],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"x = '1729'\n",
"y = float(x)\n",
"print type(y)\n",
"print int(y)\n",
"print bool(y)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<type 'float'>\n",
"1729\n",
"True\n"
]
}
],
"prompt_number": 34
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Python's datetime module provides the datetime, date and time types\n",
"from datetime import datetime, date, time\n",
"td = datetime(1989,6,9,5,1, 30) # avoid leading zeros such as 06: Python 2 reads them as octal, so 08 and 09 raise an invalid token error\n",
"print td.day\n",
"print td.minute\n",
"print td.date()\n",
"print td.time()\n",
"td.strftime('%m/%d/%y %H:%M:%S') # strftime converts the date and time into a string"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"9\n",
"1\n",
"1989-06-09\n",
"05:01:30\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"'06/09/89 05:01:30'"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from datetime import datetime, date, time\n",
"datetime.strptime('19890911', '%Y%m%d') # strings can be converted to datetime objects using strptime\n",
"td = datetime(1989,6,9,5,1, 30)\n",
"td.replace(hour =0 ,minute=0, second=30) # replace returns a new datetime with the given fields changed"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 33,
"text": [
"datetime.datetime(1989, 6, 9, 0, 0, 30)"
]
}
],
"prompt_number": 33
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from datetime import datetime, date, time\n",
"td = datetime(1989,6,9,5,1, 30)\n",
"td1 = datetime(1988,8, 31, 11, 2, 23)\n",
"new_time =td1 - td # subtracting two datetime objects gives a timedelta\n",
"print new_time \n",
"print type(new_time) # the type is datetime.timedelta\n",
"print td +new_time # adding a timedelta to a datetime gives a new datetime"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"-282 days, 6:00:53\n",
"<type 'datetime.timedelta'>\n",
"1988-08-31 11:02:23\n"
]
}
],
"prompt_number": 43
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"<b>Handling Exceptions</b>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Handling exceptions is only a fancy name for <em>handling Python errors</em>. In Python, many functions work only on certain types of input. For example, the float function raises a ValueError when you pass it a string that cannot be parsed as a number."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print float('7.968')\n",
"float('Big data')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"7.968\n"
]
},
{
"ename": "ValueError",
"evalue": "could not convert string to float: Big data",
"output_type": "pyerr",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-8-e679c5a97125>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;32mprint\u001b[0m \u001b[0mfloat\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'7.968'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mfloat\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Big data'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mValueError\u001b[0m: could not convert string to float: Big data"
]
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# suppose we want our function to return the input unchanged when it cannot be converted; we can do this using the following code.\n",
"def return_float(x):\n",
"    try:\n",
"        return float(x)\n",
"    except: # a bare except catches every exception\n",
"        return x\n",
"\n",
"print return_float('4.55')\n",
"print return_float('big data') # this time no ValueError is raised"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"4.55\n",
"big data\n"
]
}
],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#print float((9,8)) -> this raises a TypeError; remove the comment and check the output.\n",
"def return_float(x):\n",
"    try:\n",
"        return float(x)\n",
"    except (TypeError, ValueError): # only TypeError and ValueError are caught\n",
"        return x\n",
"print return_float((9,8)) # now you can see it returns (9, 8)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"(9, 8)\n"
]
}
],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# these are called ternary expressions\n",
"x = 'Life is short use python'\n",
"'This is my favourite quote' if x == 'Life is short use python' else 'I hate R'"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": [
"'This is my favourite quote'"
]
}
],
"prompt_number": 29
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Go through control flow in Python (if, for and while). Refer to Codecademy."
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"<b>Tuples</b>"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Tuples are one-dimensional, fixed-length, immutable sequences of Python objects.\n",
"machine_learning = 77, 45, 67\n",
"print machine_learning\n",
"pythonista = (87, 56, 98), (78, 45, 33) #Nested Tuples\n",
"print pythonista"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"(77, 45, 67)\n",
"((87, 56, 98), (78, 45, 33))\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# You can convert any sequence to a tuple by using the built-in 'tuple' function\n",
"print tuple([4,0,2])\n",
"pythonista = tuple('Python')\n",
"print pythonista\n",
"pythonista[0] # you can access each element of a tuple by index"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"(4, 0, 2)\n",
"('P', 'y', 't', 'h', 'o', 'n')\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"'P'"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"x = tuple(['Manu',[99,88], 'Jeevan'])\n",
"#x[2] = 'Prakash' # you can't modify a tuple like this\n",
"x[1].append(77) # but you can mutate a mutable object stored inside a tuple\n",
"x"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 23,
"text": [
"('Manu', [99, 88, 77], 'Jeevan')"
]
}
],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"y = ('Mean', 'Median', 'Mode')+('Chisquare', 'Annova') + ('statistical significance',) # you can concatenate tuples using the '+' operator \n",
"print y\n",
"('Mean', 'Median') *4 # multiplying a tuple by an integer repeats its elements"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"('Mean', 'Median', 'Mode', 'Chisquare', 'Annova', 'statistical significance')\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": [
"('Mean', 'Median', 'Mean', 'Median', 'Mean', 'Median', 'Mean', 'Median')"
]
}
],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"deep_learning =('Theano', 'Open cv', 'Torch') # you can unpack a tuple into variables\n",
"x,y,z= deep_learning\n",
"print x\n",
"print y\n",
"print z"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Theano\n",
"Open cv\n",
"Torch\n"
]
}
],
"prompt_number": 35
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"countries ='Usa', 'India', ('Afghanistan',' Pakistan'), \n",
"a,b,(c,d) = countries\n",
"print a\n",
"print b\n",
"print c\n",
"print d"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Usa\n",
"India\n",
"Afghanistan\n",
" Pakistan\n"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"countries ='Usa', 'India', ('Afghanistan',' Pakistan'), 'Usa', 'Usa'\n",
"countries.count('Usa') # .count returns how many times a value occurs in a tuple"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 50,
"text": [
"3"
]
}
],
"prompt_number": 50
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#Lists\n",
"I haven't discussed lists, as they are covered in depth in the Codecademy tutorials.\n",
"I am going through the concepts that are not discussed there.\n",
"Some important list concepts are:\n",
" <li>adding and removing elements from a list</li>\n",
" <li>combining and concatenating lists </li>\n",
" <li> sorting </li>\n",
" <li> list slicing</li>\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"countries =['Usa', 'India','Afghanistan',' Pakistan']\n",
"countries.extend(['Britian', 'Canada', 'Uzbekistan', 'Turkey']) # extend modifies the list in place and returns None\n",
"countries.sort(key=len) # sorts in place by number of characters\n",
"print countries \n",
"# extend avoids building a new list, unlike concatenation with +, which is handy when your lists are large."
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"['Usa', 'India', 'Canada', 'Turkey', 'Britian', ' Pakistan', 'Uzbekistan', 'Afghanistan']\n"
]
}
],
"prompt_number": 63
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import bisect\n",
"b = [9,9,9,9,5,6,3,5,3,2,1,4,7,8]\n",
"b.sort()\n",
"x =bisect.bisect(b,2) # bisect.bisect finds the location where an element should be inserted to keep it sorted.\n",
"y= bisect.bisect(b, 5)\n",
"print x\n",
"print y"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2\n",
"7\n"
]
}
],
"prompt_number": 83
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# When iterating over a sequence, you can use 'enumerate' to keep track of the index of the current element\n",
"languages = ['Bigdata', 'Hadoop', 'mapreduce', 'Nosql']\n",
"\n",
"for i,val in enumerate(languages):\n",
"    print i,val"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0 Bigdata\n",
"1 Hadoop\n",
"2 mapreduce\n",
"3 Nosql\n"
]
}
],
"prompt_number": 97
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Sorted function returns a new sorted list from a sequence\n",
"print sorted([89, 99,45,63,25,53,34,56])\n",
"print sorted('Big data examiner')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[25, 34, 45, 53, 56, 63, 89, 99]\n",
"[' ', ' ', 'B', 'a', 'a', 'a', 'd', 'e', 'e', 'g', 'i', 'i', 'm', 'n', 'r', 't', 'x']\n"
]
}
],
"prompt_number": 101
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"hot_job = ['Big_data', 'data science', 'data scientist', 'data base developer']\n",
"languages = ['c', 'c++', 'java', 'python']\n",
"statistics = ['Mean', 'Median', 'Mode', 'Chi square']\n",
"print zip(hot_job, languages, statistics)\n",
"\n",
"for i, (x,y) in enumerate(zip(hot_job, languages)): # zip and enumerate can be used together\n",
"    print('%d: %s, %s' %(i,x,y))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[('Big_data', 'c', 'Mean'), ('data science', 'c++', 'Median'), ('data scientist', 'java', 'Mode'), ('data base developer', 'python', 'Chi square')]\n",
"0: Big_data, c\n",
"1: data science, c++\n",
"2: data scientist, java\n",
"3: data base developer, python\n"
]
}
],
"prompt_number": 106
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# you can unzip a zipped sequence as follows\n",
"rockers = [('Jame', 'Manu'), ('Govind', 'Dheepan'),('Partha', 'Reddy')]\n",
"first_names, last_names = zip(*rockers)\n",
"print first_names\n",
"print last_names"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"('Jame', 'Govind', 'Partha')\n",
"('Manu', 'Dheepan', 'Reddy')\n"
]
}
],
"prompt_number": 113
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Use the built-in reversed function to iterate over a sequence in reverse\n",
"list(reversed(range(20)))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 114,
"text": [
"[19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]"
]
}
],
"prompt_number": 114
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#Dictionaries\n",
"Some key concepts to remember about dictionaries are:\n",
"<li> How to access elements in a dictionary</li>\n",
"<li> .keys() and .values() methods</li>\n",
"<li> the pop method and the del statement </li>\n",
"\n",
"##Also go through list and dictionary comprehensions\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# you can merge one dictionary into another using the 'update' method \n",
"d1 = {'a' : 'octave', 'b' : 'Java'}\n",
"d1.update({'c' : 'foo', 'd' : 12})\n",
"print d1\n",
"d2 = {'a' : 'octave', 'b' : 'Java'}\n",
"d2.update({'b' : 'foo', 'c' : 12}) # for keys present in both, the value passed to update overrides the one in d2 (here 'b')\n",
"print d2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"{'a': 'octave', 'c': 'foo', 'b': 'Java', 'd': 12}\n",
"{'a': 'octave', 'c': 12, 'b': 'foo'}\n"
]
}
],
"prompt_number": 135
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# the dict constructor accepts a sequence of (key, value) pairs\n",
"data_science = dict(zip(range(10), reversed(range(10)))) # zip builds the (key, value) pairs and dict turns them into a dictionary\n",
"data_science"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 137,
"text": [
"{0: 9, 1: 8, 2: 7, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1, 9: 0}"
]
}
],
"prompt_number": 137
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# The keys of a dictionary must be hashable, which in practice means immutable types (int, string, float, tuples of immutables).\n",
"print hash('string')\n",
"print hash((1,2,3))\n",
"print hash([1,2,4]) # raises a TypeError, as lists are mutable and therefore unhashable"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"-1542666171\n",
"-378539185\n"
]
},
{
"ename": "TypeError",
"evalue": "unhashable type: 'list'",
"output_type": "pyerr",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-148-27f144be1274>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;32mprint\u001b[0m \u001b[0mhash\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'string'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[1;32mprint\u001b[0m \u001b[0mhash\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 4\u001b[1;33m \u001b[1;32mprint\u001b[0m \u001b[0mhash\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;36m4\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mTypeError\u001b[0m: unhashable type: 'list'"
]
}
],
"prompt_number": 148
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# An easy way to use a list as a dictionary key is to convert it to a tuple first\n",
"fg ={}\n",
"fg[tuple([3,4,5])] = 45\n",
"fg"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 152,
"text": [
"{(3, 4, 5): 45}"
]
}
],
"prompt_number": 152
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"set "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# a set is an unordered collection of unique elements.\n",
"set([3,3,4,4,4,6,7,7,7,8])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 155,
"text": [
"{3, 4, 6, 7, 8}"
]
}
],
"prompt_number": 155
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Sets support mathematical set operations like union, intersection, difference, and symmetric difference\n",
"a = {1, 2, 3, 4, 5}\n",
"b = {3, 4, 5, 6, 7, 8}\n",
"print a|b # union\n",
"print a&b # intersection -> elements common to both sets\n",
"print a-b # difference -> elements in a but not in b\n",
"print a^b # symmetric difference -> elements in exactly one of the sets\n",
"print {1,2,3} =={3,2,1} # True: sets are equal when they contain the same elements, regardless of order"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"set([1, 2, 3, 4, 5, 6, 7, 8])\n",
"set([3, 4, 5])\n",
"set([1, 2])\n",
"set([1, 2, 6, 7, 8])\n",
"True\n"
]
}
],
"prompt_number": 166
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#Default dict"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"football_clubs = ['Manchester', 'Liverpool', 'Arsenal', 'Chelsea', 'Mancity', 'Tottenham', 'Barcelona','Dortmund']\n",
"\n",
"football ={}\n",
"for clubs in football_clubs: \n",
"    club = clubs[0] # gets the first letter of the club name\n",
"    if club not in football: # first club seen with this letter\n",
"        football[club] = [clubs]\n",
"    else:\n",
"        football[club].append(clubs)\n",
"print football "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"{'A': ['Arsenal'], 'C': ['Chelsea'], 'B': ['Barcelona'], 'D': ['Dortmund'], 'M': ['Manchester', 'Mancity'], 'L': ['Liverpool'], 'T': ['Tottenham']}\n"
]
}
],
"prompt_number": 35
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Usually, a Python dictionary raises a KeyError if you try to get an item with a key that is not currently in the dictionary. \n",
"# A defaultdict, in contrast, simply creates any item that you try to access (provided it does not exist yet). To create such a \"default\" item, it calls the function object that you pass to the constructor \n",
"# (more precisely, an arbitrary \"callable\" object, which includes function and type objects).\n",
"\n",
"# The same operation can be done using defaultdict\n",
"from collections import defaultdict # defaultdict lives in the collections module\n",
"soccer = defaultdict(list)\n",
"\n",
"for clubs in football_clubs:\n",
"    soccer[clubs[0]].append(clubs)\n",
"print soccer"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"defaultdict(<type 'list'>, {'A': ['Arsenal'], 'C': ['Chelsea'], 'B': ['Barcelona'], 'D': ['Dortmund'], 'M': ['Manchester', 'Mancity'], 'L': ['Liverpool'], 'T': ['Tottenham']})\n"
]
}
],
"prompt_number": 37
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#Functions"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# a function can return multiple values; they are packed into a tuple\n",
"def b():\n",
"    x =34\n",
"    y =45\n",
"    z =89\n",
"    return x,y,z"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Technically, a closure is a dynamically generated function returned by another function. Its main property is that the returned function keeps access to the variables in the local namespace where it was created, even after the enclosing function has finished. In layman's terms, a closure is an inner function that remembers the variables of the outer function that created it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Example of a closure. The returned function returns True if it has already seen the element.\n",
"def dict_funct():\n",
"    new_dict = {} # create a new dictionary\n",
"    def modifier(z):\n",
"        if z in new_dict: # z has been seen before\n",
"            return True\n",
"        else:\n",
"            new_dict[z]=True # remember z for later calls\n",
"            return False\n",
"    return modifier\n",
"\n",
"x = dict_funct()\n",
"list_func = [5,4,6,5,3,4,6,2,1,5]\n",
"y = [x(i) for i in list_func]\n",
"print y "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[False, False, False, True, False, True, True, False, False, True]\n"
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cleaning data\n",
"\n",
"Raw data is messy, so you have to clean the data set to make it ready for analysis. Here we have a list of states with unnecessary punctuation, inconsistent capitalization and stray white space. First, I import the Python [regular expression](https://docs.python.org/2/library/re.html) module, re. Second, I create a function called remove_functions, which uses re.sub to strip the unnecessary punctuation. Third, I create a list of three functions: [str.strip](http://www.tutorialspoint.com/python/string_strip.htm), remove_functions and [str.title](http://www.tutorialspoint.com/python/string_title.htm). Finally, clean_data applies each function in that list, in order, to every state. \n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# If we are doing some data cleaning, we will have a messy data set like this. \n",
"import re\n",
"\n",
"states = [' Kerala', 'Gujarat!', 'Delhi', 'Telengana', 'TriPUra', 'Tamil Nadu##', 'West Bengal?']\n",
"\n",
"\n",
"def remove_functions(strp): \n",
"    return re.sub('[!#?]', '', strp) # delete every !, # and ? character\n",
"\n",
"ooops = [str.strip, remove_functions, str.title] # create a list of functions\n",
"\n",
"def clean_data(oops, funky): # oops: list of strings to clean, funky: list of functions to apply\n",
"    result = [] # create an empty list\n",
"    for data in oops: # loop over each string (here, each state)\n",
"        for fun in funky: # loop over the cleaning functions\n",
"            data = fun(data) # apply each function in turn to the string\n",
"        result.append(data) # add the cleaned string to the new list\n",
"    return result # return the list \n",
" \n",
"x = clean_data(states, ooops)\n",
"print x\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"['Kerala', 'Gujarat', 'Delhi', 'Telengana', 'Tripura', 'Tamil Nadu', 'West Bengal']\n"
]
}
],
"prompt_number": 50
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# A lambda is a short way of writing a small, single-expression function. \n",
"def f(x):\n",
"    return x**2\n",
"print f(8)\n",
"#same function using lambda\n",
"y = lambda x: x**2 \n",
"print y(9)\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"64\n",
"81\n"
]
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#Generator Expressions"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def new_objjj():\n",
"    for x in xrange(100):\n",
"        yield x**2 # generator functions use yield instead of return\n",
"some_variable = new_objjj()\n",
"\n",
"# The above function can be written as a generator expression\n",
"new_obj = (x**2 for x in xrange(100)) \n",
"\n",
"# Generator expressions can be passed to any Python function that accepts an iterable\n",
"y = sum(x**2 for x in xrange(100))\n",
"print y\n",
"\n",
"dict((i,i**2) for i in xrange(5)) # in Python 2, xrange yields values lazily instead of building a list like range\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"328350\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"rkeys=[1,2,3]\n",
"rvals=['South','Sardinia','North']\n",
"rmap={k:v for k, v in zip(rkeys,rvals)} # dict comprehension over the zipped (key, value) pairs\n",
"rmap"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"{1: 'South', 2: 'Sardinia', 3: 'North'}"
]
}
],
"prompt_number": 2
}
],
"metadata": {}
}
]
}
MelinaAilen commented Feb 27, 2019

Hi Manu,
Thanks for the summary. One comment only: Because of the names you used for the arguments, I found the Cleaning data section a little bit confusing.
