@anandology
Last active January 4, 2016 18:59
Notes from python training at LinkedIn (Jan 28-Jan30, 2014) - http://nbviewer.ipython.org/gist/anandology/8663909/
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python Training at LinkedIn - Day 2\n",
"\n",
"Jan 28-30, 2014 - [Anand Chitipothu](http://anandology.com)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Writing Custom Modules"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%file square.py\n",
"def square(x):\n",
"    return x*x"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Writing square.py\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import square\n",
"print square.square(4)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"16\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When a module is imported, all the code in that file is executed, just like a script, and the variables, functions, and classes defined there become available as attributes of the module."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What happens if there is some test code?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%file square2.py\n",
"def square(x):\n",
"    return x*x\n",
"\n",
"print __name__\n",
"\n",
"if __name__ == \"__main__\":\n",
"    # Run this part of the code only when this file\n",
"    # is executed as a script.\n",
"    # It is skipped when the file is imported as a module.\n",
"    print square(4)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Overwriting square2.py\n"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import square2\n",
"\n",
"# reimport the module if it is already imported\n",
"reload(square2) \n",
"\n",
"print square2.square(100)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"square2\n",
"10000\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!python square2.py"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"__main__\r\n",
"16\r\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working with files"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%file numbers.txt\n",
"1 one\n",
"2 two\n",
"3 there\n",
"4 four\n",
"5 five\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Overwriting numbers.txt\n"
]
}
],
"prompt_number": 35
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print open(\"numbers.txt\").read()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 one\n",
"2 two\n",
"3 there\n",
"4 four\n",
"5 five\n"
]
}
],
"prompt_number": 36
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f = open(\"numbers.txt\", \"r\")\n",
"f.readline()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 26,
"text": [
"'1 one\\n'"
]
}
],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f.readline()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 27,
"text": [
"'2 two\\n'"
]
}
],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f.readline()\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 28,
"text": [
"'3 there\\n'"
]
}
],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f.readline()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": [
"'4 four\\n'"
]
}
],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f.readline()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 30,
"text": [
"'5 five'"
]
}
],
"prompt_number": 30
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f.readline()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 31,
"text": [
"''"
]
}
],
"prompt_number": 31
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`readline` returns an empty string when it reaches the end of the file."
]
},
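{
"cell_type": "markdown",
"metadata": {},
"source": [
"The empty string at end of file makes a convenient loop-termination test. The cell below is a minimal sketch that was not part of the original session: `print_lines` is a hypothetical helper name, and it assumes the `numbers.txt` file created earlier exists. It uses single-argument `print(...)` so that it runs under both Python 2 and Python 3."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def print_lines(filename):\n",
"    f = open(filename)\n",
"    line = f.readline()\n",
"    # readline returns '' only at end of file; a blank line is '\\n'\n",
"    while line != '':\n",
"        print(line.rstrip('\\n'))\n",
"        line = f.readline()\n",
"    f.close()\n",
"\n",
"print_lines('numbers.txt')"
],
"language": "python",
"metadata": {},
"outputs": []
},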
{
"cell_type": "code",
"collapsed": false,
"input": [
"open(\"numbers.txt\").readlines()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 32,
"text": [
"['1 one\\n', '2 two\\n', '3 there\\n', '4 four\\n', '5 five']"
]
}
],
"prompt_number": 32
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"open(\"numbers.txt\").read()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 37,
"text": [
"'1 one\\n2 two\\n3 there\\n4 four\\n5 five'"
]
}
],
"prompt_number": 37
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Example: wordcount**"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!wc numbers.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 5 10 34 numbers.txt\r\n"
]
}
],
"prompt_number": 38
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%file wc.py\n",
"import sys\n",
"\n",
"def linecount(filename):\n",
"    return len(open(filename).readlines())\n",
"\n",
"def wordcount(filename):\n",
"    return len(open(filename).read().split())\n",
"\n",
"def charcount(filename):\n",
"    return len(open(filename).read())\n",
"\n",
"def main():\n",
"    filename = sys.argv[1]\n",
"    print linecount(filename), wordcount(filename), charcount(filename), filename\n",
"\n",
"if __name__ == \"__main__\":\n",
"    main()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Overwriting wc.py\n"
]
}
],
"prompt_number": 47
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!python wc.py numbers.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5 10 34 numbers.txt\r\n"
]
}
],
"prompt_number": 48
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import wc\n",
"print wc.wordcount(\"wc.py\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"29\n"
]
}
],
"prompt_number": 49
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Write a program `head.py` to print the first 10 lines of the given file.\n",
"\n",
"    $ python head.py input-file.txt\n",
"    ...\n",
"    ..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Write a program `tail.py` to print the last 10 lines of the given file.\n",
"\n",
"    $ python tail.py input-file.txt\n",
"    ...\n",
"    ..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Write a program `grep.py` that takes a pattern and a filename as command-line arguments and prints all the lines in that file that contain the given pattern.\n",
"<pre>\n",
"$ python grep.py def wc.py\n",
"def linecount(filename):\n",
"def wordcount(filename):\n",
"def charcount(filename):\n",
"def main():\n",
"</pre>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Write a program `wrap.py` that takes a filename and width and wraps the lines that are longer than the width.\n",
"\n",
"<pre>\n",
"$ python wrap.py numbers.txt 4\n",
"1 on\n",
"e\n",
"2 tw\n",
"o\n",
"3 th\n",
"ree\n",
"</pre>\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## List Comprehensions"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def squares(numbers):\n",
"    result = []\n",
"    for n in numbers:\n",
"        result.append(n*n)\n",
"    return result\n",
"\n",
"print squares(range(10))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]\n"
]
}
],
"prompt_number": 50
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"[n*n for n in range(10)]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 51,
"text": [
"[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]"
]
}
],
"prompt_number": 51
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"[n*n for n in range(10) if n%2 == 0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 52,
"text": [
"[0, 4, 16, 36, 64]"
]
}
],
"prompt_number": 52
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Example: parsing CSV files**"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%file a.csv\n",
"a,b,c\n",
"1,1,1\n",
"2,4,8\n",
"3,9,27\n",
"4,16,64"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Writing a.csv\n"
]
}
],
"prompt_number": 53
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"[line.strip(\"\\n\").split(\",\") for line in open('a.csv').readlines()]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 57,
"text": [
"[['a', 'b', 'c'],\n",
" ['1', '1', '1'],\n",
" ['2', '4', '8'],\n",
" ['3', '9', '27'],\n",
" ['4', '16', '64']]"
]
}
],
"prompt_number": 57
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def parse_csv(filename):\n",
"    return [line.strip(\"\\n\").split(\",\")\n",
"            for line in open(filename).readlines()]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 59
},
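{
"cell_type": "markdown",
"metadata": {},
"source": [
"Splitting on commas breaks down when a field itself contains a comma inside quotes. As an aside that was not covered in the original session, the standard library's `csv` module handles that case. The cell below is a small sketch; the sample rows are made up for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import csv\n",
"\n",
"# made-up sample data: the second row has a quoted field containing a comma\n",
"rows = ['a,b,c', '1,\"x,y\",3']\n",
"\n",
"# csv.reader accepts any iterable of lines, not just an open file\n",
"for row in csv.reader(rows):\n",
"    print(row)"
],
"language": "python",
"metadata": {},
"outputs": []
},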
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Improve the above `parse_csv` function to ignore comment lines. Assume that comment lines start with the `#` character."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Improve the above `parse_csv` function to take the delimiter as an argument.\n",
"\n",
"    >>> parse_csv(\"/etc/passwd\", \":\")\n",
"    [...]\n",
"\n",
"Try printing all usernames from `/etc/passwd`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Reimplement the `sum.py` program we wrote earlier, which computes the sum of all command-line arguments, using list comprehensions.\n",
"\n",
"<pre>\n",
"$ python sum.py 1 2 3 4 5\n",
"15\n",
"</pre>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Example: Pythagorean triplets**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`(x, y, z)` is a Pythagorean triplet if `x*x + y*y == z*z`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# let's find all Pythagorean triplets with values below n\n",
"n = 25\n",
"[(x,y,z) for x in range(1, n)\n",
"         for y in range(x, n)\n",
"         for z in range(y, n)\n",
"         if x*x+y*y==z*z]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 66,
"text": [
"[(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]"
]
}
],
"prompt_number": 66
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Write a function `triplets` that takes a number `n` as an argument and returns a list of triplets whose elements sum to `n`. Note that `(a, b, c)` and `(b, a, c)` represent the same triplet."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Iteration Techniques"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"x = [1, 2, 3, 4]\n",
"y = [4, 5, 6, 9]\n",
"\n",
"for a, b in zip(x, y):\n",
"    print a+b"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"5\n",
"7\n",
"9\n",
"13\n"
]
}
],
"prompt_number": 68
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"[a+b for a, b in zip(x, y)]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 69,
"text": [
"[5, 7, 9, 13]"
]
}
],
"prompt_number": 69
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"names = [\"a\", \"b\", \"c\", \"d\"]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 71
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for i, name in enumerate(names):\n",
"    print i, name"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0 a\n",
"1 b\n",
"2 c\n",
"3 d\n"
]
}
],
"prompt_number": 72
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Implement a function `myzip` that takes two lists as arguments and returns a list of tuples, each containing one element from each list.\n",
"\n",
"    >>> myzip(['a', 'b', 'c'], [1, 2, 3])\n",
"    [('a', 1), ('b', 2), ('c', 3)]"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def myzip(x, y):\n",
"    n = min(len(x), len(y))\n",
"    return [(x[i], y[i]) for i in range(n)]\n",
"\n",
"print myzip([1, 2, 3, 4], ['a', 'b', 'c'])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[(1, 'a'), (2, 'b'), (3, 'c')]\n"
]
}
],
"prompt_number": 75
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def myenumerate(x):\n",
"    return myzip(range(len(x)), x)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 76
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"myenumerate(['a', 'b', 'c'])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 77,
"text": [
"[(0, 'a'), (1, 'b'), (2, 'c')]"
]
}
],
"prompt_number": 77
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionaries"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a = {\"x\": 1, \"y\": 2, \"z\": 3}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 78
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 79,
"text": [
"{'x': 1, 'y': 2, 'z': 3}"
]
}
],
"prompt_number": 79
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a['x']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 80,
"text": [
"1"
]
}
],
"prompt_number": 80
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'x' in a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 81,
"text": [
"True"
]
}
],
"prompt_number": 81
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a['x'] = 11\n",
"a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 82,
"text": [
"{'x': 11, 'y': 2, 'z': 3}"
]
}
],
"prompt_number": 82
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(a)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 83,
"text": [
"3"
]
}
],
"prompt_number": 83
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"del a['x']"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 84
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 85,
"text": [
"{'y': 2, 'z': 3}"
]
}
],
"prompt_number": 85
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a.keys()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 86,
"text": [
"['y', 'z']"
]
}
],
"prompt_number": 86
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a.values()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 87,
"text": [
"[2, 3]"
]
}
],
"prompt_number": 87
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a.items()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 88,
"text": [
"[('y', 2), ('z', 3)]"
]
}
],
"prompt_number": 88
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a = {'x': 1, 'y': 2, 'z': 3}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 89
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for k in a:\n",
"    print k"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"y\n",
"x\n",
"z\n"
]
}
],
"prompt_number": 91
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for k, v in a.items():\n",
"    print k, v"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"y 2\n",
"x 1\n",
"z 3\n"
]
}
],
"prompt_number": 92
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other useful dictionary methods are `get` and `setdefault`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 94,
"text": [
"{'x': 1, 'y': 2, 'z': 3}"
]
}
],
"prompt_number": 94
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# returns a['x'] becauses 'x' is present in a \n",
"a.get('x', 5) "
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 97,
"text": [
"1"
]
}
],
"prompt_number": 97
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a.get('p', 8)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 98,
"text": [
"8"
]
}
],
"prompt_number": 98
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`setdefault` works like `get`, but it also adds the key with the given default value to the dict when the key is not present."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print a.setdefault('x', 5)\n",
"print a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1\n",
"{'y': 2, 'x': 1, 'z': 3}\n"
]
}
],
"prompt_number": 100
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print a.setdefault('p', 8)\n",
"print a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"8\n",
"{'y': 2, 'x': 1, 'z': 3, 'p': 8}\n"
]
}
],
"prompt_number": 101
},
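{
"cell_type": "markdown",
"metadata": {},
"source": [
"A common use of `setdefault` is grouping values under a key without first checking whether the key exists. The cell below is a small sketch with made-up data; `words` and `by_letter` are illustrative names that do not appear elsewhere in these notes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']\n",
"by_letter = {}\n",
"for w in words:\n",
"    # setdefault returns the existing list for this letter,\n",
"    # or inserts a new empty list and returns it\n",
"    by_letter.setdefault(w[0], []).append(w)\n",
"\n",
"print(by_letter['a'])\n",
"print(by_letter['b'])"
],
"language": "python",
"metadata": {},
"outputs": []
},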
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Example: Word Frequency**\n",
"\n",
"Let's write a function to compute the frequency of each word in a given list of words."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%file wordfreq.py\n",
"import sys\n",
"\n",
"def wordfreq(words):\n",
"    freq = {}\n",
"    for w in words:\n",
"        freq[w] = freq.get(w, 0) + 1\n",
"    return freq\n",
"\n",
"def readwords(filename):\n",
"    return open(filename).read().split()\n",
"\n",
"def print_freq(freq):\n",
"    for w, count in freq.items():\n",
"        print w, count\n",
"\n",
"def main():\n",
"    filename = sys.argv[1]\n",
"    words = readwords(filename)\n",
"    freq = wordfreq(words)\n",
"    print_freq(freq)\n",
"\n",
"if __name__ == \"__main__\":\n",
"    main()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Overwriting wordfreq.py\n"
]
}
],
"prompt_number": 107
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!python wordfreq.py wc.py"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"charcount(filename): 1\r\n",
"wordcount(filename), 1\r\n",
"main() 1\r\n",
"wordcount(filename): 1\r\n",
"charcount(filename), 1\r\n",
"filename 2\r\n",
"print 1\r\n",
"import 1\r\n",
"if 1\r\n",
"= 1\r\n",
"sys.argv[1] 1\r\n",
"return 3\r\n",
"== 1\r\n",
"sys 1\r\n",
"len(open(filename).readlines()) 1\r\n",
"__name__ 1\r\n",
"\"__main__\": 1\r\n",
"linecount(filename): 1\r\n",
"len(open(filename).read().split()) 1\r\n",
"linecount(filename), 1\r\n",
"len(open(filename).read()) 1\r\n",
"def 4\r\n",
"main(): 1\r\n"
]
}
],
"prompt_number": 109
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Write a program `top-words.py` that takes a filename as an argument and prints the top 10 words with their number of occurrences.\n",
"\n",
"<pre>\n",
"$ python top-words.py a.txt\n",
"...\n",
"...\n",
"</pre>"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%file top-word.py\n",
"import sys\n",
"import wordfreq\n",
"\n",
"def main():\n",
"    filename = sys.argv[1]\n",
"    words = wordfreq.readwords(filename)\n",
"    freq = wordfreq.wordfreq(words)\n",
"\n",
"    def getvalue(item):\n",
"        value = item[1]\n",
"        return value\n",
"\n",
"    # sort (word, count) pairs by count, highest first\n",
"    pairs = sorted(freq.items(), key=getvalue, reverse=True)\n",
"    for w, count in pairs[:10]:\n",
"        print w, count\n",
"\n",
"if __name__ == \"__main__\":\n",
"    main()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Overwriting top-word.py\n"
]
}
],
"prompt_number": 122
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!python top-word.py tut.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"and 14\r\n",
"Python 12\r\n",
"the 8\r\n",
"to 5\r\n",
"in 5\r\n",
"The 5\r\n",
"of 4\r\n",
"a 4\r\n",
"for 4\r\n",
"interpreter 3\r\n"
]
}
],
"prompt_number": 123
},
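{
"cell_type": "markdown",
"metadata": {},
"source": [
"The counting and sorting steps above can also be done with `collections.Counter` from the standard library (Python 2.7 and later). This is an alternative sketch, not the approach used in the session, and the sample text is made up for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from collections import Counter\n",
"\n",
"words = 'the quick fox jumps over the lazy dog the end'.split()\n",
"freq = Counter(words)\n",
"\n",
"# most_common(n) returns the n highest (word, count) pairs, highest first\n",
"for w, count in freq.most_common(1):\n",
"    print('%s %d' % (w, count))"
],
"language": "python",
"metadata": {},
"outputs": []
},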
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Python Standard Library"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**`os` module**"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 125
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"os.listdir(\".\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 126,
"text": [
"['.git',\n",
" '.ipynb_checkpoints',\n",
" 'a.csv',\n",
" 'day1.ipynb',\n",
" 'day2.ipynb',\n",
" 'numbers.txt',\n",
" 'square.py',\n",
" 'square.pyc',\n",
" 'square2.py',\n",
" 'square2.pyc',\n",
" 'top-word.py',\n",
" 'tut.txt',\n",
" 'wc.py',\n",
" 'wc.pyc',\n",
" 'wordfreq.py',\n",
" 'wordfreq.pyc']"
]
}
],
"prompt_number": 126
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"os.getcwd()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 127,
"text": [
"'/users/anand/work/trainings/2014/linkedin/notebook'"
]
}
],
"prompt_number": 127
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Write a program `ls.py` that takes a directory path as a command-line argument and prints all files in that directory."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Write a program `extcount.py` that counts the number of files for each extension in the given directory. The program should take the directory name as a command-line argument and print the count and extension for each extension found.\n",
"\n",
"<pre>\n",
"$ python extcount.py foo\n",
"14 py\n",
"4 txt\n",
"1 csv\n",
"</pre>\n",
"\n",
"Hint: can you reuse the `wordfreq` module that we wrote earlier?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**`urllib` module**"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urllib\n",
"\n",
"response = urllib.urlopen(\"http://python.org/\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 128
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"response"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 129,
"text": [
"<addinfourl at 4341837768 whose fp = <socket._fileobject object at 0x102cae750>>"
]
}
],
"prompt_number": 129
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print response.headers"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Date: Wed, 29 Jan 2014 10:37:58 GMT\r\n",
"Server: Apache/2.2.22 (Debian)\r\n",
"Last-Modified: Wed, 29 Jan 2014 02:54:18 GMT\r\n",
"ETag: \"105800d-4e7b-4f1130e928a80\"\r\n",
"Accept-Ranges: bytes\r\n",
"Content-Length: 20091\r\n",
"Vary: Accept-Encoding\r\n",
"Connection: close\r\n",
"Content-Type: text/html\r\n",
"\n"
]
}
],
"prompt_number": 130
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"html = response.read()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 131
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(html)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 132,
"text": [
"20091"
]
}
],
"prompt_number": 132
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print html[:200]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n",
"\n",
"\n",
"<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n",
"\n",
"<head>\n",
"\n"
]
}
],
"prompt_number": 133
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"html = urllib.urlopen(\"http://python.org/\").read()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 134
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"html[:200]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 135,
"text": [
"'<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\\n\\n\\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\\n\\n<head>\\n'"
]
}
],
"prompt_number": 135
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Write a program `wget.py` to download a given URL. The program should accept the URL as a command-line argument and save the content under the basename of the URL. If the URL ends with `/`, it should save it as `index.html`.\n",
"\n",
"<pre>\n",
"$ python wget.py http://docs.python.org/3/tutorial/modules.html\n",
"saving http://docs.python.org/3/tutorial/modules.html as modules.html\n",
"\n",
"$ python wget.py http://python.org/\n",
"saving http://python.org/ as index.html\n",
"</pre>\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# open a file in write mode\n",
"f = open(\"b.txt\", \"w\")\n",
"# write contents to it\n",
"f.write(\"hello world!\")\n",
"# and close the file\n",
"f.close()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 136
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat b.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"hello world!"
]
}
],
"prompt_number": 137
},
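{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `with` block closes the file automatically, even if an error occurs while writing. The cell below is a minimal sketch that was not part of the original session; `c.txt` is a hypothetical file name chosen to avoid overwriting `b.txt`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"with open('c.txt', 'w') as f:\n",
"    f.write('hello world!')\n",
"\n",
"# the file has already been closed when the with block ends\n",
"print(f.closed)"
],
"language": "python",
"metadata": {},
"outputs": []
},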
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Find the most frequently used words in *Hamlet by William Shakespeare*. The full text of the work is available at Project Gutenberg.\n",
"\n",
"http://www.gutenberg.org/cache/epub/1787/pg1787.txt\n",
"\n",
"(cached copy available at http://anandology.com/tmp/hamlet.txt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem:** Change the above program to print the most frequently used words that are not present in the English dictionary.\n",
"\n",
"A list of English dictionary words is available in the `/usr/share/dict/words` file on most Unix machines."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%file hamlet.py\n",
"import urllib\n",
"import wordfreq\n",
"\n",
"def get_top(d, n):\n",
"    def getvalue(kv):\n",
"        return kv[1]\n",
"    return sorted(d.items(), key=getvalue, reverse=True)[:n]\n",
"\n",
"def process_word(word):\n",
"    return word.lower().replace(\".\", \"\").replace(\",\", \"\")\n",
"\n",
"def main():\n",
"    url = \"http://anandology.com/tmp/hamlet.txt\"\n",
"    words = urllib.urlopen(url).read().split()\n",
"\n",
"    dictwords = set(open(\"/usr/share/dict/words\").read().split())\n",
"\n",
"    words = [process_word(w) for w in words]\n",
"    words = [w for w in words if w not in dictwords]\n",
"\n",
"    freq = wordfreq.wordfreq(words)\n",
"    top = get_top(freq, 50)\n",
"    for w, count in top:\n",
"        print count, w\n",
"\n",
"if __name__ == \"__main__\":\n",
"    main()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Overwriting hamlet.py\n"
]
}
],
"prompt_number": 150
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!python hamlet.py"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"173 haue\r\n",
"116 all\r\n",
"95 hor\r\n",
"64 'tis\r\n",
"62 qu\r\n",
"61 selfe\r\n",
"60 laer\r\n",
"59 giue\r\n",
"58 loue\r\n",
"58 ile\r\n",
"57 vs\r\n",
"56 ophe\r\n",
"50 speake\r\n",
"50 vpon\r\n",
"41 heere\r\n",
"40 thinke\r\n",
"38 polon\r\n",
"35 heauen\r\n",
"34 queene\r\n",
"34 horatio\r\n",
"33 th'\r\n",
"33 lord?\r\n",
"33 vp\r\n",
"31 etext\r\n",
"30 looke\r\n",
"30 owne\r\n",
"29 clo\r\n",
"29 laertes\r\n",
"29 soule\r\n",
"28 heare\r\n",
"27 doth\r\n",
"27 hast\r\n",
"27 againe\r\n",
"26 &\r\n",
"24 neuer\r\n",
"24 leaue\r\n",
"23 downe\r\n",
"22 ophelia\r\n",
"21 \r\n",
"21 euen\r\n",
"20 deere\r\n",
"20 poore\r\n",
"20 fathers\r\n",
"19 players\r\n",
"19 i'th'\r\n",
"19 england\r\n",
"19 polonius\r\n",
"19 words\r\n",
"19 shew\r\n",
"18 osr\r\n"
]
}
],
"prompt_number": 149
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}