u2takey/数据结构与算法in python_chapter2

## 数据结构与算法in python_chapter2
> Written with [StackEdit](http://benweet.github.io/stackedit/).

#数据结构与算法in python [^footnote1]
##第二章 基础知识
###1.常见的算法复杂度和标记
<a href="http://www.flickr.com/photos/88851493@N05/9449343681/" title="Flickr 上 u2takey 的 _[K}(F7[T)SHZCWVVFA2ATV"><img src="http://farm3.staticflickr.com/2850/9449343681_5a0ce11214.jpg" width="500" height="269" alt="_[K}(F7[T)SHZCWVVFA2ATV"></a>

###2.简单的例子

```
#例子1
nums.append(1)   # Θ (1)
nums.insert(0,2) # Θ ( n )
                 # Θ (1) + Θ ( n ) =  Θ ( n ).时间复杂度之和
```
```
#例子2
s = 0
for x in seq: # Θ ( n )
    s += x

squares = [x**2 for x in seq]# Θ ( n )
```
```
#例子3
s = 0
for x in seq:
    for y in seq:
        s += x*y #Θ ( n · n ) =  Θ ( n^2).时间复杂度之积
```
```
#例子4
s = 0
for x in seq1:
    for y in seq2:
        s += x*y
#Θ(n(n+n2))= Θ(n^2+n^3)= Θ(n^3)
```
###3.三种情况
```
#排序之前检查是否已经有序
#for循环如果没有执行完（即提前break）时会执行else
def sort_w_check(seq):
    n = len(seq)
    for i in range(n-1):
        if seq[i] > seq[i+1]
            break
    else:
        return
    ...
```
- The best case.
- The worst case.
- The average case.
###4.实用算法评估方法
#####1. If possible, don’t worry about it.
#####2. For timing things, use  timeit.
```
>>> import timeit
>>> timeit.timeit('x=sum(list(range(100)))')
6.20188425496622
t=timeit.Timer('x=sum(list(range(100)))')
>>> t.repeat(3,10000)
[0.0776246042671005, 0.06356876681087442, 0.06265058968762105]
>>>
```
#####3. To find bottlenecks, use a profiler.
```
#使用cProfile
import cProfile
cProfile.run('main()')
```
#####4. Plot your results.
Two common plots for looking at performance are **graphs**,for example of problem size vs. running time, and **box plots**	, showing the distribution of running times.
#####5. Be careful when drawing conclusions based on timing comparisons.
#####6. Be careful when drawing conclusions about asymptotics from experiments.

###5.图graph
树是一种特殊的图<br>
图的基础知识
<a href="http://www.flickr.com/photos/88851493@N05/9452616534/" title="Flickr 上 u2takey 的 WI8N{[$)DE$423(W_K1U(6A"><img src="http://farm8.staticflickr.com/7371/9452616534_a9800d13d7.jpg" width="500" height="202" alt="WI8N{[$)DE$423(W_K1U(6A"></a>


一个图的例子
![img](http://farm8.staticflickr.com/7350/9449944087_463acf8971_o.jpg)
```
a, b, c, d, e, f, g, h = range(8)
N = [
    {b, c, d, e, f},    # a 邻近集合set也可使用list此时查询关系的时间复杂度变为Θ(n)
    {c, e},             # b
    {d},                # c
    {e},                # d
    {f},                # e
    {c, g, h},          # f
    {f, h},             # g
    {f, g}              # h
]
>>> b in N[a]  # Neighborhood membership
True
>>> len(N[f])  # Degree
3
```
另一种做法，可以加上weight
```
a, b, c, d, e, f, g, h = range(8)
N = [
    {b:2, c:1, d:3, e:9, f:4},    # a 用一个dict表示，key是neighbor而value是一些附加的值，比如weight
    {c:4, e:3},                   # b
    {d:8},                        # c
    {e:7},                        # d
    {f:5},                        # e
    {c:2, g:2, h:2},              # f
    {f:1, h:6},                   # g
    {f:9, g:8}                    # h
]
>>> b in N[a]  # Neighborhood membership
True
>>> len(N[f])  # Degree
3
>>> N[a][b]    # Edge weight for (a, b)
2
```
使用dict作为main structure

```
N = {
    'a': set('bcdef'),
    'b': set('ce'),
    'c': set('d'),
    'd': set('e'),
    'e': set('f'),
    'f': set('cgh'),
    'g': set('fh'),
    'h': set('fg')
}
```
Adjacency Matrix表示的方法
```
a, b, c, d, e, f, g, h = range(8)

#     a b c d e f g h

N = [[0,1,1,1,1,1,0,0], # a 如果要加上weight值，把1换成weight即可
     [0,0,1,0,1,0,0,0], # b
     [0,0,0,1,0,0,0,0], # c
     [0,0,0,0,1,0,0,0], # d
     [0,0,0,0,0,1,0,0], # e
     [0,0,1,0,0,0,1,1], # f
     [0,0,0,0,0,1,0,1], # g
     [0,0,0,0,0,1,1,0]] # h

>>> N[a][b]    # Neighborhood membership
1
>>> sum(N[f])  # Degree
3
```
```
#更好的方法是把infinite weight用float('inf')表示
_ = float('inf')
 #     a b c d e f g h
 W = [[0,2,1,3,9,4,_,_], # a
     [_,0,4,_,3,_,_,_], # b
     [_,_,0,8,_,_,_,_], # c
     [_,_,_,0,7,_,_,_], # d
     [_,_,_,_,0,5,_,_], # e
     [_,_,2,_,_,0,2,2], # f
     [_,_,_,_,_,1,0,6], # g
     [_,_,_,_,_,9,8,0]] # h

>>> W[a][b] < inf   # Neighborhood membership
True
>>> W[c][e] < inf   # Neighborhood membership
False
>>> sum(1 for w in W[a] if w < inf) - 1  # Degree -1是因为不需要数对角线
5

```
一个矩阵形式可以使用的库，NumPy
```
>>> N = [[0]*10 for i in range(10)]
# in NumPy, you can use the  zeros  function:
>>> import numpy as np
>>> N = np.zeros([10,10])
```

>两种主要的表示方法如何选择的问题：最主要的原则是你要做什么，使你的代码简单清楚。
时间复杂度重要的时候则要比较几种方法的时间复杂度。如：looking up the edge(u,v) in an adjacency matrix is **Θ(1)**, while iterating over  v ’s neighbors is **Θ(n)**; in an adjacency list representation, both operations will be **Θ(d(v))**

###6.树tree
因为树是特殊的图，图的表示形式和算法都可以使用再树上，但是也有一些特别的树的结构可以使得树的算法实现起来更容易清楚。
![img](http://farm6.staticflickr.com/5453/9456376053_35983f82ea_o.jpg)
有根树（rooted tree）的表示方法
```
#一种表示方法，所有的internal node都以其子叶节点列表的形式存在，一个列表即时一个
#internal node的neighbor(child) list
>>> T = [["a", "b"], ["c"], ["d", ["e", "f"]]]
>>> T[0][1]
'b'
>>> T[2][1][0]
'e'
```
当树为二叉树(binary tree)的时候，即每个节点的子节点最多只有两个子节点,可以用一个类表示
```
class Tree:
    def __init__(self, left, right):
        self.left = left
        self.right = right

You can use the Tree class like this:

>>> t = Tree(Tree("a", "b"), Tree("c", "d"))
>>> t.right.left
'c'
```
当然非二叉树也可以用类来表示，Multiway tree
```
class Tree:
    def __init__(self, kids, next=None):
        self.kids = self.val = kids
        self.next = next
>>> t = Tree(Tree("a", Tree("b", Tree("c", Tree("d"))))) #first child,next sibling的形式
>>> t.kids.next.next.val
'c'
```
一个技巧
```
class Bunch(dict):
	def __init__(self,*args,**kwds):
		super(Bunch,self).__init__(*args,**kwds)
		self.dict=self
#使用这个类的好处是可以在构建的时候设置属性，并且可以使用dict的一些方法。
```
>其他资源Graph libraies:<br>
•   NetworkX :  http://networkx.lanl.gov <br>
•   python-graph:  http://code.google.com/p/python-graph  <br>
•   Graphine :  http://gitorious.org/projects/graphine/pages/Home  <br>
There is also Pygr, a graph database (http://bioinfo.mbi.ucla.edu/pygr); Gato, a graph animation
toolbox (http://gato.sourceforge.net ); and PADS, a collection of graph algorithms
( http://www.ics.uci.edu/~eppstein/PADS ).

###7.黑盒子
需要了解黑盒子之下的原理，才能更好的优化代码
```
>>> from random import randrange
>>> L = [randrange(10000) for i in range(1000)]
>>> 42 in L #liner
False
>>> S = set(L)
>>> 42 in S #constant
False
```
```

>>> s = ""
>>> for chunk in input():
...     s += chunk
#上面的做法为quadratic,因为每次都会新建s，应该改写如下
>>> chunks = []
>>> for chunk in input():
...     chunks.append(chunk)
...
>>> s = ''.join(chunks)
```
```
#不要比较两个浮点数，（可以写出‘近似’函数来比较）
>>> sum(0.1 for i in range(10)) == 1.0
False
#使用Decimal模块来做类似的事
>>> from decimal import *
>>> sum(Decimal("0.1") for i in range(10)) == Decimal("1.0")
True
#精度很重要的时候尽量不用减法（相近的两个数相减会损失精度）
>>> from math import sqrt
>>> x = 8762348761.13
>>> sqrt(x + 1) - sqrt(x)
5.341455107554793e-06
>>> 1.0/(sqrt(x + 1) + sqrt(x))
5.3414570026237696e-06 #更准确
```


[^footnote1]: 本文是python algorithm的读书笔记.
	> Written with [StackEdit](http://benweet.github.io/stackedit/).

	#数据结构与算法in python [^footnote1]
	##第二章基础知识
	###1.常见的算法复杂度和标记
	<a href="http://www.flickr.com/photos/88851493@N05/9449343681/" title="Flickr 上 u2takey 的 _[K}(F7[T)SHZCWVVFA2ATV"><img src="http://farm3.staticflickr.com/2850/9449343681_5a0ce11214.jpg" width="500" height="269" alt="_[K}(F7[T)SHZCWVVFA2ATV"></a>

	###2.简单的例子

	```
	#例子1
	nums.append(1) # Θ (1)
	nums.insert(0,2) # Θ ( n )
	# Θ (1) + Θ ( n ) = Θ ( n ).时间复杂度之和
	```
	```
	#例子2
	s = 0
	for x in seq: # Θ ( n )
	s += x

	squares = [x**2 for x in seq]# Θ ( n )
	```
	```
	#例子3
	s = 0
	for x in seq:
	for y in seq:
	s += x*y #Θ ( n · n ) = Θ ( n^2).时间复杂度之积
	```
	```
	#例子4
	s = 0
	for x in seq1:
	for y in seq2:
	s += x*y
	#Θ(n(n+n2))= Θ(n^2+n^3)= Θ(n^3)
	```
	###3.三种情况
	```
	#排序之前检查是否已经有序
	#for循环如果没有执行完（即提前break）时会执行else
	def sort_w_check(seq):
	n = len(seq)
	for i in range(n-1):
	if seq[i] > seq[i+1]
	break
	else:
	return
	...
	```
	- The best case.
	- The worst case.
	- The average case.
	###4.实用算法评估方法
	#####1. If possible, don’t worry about it.
	#####2. For timing things, use timeit.
	```
	>>> import timeit
	>>> timeit.timeit('x=sum(list(range(100)))')
	6.20188425496622
	t=timeit.Timer('x=sum(list(range(100)))')
	>>> t.repeat(3,10000)
	[0.0776246042671005, 0.06356876681087442, 0.06265058968762105]
	>>>
	```
	#####3. To find bottlenecks, use a profiler.
	```
	#使用cProfile
	import cProfile
	cProfile.run('main()')
	```
	#####4. Plot your results.
	Two common plots for looking at performance are graphs,for example of problem size vs. running time, and box plots , showing the distribution of running times.
	#####5. Be careful when drawing conclusions based on timing comparisons.
	#####6. Be careful when drawing conclusions about asymptotics from experiments.

	###5.图graph
	树是一种特殊的图<br>
	图的基础知识
	<a href="http://www.flickr.com/photos/88851493@N05/9452616534/" title="Flickr 上 u2takey 的 WI8N{[$)DE$423(W_K1U(6A"><img src="http://farm8.staticflickr.com/7371/9452616534_a9800d13d7.jpg" width="500" height="202" alt="WI8N{[$)DE$423(W_K1U(6A"></a>


	一个图的例子
	![img](http://farm8.staticflickr.com/7350/9449944087_463acf8971_o.jpg)
	```
	a, b, c, d, e, f, g, h = range(8)
	N = [
	{b, c, d, e, f}, # a 邻近集合set也可使用list此时查询关系的时间复杂度变为Θ(n)
	{c, e}, # b
	{d}, # c
	{e}, # d
	{f}, # e
	{c, g, h}, # f
	{f, h}, # g
	{f, g} # h
	]
	>>> b in N[a] # Neighborhood membership
	True
	>>> len(N[f]) # Degree
	3
	```
	另一种做法，可以加上weight
	```
	a, b, c, d, e, f, g, h = range(8)
	N = [
	{b:2, c:1, d:3, e:9, f:4}, # a 用一个dict表示，key是neighbor而value是一些附加的值，比如weight
	{c:4, e:3}, # b
	{d:8}, # c
	{e:7}, # d
	{f:5}, # e
	{c:2, g:2, h:2}, # f
	{f:1, h:6}, # g
	{f:9, g:8} # h
	]
	>>> b in N[a] # Neighborhood membership
	True
	>>> len(N[f]) # Degree
	3
	>>> N[a][b] # Edge weight for (a, b)
	2
	```
	使用dict作为main structure

	```
	N = {
	'a': set('bcdef'),
	'b': set('ce'),
	'c': set('d'),
	'd': set('e'),
	'e': set('f'),
	'f': set('cgh'),
	'g': set('fh'),
	'h': set('fg')
	}
	```
	Adjacency Matrix表示的方法
	```
	a, b, c, d, e, f, g, h = range(8)

	# a b c d e f g h

	N = [[0,1,1,1,1,1,0,0], # a 如果要加上weight值，把1换成weight即可
	[0,0,1,0,1,0,0,0], # b
	[0,0,0,1,0,0,0,0], # c
	[0,0,0,0,1,0,0,0], # d
	[0,0,0,0,0,1,0,0], # e
	[0,0,1,0,0,0,1,1], # f
	[0,0,0,0,0,1,0,1], # g
	[0,0,0,0,0,1,1,0]] # h

	>>> N[a][b] # Neighborhood membership
	1
	>>> sum(N[f]) # Degree
	3
	```
	```
	#更好的方法是把infinite weight用float('inf')表示
	_ = float('inf')
	# a b c d e f g h
	W = [[0,2,1,3,9,4,_,_], # a
	[_,0,4,_,3,_,_,_], # b
	[_,_,0,8,_,_,_,_], # c
	[_,_,_,0,7,_,_,_], # d
	[_,_,_,_,0,5,_,_], # e
	[_,_,2,_,_,0,2,2], # f
	[_,_,_,_,_,1,0,6], # g
	[_,_,_,_,_,9,8,0]] # h

	>>> W[a][b] < inf # Neighborhood membership
	True
	>>> W[c][e] < inf # Neighborhood membership
	False
	>>> sum(1 for w in W[a] if w < inf) - 1 # Degree -1是因为不需要数对角线
	5

	```
	一个矩阵形式可以使用的库，NumPy
	```
	>>> N = [[0]*10 for i in range(10)]
	# in NumPy, you can use the zeros function:
	>>> import numpy as np
	>>> N = np.zeros([10,10])
	```

	>两种主要的表示方法如何选择的问题：最主要的原则是你要做什么，使你的代码简单清楚。
	时间复杂度重要的时候则要比较几种方法的时间复杂度。如：looking up the edge(u,v) in an adjacency matrix is Θ(1), while iterating over v ’s neighbors is Θ(n); in an adjacency list representation, both operations will be Θ(d(v))

	###6.树tree
	因为树是特殊的图，图的表示形式和算法都可以使用再树上，但是也有一些特别的树的结构可以使得树的算法实现起来更容易清楚。
	![img](http://farm6.staticflickr.com/5453/9456376053_35983f82ea_o.jpg)
	有根树（rooted tree）的表示方法
	```
	#一种表示方法，所有的internal node都以其子叶节点列表的形式存在，一个列表即时一个
	#internal node的neighbor(child) list
	>>> T = [["a", "b"], ["c"], ["d", ["e", "f"]]]
	>>> T[0][1]
	'b'
	>>> T[2][1][0]
	'e'
	```
	当树为二叉树(binary tree)的时候，即每个节点的子节点最多只有两个子节点,可以用一个类表示
	```
	class Tree:
	def __init__(self, left, right):
	self.left = left
	self.right = right

	You can use the Tree class like this:

	>>> t = Tree(Tree("a", "b"), Tree("c", "d"))
	>>> t.right.left
	'c'
	```
	当然非二叉树也可以用类来表示，Multiway tree
	```
	class Tree:
	def __init__(self, kids, next=None):
	self.kids = self.val = kids
	self.next = next
	>>> t = Tree(Tree("a", Tree("b", Tree("c", Tree("d"))))) #first child,next sibling的形式
	>>> t.kids.next.next.val
	'c'
	```
	一个技巧
	```
	class Bunch(dict):
	def __init__(self,args,*kwds):
	super(Bunch,self).__init__(args,*kwds)
	self.dict=self
	#使用这个类的好处是可以在构建的时候设置属性，并且可以使用dict的一些方法。
	```
	>其他资源Graph libraies:<br>
	• NetworkX : http://networkx.lanl.gov <br>
	• python-graph: http://code.google.com/p/python-graph <br>
	• Graphine : http://gitorious.org/projects/graphine/pages/Home <br>
	There is also Pygr, a graph database (http://bioinfo.mbi.ucla.edu/pygr); Gato, a graph animation
	toolbox (http://gato.sourceforge.net ); and PADS, a collection of graph algorithms
	( http://www.ics.uci.edu/~eppstein/PADS ).

	###7.黑盒子
	需要了解黑盒子之下的原理，才能更好的优化代码
	```
	>>> from random import randrange
	>>> L = [randrange(10000) for i in range(1000)]
	>>> 42 in L #liner
	False
	>>> S = set(L)
	>>> 42 in S #constant
	False
	```
	```

	>>> s = ""
	>>> for chunk in input():
	... s += chunk
	#上面的做法为quadratic,因为每次都会新建s，应该改写如下
	>>> chunks = []
	>>> for chunk in input():
	... chunks.append(chunk)
	...
	>>> s = ''.join(chunks)
	```
	```
	#不要比较两个浮点数，（可以写出‘近似’函数来比较）
	>>> sum(0.1 for i in range(10)) == 1.0
	False
	#使用Decimal模块来做类似的事
	>>> from decimal import *
	>>> sum(Decimal("0.1") for i in range(10)) == Decimal("1.0")
	True
	#精度很重要的时候尽量不用减法（相近的两个数相减会损失精度）
	>>> from math import sqrt
	>>> x = 8762348761.13
	>>> sqrt(x + 1) - sqrt(x)
	5.341455107554793e-06
	>>> 1.0/(sqrt(x + 1) + sqrt(x))
	5.3414570026237696e-06 #更准确
	```









	[^footnote1]: 本文是python algorithm的读书笔记.