Summary of "Algorithms: Design and Analysis, Part 1" course at Coursera.Org

This course is at an undergraduate level, likely third or fourth year. Students should be comfortable with programming concepts, including recursion, as well as proof techniques, including induction.

Week 1

I. Introduction

  • Why study algorithms, designer's mantra -- CAN WE DO BETTER
  • Karatsuba Multiplication
  • Merge Sort, analysis generalizes to Master Method
  • Merge Sort pseudo code -- recursively sort one half, then second half, finally merge the two sorted halves
  • Merge Sort running time -- 6n * log(n) + 6n
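
A minimal Python sketch of the merge sort described above (function and variable names are mine, not from the course). Each of the ~log(n) levels of recursion does linear work, which is where the 6n*log(n) + 6n operation count comes from.

```python
def merge_sort(a):
    """Sort a list by recursively sorting each half and merging the results."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])     # recursively sort first half
    right = merge_sort(a[mid:])    # recursively sort second half
    # merge the two sorted halves in linear time
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```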

II. Asymptotic Analysis

  • Provides vocabulary for the design and analysis of algorithms
  • Coarse enough to suppress details, sharp enough to make useful comparisons between different algorithms
  • Suppress constant factors and lower-order terms
  • Big-Oh notation -- T(n) = O(f(n)) if for all sufficiently large n, T(n) is bounded above by a constant multiple of f(n)
  • Claim: n**k is not O(n**(k-1))
  • Omega notation -- T(n) = Omega(f(n)) if there exist constants c, n0 such that T(n) >= c * f(n) for every n > n0
  • Theta notation -- T(n) = Theta(f(n)) iff T(n) = O(f(n)) and T(n) = Omega(f(n))
  • Theta notation -- equivalently, there exist constants c1, c2, n0 such that c1 * f(n) <= T(n) <= c2 * f(n) for every n > n0
  • Little-Oh notation -- T(n) = o(f(n)) if for all constants c>0, there exists a constant n0 such that T(n) <= c * f(n) for every n > n0
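
The definitions above, written out formally (the notation is the standard one, not copied from the slides):

```latex
T(n) = O(f(n))      \iff \exists c, n_0 > 0 :\; T(n) \le c \cdot f(n) \;\; \forall n \ge n_0
T(n) = \Omega(f(n)) \iff \exists c, n_0 > 0 :\; T(n) \ge c \cdot f(n) \;\; \forall n \ge n_0
T(n) = \Theta(f(n)) \iff \exists c_1, c_2, n_0 > 0 :\; c_1 f(n) \le T(n) \le c_2 f(n) \;\; \forall n \ge n_0
T(n) = o(f(n))      \iff \forall c > 0 \;\exists n_0 :\; T(n) \le c \cdot f(n) \;\; \forall n \ge n_0
```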

III. Divide and Conquer Algorithms

  • Counting inversions
  • Motivation -- numerical similarity measure between two ranked lists, e.g. collaborative filtering
  • Brute force is Theta(n**2); we can do better with Divide + Conquer by piggybacking on merge sort (sketched after this list)
  • Counting the halves is trivial, but CountSplitInv in linear time is the challenge
  • Strassen algorithm for matrix multiplication
    • recursively compute only 7 (cleverly chosen) products
    • do the necessary (clever) additions + subtractions
    • runs in O(n**log2(7)) = O(n**2.81) time -- still super-quadratic, but better than the cubic time of the naive algorithm
  • Closest Pair Problem
    • input: a set P = { p1, ..., pn } of n points in the plane R**2
    • output: a pair (p, q) of distinct points that minimizes the Euclidean distance over all pairs (pi, pj) in P
  • Consider the problem in R (one dimension) -- brute force is O(n**2), but sorting brings that down to O(n*log(n))
  • Make copies of points sorted by x and y coordinates, but that is not enough
  • ClosestSplitPair(Px, Py, delta), ... , p and q are at most 7 positions apart
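
A short Python sketch of the sort-and-count idea for inversions (names are mine). The split inversions are counted during the merge, so the whole computation stays O(n*log(n)).

```python
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions in a).

    Piggybacks on merge sort: total inversions = left inversions +
    right inversions + split inversions, with the split inversions
    counted in linear time during the merge.
    """
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, x = sort_and_count(a[:mid])
    right, y = sort_and_count(a[mid:])
    merged, i, j, splits = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            # every element still remaining in `left` forms an inversion with right[j]
            merged.append(right[j]); j += 1
            splits += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, x + y + splits
```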

Week 2

IV. The Master Method

  • Simple formulae for calculating Big-O time of algorithms; a black box for solving recurrences
  • Recurrence: T(n) <= a*T(n/b) + O(n**d), where a is the number of recursive calls, b is the input-size shrinkage factor, and d is the exponent in the running time of the "combine step"; the base case is T(n) = O(1) for constant-size inputs
  • T(n) depends on the ratio of a and b**d
  • Examples: merge sort, binary search, integer multiplication, Strassen's matrix multiplication algorithm
  • Proof by induction -- base case and inductive step
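
A tiny Python helper illustrating the three cases of the Master Method; the case analysis is the standard statement of the theorem, while the function name and string output are my own.

```python
import math

def master_method(a, b, d):
    """Asymptotic bound for T(n) <= a*T(n/b) + O(n**d).

    a = number of recursive calls, b = input-size shrinkage factor,
    d = exponent of the work done outside the recursive calls.
    """
    if a == b ** d:
        return f"O(n^{d} * log n)"           # work is balanced across levels
    elif a < b ** d:
        return f"O(n^{d})"                    # work at the root dominates
    else:
        return f"O(n^{math.log(a, b):.2f})"   # leaves dominate: n^(log_b a)

# Examples from the lecture:
#   merge sort:             master_method(2, 2, 1) -> "O(n^1 * log n)"
#   binary search:          master_method(1, 2, 0) -> "O(n^0 * log n)" = O(log n)
#   Strassen:               master_method(7, 2, 2) -> "O(n^2.81)"
```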

V. Quicksort Algorithm

  • O(n*log(n)) time on average, works in place
  • Partitioning around a pivot puts the pivot in its "rightful position", runs in linear time, reduces the problem size
  • Proof of correctness by induction
  • Important to choose good pivot, worst case is O(n**2)
  • Random pivots -- if every split is at least 25-75, that is already good enough for O(n*log(n))
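
A minimal in-place quicksort with a uniformly random pivot, as described above; the partitioning keeps a boundary between the "< pivot" and ">= pivot" regions, in the spirit of the lecture (names are mine).

```python
import random

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with a random pivot (expected O(n log n))."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # choose a random pivot and move it to the front
    p = random.randint(lo, hi)
    a[lo], a[p] = a[p], a[lo]
    pivot = a[lo]
    # invariant: a[lo+1 .. i-1] < pivot, a[i .. j-1] >= pivot
    i = lo + 1
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[lo], a[i - 1] = a[i - 1], a[lo]   # put the pivot in its rightful position
    quicksort(a, lo, i - 2)
    quicksort(a, i, hi)
```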

VI. Quicksort Analysis

  • Cannot apply the Master Method -- the pivot is random, so the subproblems are unbalanced
  • Key Random Variable -- # of comparisons between two input elements made by QuickSort
  • A Decomposition Principle -- 1) Identify random variable Y that you really care about, 2) Express Y as sum of indicator random variables, 3) Apply Linearity of expectation
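
Applying the decomposition principle to quicksort's comparison count C (here z_i denotes the i-th smallest input element and X_ij is the indicator that z_i and z_j get compared; the 2/(j-i+1) probability is the key lemma from the lecture):

```latex
E[C] = \sum_{i<j} E[X_{ij}]
     = \sum_{i<j} \Pr[z_i \text{ and } z_j \text{ get compared}]
     = \sum_{i<j} \frac{2}{j-i+1}
     \le 2n \sum_{k=2}^{n} \frac{1}{k} = O(n \log n)
```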

VII. Probability Review

  • Sample Spaces Omega -- all possible outcomes, Sum(p(i)) = 1
  • Events -- a subset S of Omega
  • Random Variables -- X is a real-valued function X: Omega -> R
  • Expectation E[X] of X -- average value of X = Sum(X(i) * p(i))
  • Linearity of Expectation -- E[ Sum(Xj) ] = Sum( E[Xj] )
  • Conditional Probability -- Pr[ X|Y ] = Pr[ X and Y ] / Pr[Y]
  • Independence of Events -- iff Pr[ X and Y ] = Pr[X] * Pr[Y]
  • Independence of Random Variables -- Pr[ A == a and B == b ] = Pr[ A == a ] * Pr[ B == b ]

Week 3

VIII. LINEAR-TIME SELECTION

  • Randomized Selection Algorithm, reduction to sorting (qsort) leads to O(n*log(n))
  • Partitioning around a pivot, such that elements left of the pivot are less than the pivot and elements right of it are greater
  • Randomized selection is correct, running time depends on "quality" of pivot, worst is O(n**2), average is O(n) !
  • Deterministic Linear-Time Selection -- the key is for ChoosePivot() to return pretty good pivot very quickly
  • Use "median of medians" with out-of-place sorting, makes 2 recursive calls
  • Running time is O(n) but the constants are higher so it is not as good as Random Selection
  • Every "comparison-based" sorting has lower bound of O(n*log(n)) -- merge sort, qsort, heap sort
  • When making string assumptions about the sorted data -- Bucket Sort, Counting Sort, Radix Sort
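
A compact Python sketch of randomized selection; unlike the lecture's in-place partitioning, it copies sublists for clarity (names are mine).

```python
import random

def rselect(a, i):
    """Return the i-th smallest element (1-indexed) of list a.

    Randomized selection: partition around a random pivot, then recurse
    on the single side that contains the i-th order statistic.
    Expected running time O(n); worst case O(n**2).
    """
    if len(a) == 1:
        return a[0]
    pivot = random.choice(a)
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    if i <= len(less):
        return rselect(less, i)
    elif i <= len(less) + len(equal):
        return pivot
    else:
        return rselect(greater, i - len(less) - len(equal))
```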

IX. GRAPHS AND THE CONTRACTION ALGORITHM

  • Graphs, vertices, edges, dense and sparse graphs, matrix and adjacency list representations
  • Minimum Cut Problem is to find a cut with fewest number of crossing edges
  • Random Contraction Algorithm, what is the probability of success, what could go wrong
  • Low success probability -- at least 1/n**2 per trial, so do repeated trials
  • Counting Minimum Cuts
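
A rough Python sketch of the random contraction algorithm, implemented by contracting edges in a random order until only two super-vertices remain; the edge-list representation, the labelling scheme, and the default trial count are my own choices.

```python
import random

def karger_min_cut(edges, n, trials=None):
    """Estimate the minimum cut of an undirected multigraph.

    edges -- list of (u, v) pairs with vertices labelled 0..n-1.
    A single trial succeeds with probability >= ~1/n**2, so many
    independent trials are run and the smallest cut seen is returned.
    """
    trials = trials or n * n
    best = float("inf")
    for _ in range(trials):
        label = list(range(n))            # super-vertex label for each vertex

        def find(x):                      # follow labels to the super-vertex
            while label[x] != x:
                label[x] = label[label[x]]
                x = label[x]
            return x

        remaining = n
        pool = edges[:]
        random.shuffle(pool)              # contract edges in random order
        for u, v in pool:
            if remaining == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:                  # contract: merge the two super-vertices
                label[ru] = rv
                remaining -= 1
        # crossing edges = edges whose endpoints end up in different super-vertices
        cut = sum(1 for u, v in pool if find(u) != find(v))
        best = min(best, cut)
    return best
```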

Week 4

X. Graph Search and Connectivity

  • Generic Graph Search -- find everything "findable"
  • Breadth-First Search (BFS) -- explore nodes in "layers", can compute shortest paths, finds connected components of undirected graphs, run-time O(n+m)
  • Depth-First Search (DFS) -- explore aggressively, only backtrack when necessary, run-time is O(n+m), computes:
    • a topological ordering of a directed acyclic graph
    • strongly connected components of directed graphs
  • Topological Sort -- to sequence tasks while respecting all precedence constraints, if there is no directed cycle can compute topological ordering in O(n+m) time
  • Two solutions for Topological Sort -- straightforward and via DFS
  • Strongly Connected Components -- of a directed graph G are the equivalence classes of the following relation: u <-> v iff there is u -> v and v -> u
  • Kosaraju's algorithm -- two passes of DFS, plus some clever additional book-keeping
  • Applications of SCC to the web graph -- all 4 parts (one giant SCC, IN, OUT, tubes + tendrils) have roughly the same size; the core SCC is very well connected internally, outside it the graph is surprisingly poorly connected
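
A small Python sketch of the DFS-based topological sort mentioned above; the adjacency-dict representation and the toy DAG in the comment are my own.

```python
def topological_sort(graph):
    """Return a topological ordering of a DAG given as {node: [successors]}.

    DFS-based: a node is appended only after everything reachable from it
    has been explored, so reversing the finish order gives a valid
    ordering.  Runs in O(n+m).
    """
    visited, order = set(), []

    def dfs(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs(v)
        order.append(u)          # u finishes after all of its descendants

    for u in graph:
        if u not in visited:
            dfs(u)
    return order[::-1]

# Example on an assumed toy DAG: s -> v, s -> w, v -> t, w -> t
# topological_sort({"s": ["v", "w"], "v": ["t"], "w": ["t"], "t": []})
# returns one valid ordering, e.g. ['s', 'w', 'v', 't']
```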

Week 5

XI. Dijkstra's Shortest-path Algorithm

  • Single-source shortest paths when each edge may have a different cost; BFS computes shortest paths only when all edges have equal cost
  • Works only if edge costs are non-negative
  • Proof of correctness by induction on the number of iterations
  • Naive implementation has runtime O(n*m), but by using a heap data structure it goes down to O(m*log(n))
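
A heap-based Dijkstra sketch in Python; it uses "lazy deletion" of stale heap entries instead of the decrease-key bookkeeping discussed in the lecture, and the graph representation is my own.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths with non-negative edge costs.

    graph: {u: [(v, cost), ...]}.  Heap-based implementation, O(m log n).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip it
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```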

XII. Heaps

  • Canonical use -- fast way to do repeated minimum (or maximum, but not both) computations
  • Turns Selection Sort O(n**2) into Heap Sort with O(n*log(n))
  • Supported operations -- Insert O(log(n)), Extract-Min O(log(n)), Heapify O(n) (batch insert of n elements), Delete O(log(n))
  • Another name is Priority Queue
  • Applications
    • 2-SUM problem, with several alternative solutions
    • Median maintenance
    • Dijkstra's shortest-path algorithm
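
A Python sketch of the median-maintenance application using two heaps; the function name and the choice to report the lower median on ties are my own.

```python
import heapq

def running_medians(stream):
    """Yield the median after each element of `stream`.

    Keep the smaller half in a max-heap (stored negated) and the larger
    half in a min-heap; the median is always at the top of one of them,
    so each update costs O(log n)."""
    low, high = [], []                      # max-heap (negated) / min-heap
    for x in stream:
        if low and x > -low[0]:
            heapq.heappush(high, x)
        else:
            heapq.heappush(low, -x)
        # rebalance so the two halves differ in size by at most one
        if len(low) > len(high) + 1:
            heapq.heappush(high, -heapq.heappop(low))
        elif len(high) > len(low) + 1:
            heapq.heappush(low, -heapq.heappop(high))
        yield -low[0] if len(low) >= len(high) else high[0]
```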

XIII. Balanced Binary Search Trees

  • Compared to a sorted array, searches are a bit slower, but insertions and deletions are much faster
  • Search trees can degenerate into a linked list; every operation costs O(tree-height), so balance matters
  • Red-black trees, also AVL-trees, splay-trees, B-trees
  • Red-black invariants:
    • Each node is red or black
    • Root node is black
    • No two red nodes in a row
    • Every root-NULL path must have the same number of black nodes
  • Every red-black tree has Height <= 2 * log(n+1)
  • Key primitives -- left rotation and right rotation
  • Insertions and deletions are possible, but they are not quite simple operations
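
A minimal illustration of the left-rotation primitive; the Node class is my own, and the right rotation is the mirror image.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(x):
    """Left rotation around x: x's right child y becomes the local root,
    x becomes y's left child, and y's old left subtree becomes x's right
    subtree.  Search-tree order is preserved; returns the new local root."""
    y = x.right
    x.right = y.left
    y.left = x
    return y
```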

Week 6

XIV. Hashing: The Basics

  • Purpose -- fast O(1) operations
  • Supported operations -- insert, delete, lookup
  • Applications -- de-duplication, determine if the sum of 2 integers is in the hash table, symbol tables in compilers
  • Naive implementation -- array, list
  • Birthday paradox
  • Resolving conflicts -- chaining, open addressing (linear probing or double-hashing)
  • What makes a bad hash function; what makes a good hash function
    • Should "spread data out"
    • Should be easy to evaluate
  • How to choose the number of buckets -- prime number, not too close to power of 2, not too close to power of 10
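
A toy chained hash table in Python to make the operations concrete; the class and method names are mine, and Python's built-in hash() stands in for a real hash function.

```python
class ChainedHashTable:
    """Toy hash table with chaining; the bucket count is a prime,
    as suggested above."""

    def __init__(self, n_buckets=997):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def lookup(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None
```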

XV. Universal Hashing

  • The load of a hash table = # of objects / # of buckets; for good performance the load must be kept under control
  • A hash function that performs well on every data set does not exist -- every fixed hash function has a pathological data set
  • Solutions
    • Use a cryptographic hash function
    • Use randomization
  • Universal family of hash functions -- a set H of hash functions h: U -> { 0, 1, ..., n-1 } such that for any distinct x, y in U: Pr[ x, y collide ] <= 1/n, when h is chosen uniformly at random from H
  • Collision resolution with chaining gives constant-time guarantee -- Carter-Wegman 1979 Theorem
  • Performance of collision resolution with open addressing depends on the load factor
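
A sketch of drawing one random function from the classic ((a*x + b) mod p) mod n family for integer keys; the prime p = 2**31 - 1 and the function name are my own choices.

```python
import random

def make_random_hash(n_buckets, p=2_147_483_647):
    """Draw h(x) = ((a*x + b) mod p) mod n_buckets from a universal
    family over integer keys smaller than the prime p.  For any fixed
    pair x != y, the collision probability over the random choice of
    a and b is about 1/n_buckets."""
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    return lambda x: ((a * x + b) % p) % n_buckets
```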

XVI. Bloom Filters

  • Pros -- fast inserts and lookups, space efficient, no false negatives
  • Cons -- can't store the object, no deletions, small probability for false positives
  • Applications -- spellcheckers, list of forbidden passwords, internet routing
  • Under the hood -- array of n bits, k hash functions
  • Trade-off between space and error rate -- to reduce the error rate, increase the space
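
A toy Bloom filter in Python; the bit-array size, the number of hash functions k, and the SHA-256-based hashing are my own choices, not from the lecture.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: an array of n bits and k hash functions.

    insert() sets k bits; lookup returns False only if some bit is unset,
    so there are no false negatives, but false positives are possible."""

    def __init__(self, n_bits=8 * 1024, k=4):
        self.n, self.k = n_bits, k
        self.bits = bytearray(n_bits)        # one byte per bit, for simplicity

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.n

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))
```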