xxsang/HashTables

## HashTables
1. Faster but doesn’t support order
2. Save in a key-indexed table
	1. Hash function
	2. equality test
	3. Collision resolution
	4. space-time limitation tradeoff -hashing
3. Hash function
	1. Idea: scramble the key uniformly to create a table index
	2. designed differently for practice
	3. if x==y then x.hashCode() == y.hashCode(); highly desirable if x!=y, x.hashCode()!=y.hashCode()
	4. hashing is the memory address of the object
	5. Horner’s method of hashing the string: hash=s[i]+(hash*31), 31x+y rule
	6. In theory, keys are bitstrings, universal hash functions exist
	7. hash function returns a int between 0 and M-1 for array index
	8. Uniformly hashing assumption: each key is equal likely to hash to an integar between 0 and M-1.
4. Seperate chaining
	1. How to deal with collision effiently
	2. Use an M<N linked list attach to each array index
	3. insert: put at front of ith linked list; search: look over all ith linked list for the value
	4. distribution of list size obey binomial distribution, if uniform hashing assumption hold
	5. Thus, number of probes for searching of insertion is propotional to N/M
	6. worst-case: search insert delet lgN, average case 3-5, not ordered
5. Linear Probing
	1. open addressing: once collide, use the next empty array
	2. efficient if the array M is big enough, must bigger than key-value pair N, at least M is half empty, mean displacement ~3/2
	3. if free put it there, else try i+1, i+2, …; if search and hit an empty position without found, return research miss
	4. long clusters may collide if the array becomes too full ~sqrt(pi*M/8)
	5. Keep the array always half full (array resizing) we get constant time insert search delete 3-5
6. Hash table context
	1. Do not spend long time to compute hash function for long array (just look at some characters every 8-9 character skips), can collide at URL
	2. uniform hashing assumption do not always hold in practice
		1. DDOS attack, hash strings to same key
	3. One-way hash function (password), SHA-2, too expensive to be used in symbol table
	4. Seperate chaining vs. linear probing
		1. seperate chaining:easy to implement delete, performance degrade gracefully, clustering less sensitive to poorly designed hash functions
		2. linear probing: less wasted space, better cach performance
		3. two-probe hashing(hash to two positions, put the value at shorter ones, longest become loglogN), double hashing(skip a variable amount in linear probing)
		4. cuckoo hashing: hash key to two positions, insert key in either positions, if occupied, insert to replacement, constant worst case time for searching
	5. Hash table vs BST
		1. unordered, quick, simple
		2. strong performance guarantee, ordered
	1. Faster but doesn’t support order
	2. Save in a key-indexed table
	1. Hash function
	2. equality test
	3. Collision resolution
	4. space-time limitation tradeoff -hashing
	3. Hash function
	1. Idea: scramble the key uniformly to create a table index
	2. designed differently for practice
	3. if x==y then x.hashCode() == y.hashCode(); highly desirable if x!=y, x.hashCode()!=y.hashCode()
	4. hashing is the memory address of the object
	5. Horner’s method of hashing the string: hash=s[i]+(hash*31), 31x+y rule
	6. In theory, keys are bitstrings, universal hash functions exist
	7. hash function returns a int between 0 and M-1 for array index
	8. Uniformly hashing assumption: each key is equal likely to hash to an integar between 0 and M-1.
	4. Seperate chaining
	1. How to deal with collision effiently
	2. Use an M<N linked list attach to each array index
	3. insert: put at front of ith linked list; search: look over all ith linked list for the value
	4. distribution of list size obey binomial distribution, if uniform hashing assumption hold
	5. Thus, number of probes for searching of insertion is propotional to N/M
	6. worst-case: search insert delet lgN, average case 3-5, not ordered
	5. Linear Probing
	1. open addressing: once collide, use the next empty array
	2. efficient if the array M is big enough, must bigger than key-value pair N, at least M is half empty, mean displacement ~3/2
	3. if free put it there, else try i+1, i+2, …; if search and hit an empty position without found, return research miss
	4. long clusters may collide if the array becomes too full ~sqrt(pi*M/8)
	5. Keep the array always half full (array resizing) we get constant time insert search delete 3-5
	6. Hash table context
	1. Do not spend long time to compute hash function for long array (just look at some characters every 8-9 character skips), can collide at URL
	2. uniform hashing assumption do not always hold in practice
	1. DDOS attack, hash strings to same key
	3. One-way hash function (password), SHA-2, too expensive to be used in symbol table
	4. Seperate chaining vs. linear probing
	1. seperate chaining:easy to implement delete, performance degrade gracefully, clustering less sensitive to poorly designed hash functions
	2. linear probing: less wasted space, better cach performance
	3. two-probe hashing(hash to two positions, put the value at shorter ones, longest become loglogN), double hashing(skip a variable amount in linear probing)
	4. cuckoo hashing: hash key to two positions, insert key in either positions, if occupied, insert to replacement, constant worst case time for searching
	5. Hash table vs BST
	1. unordered, quick, simple
	2. strong performance guarantee, ordered