# Computability

An algorithm is a procedure that you can write as a C function or program, in any other programming language, or, more formally, as some variant of a Turing machine. An algorithm states explicitly how the data will be manipulated.

# Algorithm Efficiency

Some algorithms are more efficient than others. We would prefer to choose an efficient algorithm, so it would be nice to have metrics for comparing algorithm efficiency. The complexity of an algorithm is a function describing the efficiency of the algorithm in terms of the amount of data the algorithm must process. Usually there are natural units for the domain and range of this function. There are two main complexity measures of the efficiency of an algorithm:

  • Time complexity is a function describing the amount of time an algorithm takes in terms of the amount of input to the algorithm. "Time" can mean the number of memory accesses performed, the number of comparisons between integers, the number of times some inner loop is executed, or some other natural unit related to the amount of real time the algorithm will take. We try to keep this idea of time separate from "wall clock" time, since many factors unrelated to the algorithm itself can affect the real time. It turns out that, if we choose the units wisely, all of the other stuff doesn't matter and we can get an independent measure of the efficiency of the algorithm.
  • Space complexity is a function describing the amount of memory (space) an algorithm takes in terms of the amount of input to the algorithm. We often speak of "extra" memory needed, not counting the memory needed to store the input itself. Again, we use natural (but fixed-length) units to measure this. We can use bytes, but it's easier to use, say, the number of integers used, the number of fixed-sized structures, etc. Space complexity is sometimes ignored because the space used is minimal and/or obvious, but sometimes it becomes as important an issue as time.

For example, we might say "this algorithm takes n^2 time," where n is the number of items in the input. Or we might say "this algorithm takes constant extra space," because the amount of extra memory needed doesn't vary with the number of items processed. For both time and space, we are interested in the asymptotic complexity of the algorithm: when n (the number of items of input) goes to infinity, what happens to the performance of the algorithm? The sketch below illustrates both measures on a simple example.
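As a minimal sketch of both measures (the function and the counting convention are our own illustration, not from the original notes): summing an array touches each element once, so its time is O(n) in the natural unit of "elements processed," while its extra space is a single accumulator, O(1), no matter how large n is.

```c
#include <stddef.h>

/* Sums an array of n integers.
 * Time: one load and one addition per element => O(n) steps.
 * Extra space: a single accumulator, independent of n => O(1). */
long sum_array(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += a[i];      /* the "natural unit": one element processed */
    return total;
}
```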

## Asymptotic Notation

### Big O

The most common notation used is "big O" notation. If, say, some algorithm takes n^2 + 3n - 4 steps, then we would say n^2 + 3n - 4 = O(n^2) (read "big oh of n squared"). This means, intuitively, that the important part of n^2 + 3n - 4 is the n^2 part.

Definition: Let f(n) and g(n) be functions, where n is a positive integer. We write f(n) = O(g(n)) if and only if there exist a positive real number c and a positive integer n_0 satisfying 0 <= f(n) <= cg(n) for all n >= n_0. (And we say, "f of n is big oh of g of n." We might also say or write f(n) is in O(g(n)), because we can think of O as a set of functions all with the same property. But we won't often do that in Data Structures.) This means, for example, that functions like n^2 + n, 4n^2 - n log n + 12, n^2/5 - 100n, n log n, 50n, and so forth are all O(n^2). Every function f(n) bounded above by some constant multiple of g(n) for all values of n greater than a certain value is O(g(n)).
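As a quick check of the definition against the earlier example, here is one valid choice of constants (any larger c and n_0 would also work):

```latex
\[
\begin{aligned}
  &\text{For all } n \ge 1:\quad 3n \le 3n^2 \ \text{and}\ -4 \le 0,\\
  &\text{hence}\quad 0 \le n^2 + 3n - 4 \le n^2 + 3n^2 = 4n^2.\\
  &\text{So } f(n) = n^2 + 3n - 4 = O(n^2) \text{ with } c = 4 \text{ and } n_0 = 1.
\end{aligned}
\]
```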

Now let's review the most commonly used complexity classes. An algorithm is said to be constant time (also written as O(1) time) if its running time T(n) is bounded by a value that does not depend on the size of the input. For example, accessing any single element in an array takes constant time, as only one operation has to be performed to locate it. However, finding the minimal value in an unordered array is not a constant-time operation, as a scan over each element in the array is needed in order to determine the minimal value. Hence it is a linear time operation, taking O(n) time. (If the number of elements is known in advance and does not change, however, such an algorithm can still be said to run in constant time.)
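Here is a sketch of the two operations just described (the signatures are our own; the scan assumes n >= 1):

```c
#include <stddef.h>

/* O(1): one address computation and one load, regardless of n. */
int get_element(const int *a, size_t i) {
    return a[i];
}

/* O(n): every element must be examined at least once, since any
 * unvisited element could still be the minimum. Assumes n >= 1. */
int min_element(const int *a, size_t n) {
    int min = a[0];
    for (size_t i = 1; i < n; ++i)
        if (a[i] < min)
            min = a[i];     /* one comparison per element */
    return min;
}
```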

An algorithm is said to take logarithmic time if T(n) = O(log n). Due to the use of the binary numeral system by computers, the logarithm is frequently base 2 (that is, log_2 n, sometimes written lg n). However, by the change-of-base rule for logarithms, log_a n and log_b n differ only by a constant multiplier, which in big-O notation is discarded; thus O(log n) is the standard notation for logarithmic time algorithms regardless of the base of the logarithm. Algorithms taking logarithmic time are commonly found in operations on binary trees or when using binary search. An O(log n) algorithm is considered highly efficient, as the ratio of the number of operations to the size of the input decreases and tends to zero as n grows.
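Binary search is the canonical O(log n) example; the sketch below (the signature and half-open-range convention are our choices) works on a sorted array. Each iteration discards half of the remaining range, so a range of n elements is exhausted after roughly log_2 n comparisons.

```c
#include <stddef.h>

/* Binary search in a sorted array; returns the index of key, or -1.
 * Each iteration halves [lo, hi), so at most ~log2(n) + 1 iterations
 * run: T(n) = O(log n). */
ptrdiff_t binary_search(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n;                 /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        if (a[mid] < key)
            lo = mid + 1;                  /* key, if present, is in (mid, hi) */
        else if (a[mid] > key)
            hi = mid;                      /* key, if present, is in [lo, mid) */
        else
            return (ptrdiff_t)mid;
    }
    return -1;                             /* not found */
}
```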

An algorithm is said to take linear time, or O(n) time, if T(n) = O(n). Informally, this means that for large enough input sizes the running time increases at most linearly with the size of the input.

An algorithm is said to be of polynomial time if its running time is upper bounded by a polynomial expression in the size of the input for the algorithm, i.e., T(n) = O(n^k) for some constant k. Problems for which a deterministic polynomial time algorithm exists belong to the complexity class P, which is central in the field of computational complexity theory.
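For instance (the task here is our own illustration), a brute-force check for a repeated value runs in O(n^2) time, i.e., polynomial time with k = 2:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if any value occurs twice. The nested loops perform
 * n(n-1)/2 comparisons in the worst case, so T(n) = O(n^2):
 * polynomial time with k = 2. */
bool has_duplicate(const int *a, size_t n) {
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (a[i] == a[j])
                return true;
    return false;
}
```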

An algorithm is said to be exponential time if T(n) is upper bounded by 2^poly(n), where poly(n) is some polynomial in n. More formally, an algorithm is exponential time if T(n) is bounded by O(2^(n^k)) for some constant k. Problems that admit exponential time algorithms on a deterministic Turing machine form the complexity class known as EXP. Sometimes, "exponential time" is used to refer to algorithms that have T(n) = 2^O(n), where the exponent is at most a linear function of n. This gives rise to the complexity class E.
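A classic illustration (our own sketch, not from the original notes) is brute-force subset sum: trying both "take" and "skip" for each of the n elements explores 2^n possibilities, so T(n) = O(2^n), placing the running time in 2^O(n):

```c
#include <stdbool.h>
#include <stddef.h>

/* Decides whether some subset of a[0..n-1] sums to target by trying
 * both choices ("skip" / "take") for each element. The recursion tree
 * has 2^n leaves, so T(n) = O(2^n): exponential time. */
bool subset_sum(const int *a, size_t n, int target) {
    if (target == 0) return true;    /* the empty subset reaches 0 */
    if (n == 0)      return false;   /* no elements left to choose */
    return subset_sum(a, n - 1, target)              /* skip a[n-1] */
        || subset_sum(a, n - 1, target - a[n - 1]);  /* take a[n-1] */
}
```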

Problems that can be solved in theory (i.e., given large but finite time), but which in practice take too long for their solutions to be useful, are known as intractable problems. In complexity theory, problems that lack polynomial-time solutions are considered to be intractable for more than the smallest inputs. In fact, the Cobham–Edmonds thesis states that only those problems that can be solved in polynomial time can be feasibly computed on some computational device. Problems that are known to be intractable in this sense include those that are EXPTIME-hard. If NP is not the same as P, then the NP-complete problems are also intractable in this sense.

To see why exponential-time algorithms might be unusable in practice, consider a program that makes 2^n operations before halting. For small n, say 100, and assuming for the sake of example that the computer does 10^12 operations each second, the program would run for about 4 × 10^10 years, which is the same order of magnitude as the age of the universe. Even with a much faster computer, the program would only be useful for very small instances, and in that sense the intractability of a problem is somewhat independent of technological progress. Nevertheless, a polynomial time algorithm is not always practical: if its running time is, say, n^15, it is unreasonable to consider it efficient, and it is still useless except on small instances.
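The arithmetic behind that estimate, rounded as in the text:

```latex
\[
\begin{aligned}
  2^{100} &\approx 1.27 \times 10^{30} \text{ operations},\\
  \frac{1.27 \times 10^{30} \text{ ops}}{10^{12} \text{ ops/s}}
          &\approx 1.27 \times 10^{18} \text{ s}
           \approx \frac{1.27 \times 10^{18} \text{ s}}{3.15 \times 10^{7} \text{ s/yr}}
           \approx 4 \times 10^{10} \text{ years}.
\end{aligned}
\]
```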

What intractability means in practice is open to debate. Saying that a problem is not in P does not imply that all large cases of the problem are hard or even that most of them are. For example, the decision problem in Presburger arithmetic has been shown not to be in P, yet algorithms have been written that solve the problem in reasonable times in most cases. Similarly, algorithms can solve the NP-complete knapsack problem over a wide range of sizes in less than quadratic time and SAT solvers routinely handle large instances of the NP-complete Boolean satisfiability problem.
