alexkuhl/cs_cheat_sheet.txt

## cs_cheat_sheet.txt
This "cheat sheet" is designed to be a super-high level view of common and/or important computer science concepts. Good for test reviews, preparing for job interviews, brushing up on the basics, becoming a better person, or light reading for putting your significant other to sleep. I will continue to update this so if you have any suggestions or corrections please send a pull request or contact me. Licensed under http://creativecommons.org/licenses/by-sa/3.0/.

Algorithms
 + Limiting behavior (all the second sections are as limit(n)->infinity)
   * Big O - f is bounded by g :: f(n) <= g(n)*c (for some constant c) :: an upper bound, can be much higher than actual performance because this is not a tight bound, for example a log(n) algorithm technically is O(n!) but is obviously way better than something that actually gets n! performance
   * Small or Little o - f dominated by g :: f(n) < g(n)*c; f(n)/g(n) = 0 :: g gets much larger
   * Big Omega - f is bounded below by g :: |f(n)| >= g(n)*c :: a lower bound, but similary not "tight" like Big O
   * Small or Little Omega - f dominates g :: f(n) > g(n)*c
   * Big Theta - f is bounded above and below by g :: g(n)*c1 <= f(n) <= g(n)*c2
   * ~ (order of) - f is equal to g :: |f(n) - g(n)*c| < epsilon

 + Algorithm Choice
   * step-by-step instructions -> straight coding
   * Breadth-first search (BFS) -> ask for # of steps or a shortest path with traits like only certain ways to change state, uniform cost, given an environment with valid and invalid states
   * Flood Fill -> similar to BFS but no path asked for, ex. find all reachable areas of a maze from some starting location
   * Brute Force or Backtracking -> find every possible configuration (subset) of items with respect to given rules, possibly return best configuration

Data Structures
 + Heap - tree where children are <= (max heap) or >= (min heap) parent
 + Binary Heap - a binary tree with the added constraints of completeness (all levels except bottom are completely full, filled left to right) and children satisfy the constraints of a heap
 + Binary Tree - each node can only have two children, doesn't have to be sorted, complete, or balanced in any way
 + Binary Search Tree - a binary tree where all nodes have a distinct value and the left child is always smaller than the parent and the right only values greater. A result of this is all subtrees are also BSTs. Some BSTs allow duplicate values, others do not. Again like a BT these are not necessarily balanced or complete.

Databases
 + Normal Forms
   * 1st normal
     - No top-to-bottom ordering to the rows.
     - No left-to-right ordering to the columns.
     - No duplicate rows.
     - Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
     - All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
     - All column entries must be same length and one value (so tel # can't have two numbers in it)
     - Data must be from applicable domain (tel #'s are 12 characters long for example)
     - Can't have multiple groups for a domain (can't have multiple tel # columns)
     - No nulls
   * 2nd normal - 1NF + rows must be dependent on whole candidate (re: representative) key
   * 3rd normal - Every non-key attribute "must provide a fact about the key, the whole key, and nothing but the key" -> the information must be relevant directly to the key, not transitively through another column
 + Joins
   * inner: Only rows that match the WHERE
   * left:  Include all rows from left table, when there is not a match with the right table its corresponding columns are NULL
   * right: Opposite of left
   * full:  All rows returned, those columns where both tables do not match have NULL for the non-matching table's columns
 + Keys
   * primary: Unique identifier in a table
   * foreign: Primary key from another table

Debugging
 + Trace - console-style output statements
 + Debugger - use a tool to follow program flow line-by-line and examine memory contents

Testing
 + black box - no internal implementation knowledge
 + white box - opposite, have implementation available or known
 + gray box - have the knowledge, but test as if at black box level
 + unit testing - test at modules level; classes in OOP for example
 + performance testing
 + user studies
 + regression - run previously passed tests to make sure new changes have not altered unexpected parts of program
 + known output comparison

Object Oriented Programming (OOP)
 + abstraction - model classes appropriate to how they are used; hiding of unnecessary details; work at the appropriate inheritance level, e.g. sometimes appropriate to use object as an Vehicle, at others a Car
 + composition - related to abstraction, where one object type contains other object data types whose implementations do not necessarily have to be known, only how to interface with them, e.g. a Car would contain Engine, Transmission, etc.
 + polymorphism - methods or operators performing differently based on object type, e.g. Vehicle's start( ) will cause a Car to turnIgnitionKey( ) whereas a GoKart would pullString( ) (assuming a lawn mower-like engine). The classes that inherit from Vehicle override the methods of the parent they inherit from. Another example are arithmetic operators, which we expect to work for ints, doubles, floats, etc.
 + encapsulation - data and method hiding from the public interface; no one sees the implementation they only know how to interface with the class

Miscellaneous
 + Difference between a process and a thread
   * Process - a "program" that has a state (scheduling, data, stack, etc.) within the OS that isn't shared, processes can only communicate through interprocess communication facilities
   * Thread - threads are typically spawned by a process and multiple threads can be attached to one process, they share the process's data/address space and can interact with other threads from the same process. This sharing of "global" information makes it easier to threads to work on the same data than processes. Also, creating a thread is less expensive than creating a new process (through fork or whatever).
	This "cheat sheet" is designed to be a super-high level view of common and/or important computer science concepts. Good for test reviews, preparing for job interviews, brushing up on the basics, becoming a better person, or light reading for putting your significant other to sleep. I will continue to update this so if you have any suggestions or corrections please send a pull request or contact me. Licensed under http://creativecommons.org/licenses/by-sa/3.0/.

	Algorithms
	+ Limiting behavior (all the second sections are as limit(n)->infinity)
	* Big O - f is bounded by g :: f(n) <= g(n)*c (for some constant c) :: an upper bound, can be much higher than actual performance because this is not a tight bound, for example a log(n) algorithm technically is O(n!) but is obviously way better than something that actually gets n! performance
	* Small or Little o - f dominated by g :: f(n) < g(n)*c; f(n)/g(n) = 0 :: g gets much larger
	* Big Omega - f is bounded below by g :: \|f(n)\| >= g(n)*c :: a lower bound, but similary not "tight" like Big O
	* Small or Little Omega - f dominates g :: f(n) > g(n)*c
	* Big Theta - f is bounded above and below by g :: g(n)c1 <= f(n) <= g(n)c2
	* ~ (order of) - f is equal to g :: \|f(n) - g(n)*c\| < epsilon

	+ Algorithm Choice
	* step-by-step instructions -> straight coding
	* Breadth-first search (BFS) -> ask for # of steps or a shortest path with traits like only certain ways to change state, uniform cost, given an environment with valid and invalid states
	* Flood Fill -> similar to BFS but no path asked for, ex. find all reachable areas of a maze from some starting location
	* Brute Force or Backtracking -> find every possible configuration (subset) of items with respect to given rules, possibly return best configuration

	Data Structures
	+ Heap - tree where children are <= (max heap) or >= (min heap) parent
	+ Binary Heap - a binary tree with the added constraints of completeness (all levels except bottom are completely full, filled left to right) and children satisfy the constraints of a heap
	+ Binary Tree - each node can only have two children, doesn't have to be sorted, complete, or balanced in any way
	+ Binary Search Tree - a binary tree where all nodes have a distinct value and the left child is always smaller than the parent and the right only values greater. A result of this is all subtrees are also BSTs. Some BSTs allow duplicate values, others do not. Again like a BT these are not necessarily balanced or complete.

	Databases
	+ Normal Forms
	* 1st normal
	- No top-to-bottom ordering to the rows.
	- No left-to-right ordering to the columns.
	- No duplicate rows.
	- Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
	- All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
	- All column entries must be same length and one value (so tel # can't have two numbers in it)
	- Data must be from applicable domain (tel #'s are 12 characters long for example)
	- Can't have multiple groups for a domain (can't have multiple tel # columns)
	- No nulls
	* 2nd normal - 1NF + rows must be dependent on whole candidate (re: representative) key
	* 3rd normal - Every non-key attribute "must provide a fact about the key, the whole key, and nothing but the key" -> the information must be relevant directly to the key, not transitively through another column
	+ Joins
	* inner: Only rows that match the WHERE
	* left: Include all rows from left table, when there is not a match with the right table its corresponding columns are NULL
	* right: Opposite of left
	* full: All rows returned, those columns where both tables do not match have NULL for the non-matching table's columns
	+ Keys
	* primary: Unique identifier in a table
	* foreign: Primary key from another table

	Debugging
	+ Trace - console-style output statements
	+ Debugger - use a tool to follow program flow line-by-line and examine memory contents

	Testing
	+ black box - no internal implementation knowledge
	+ white box - opposite, have implementation available or known
	+ gray box - have the knowledge, but test as if at black box level
	+ unit testing - test at modules level; classes in OOP for example
	+ performance testing
	+ user studies
	+ regression - run previously passed tests to make sure new changes have not altered unexpected parts of program
	+ known output comparison

	Object Oriented Programming (OOP)
	+ abstraction - model classes appropriate to how they are used; hiding of unnecessary details; work at the appropriate inheritance level, e.g. sometimes appropriate to use object as an Vehicle, at others a Car
	+ composition - related to abstraction, where one object type contains other object data types whose implementations do not necessarily have to be known, only how to interface with them, e.g. a Car would contain Engine, Transmission, etc.
	+ polymorphism - methods or operators performing differently based on object type, e.g. Vehicle's start( ) will cause a Car to turnIgnitionKey( ) whereas a GoKart would pullString( ) (assuming a lawn mower-like engine). The classes that inherit from Vehicle override the methods of the parent they inherit from. Another example are arithmetic operators, which we expect to work for ints, doubles, floats, etc.
	+ encapsulation - data and method hiding from the public interface; no one sees the implementation they only know how to interface with the class

	Miscellaneous
	+ Difference between a process and a thread
	* Process - a "program" that has a state (scheduling, data, stack, etc.) within the OS that isn't shared, processes can only communicate through interprocess communication facilities
	* Thread - threads are typically spawned by a process and multiple threads can be attached to one process, they share the process's data/address space and can interact with other threads from the same process. This sharing of "global" information makes it easier to threads to work on the same data than processes. Also, creating a thread is less expensive than creating a new process (through fork or whatever).