
Machine Learning: CS-Saturday

FSA 1702 – April 1, 2017


Competitive Search and Agents for Games – Minimax

Let's make agents to play deterministic zero-sum games.

Monte Carlo tree search is a non-deterministic extension of this approach.

Deterministic games offer 'no excuses' – no chance elements (dice rolls, cards drawn, etc.). Only the players' moves determine the outcome of the game.

Game Trees and Zero-Sum

Game Tree

A game tree is made of vertices connected by edges. Each vertex is a board state – the root node does not need to be the initial state of the game.

'Edges' connect board states.

Players alternate in choosing which branch to go down. In a deterministic game each move (edge) leads to exactly one successor state; in a non-deterministic game a single move can branch into several possible outcomes. These structures are 'trees'.

The tree reaches all the way down until the state can no longer be changed (win/lose/draw state).

Zero-Sum Game

  • One player's gain is always exactly balanced by the other's loss. Examples: Chess, Checkers, Connect 4

You can track how a game is going with a single number. We can model one player as trying to maximize that number and the other as trying to minimize it.

Game victories for red, for example, have a value of 100. Game victories for blue have a value of -100. All draws have a value of 0. Red wants to maximize; blue wants to minimize.

Non-Useful Algorithm

  • Function receives a game node
  • Evaluates how good each resulting state is
  • Returns the move that leads to the state evaluated as best (see the sketch after this list)
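
A rough sketch of that idea in Python (the node interface – node.moves(), node.apply(move) – and the evaluate function are placeholders for illustration; nothing here is defined in the notes yet):

```python
def best_move(node, evaluate):
    """Return the move whose resulting state evaluates as best for the mover."""
    # evaluate(state) is the still-unspecified 'how good is this state?' function
    return max(node.moves(), key=lambda move: evaluate(node.apply(move)))
```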

This transforms our problem: we turn the question 'how do we make a move?' into 'how do we determine the best possible move?'

Minimax is how we are going to evaluate our moves – it's our evaluation function, a way of evaluating states. Minimax is recursive: it returns a value for a board state – positive if the state is better for the maximizing player, negative if it is better for the minimizing player.

Terminal nodes are our final states (win, loss, or draw). The value of any other node is the value of its best child node – the most valuable child when it's the maximizing player's turn, the least valuable when it's the minimizing player's turn.

This can be applied to trees of arbitrary depth.

The minimizer's move is aimed at the 'least valuable' child states. The maximizer's move is aimed at the 'most valuable' child states. We bubble up the values to determine our 'upstream' node values.

Minimax, the algorithm (a sketch follows this list):

  • takes an input node (state), returns the value of that state
  • if the game is over, return 100, -100, or 0
  • if the current mover is maximizing, the value of the state is the value of the most valuable child state (if minimizing, the least valuable)
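
A minimal recursive sketch in Python of the minimax just described (the node interface – is_terminal(), terminal_value(), children() – is an assumption for illustration):

```python
def minimax(node, maximizing):
    """Return the minimax value of a board state.

    Assumed interface:
      node.is_terminal()    -> True at win/lose/draw states
      node.terminal_value() -> 100 (red wins), -100 (blue wins), or 0 (draw)
      node.children()       -> successor states reachable in one move
    """
    if node.is_terminal():
        return node.terminal_value()
    child_values = [minimax(child, not maximizing) for child in node.children()]
    # the maximizer takes the most valuable child, the minimizer the least valuable
    return max(child_values) if maximizing else min(child_values)
```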

Branching Factor

The tree contains roughly b^d states, where b is the breadth (branching factor) and d is the depth.
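
For a sense of scale: chess is commonly cited as having b ≈ 35, so looking just 4 moves ahead already means about 35^4 ≈ 1.5 million states; with Go's commonly cited b ≈ 250, the same 4-ply search is roughly 250^4 ≈ 3.9 billion states.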

Smaller branching factor:

  • Connect 4/Chess

Larger branching factor:

  • Go

We have to decrease either b or d in this equation. To decrease the depth (how far we search), we use heuristic evaluation.

If we want to decrease the breadth of the game tree, we use alpha-beta pruning.

Heuristic Evaluation

A heuristic function has the same input and output as our minimax evaluation function: a board state in, a value out.

Estimates how valuable a state is WITHOUT looking ahead – a basic rule of thumb. For a game like Connect 4, for example (a sketch follows this list):

  • count how many 2s or 3s in a row each player has, while weighing 3s more heavily
  • look at how many pieces are toward the middle
  • also factor in what the opponent has
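
A rough sketch of such a heuristic for a Connect 4-style board (the board representation – a 6x7 grid of 'R', 'B', or None – and the specific weights are assumptions for illustration, not from the session):

```python
ROWS, COLS = 6, 7

def windows(board):
    """Yield every horizontal, vertical, and diagonal run of 4 cells."""
    for r in range(ROWS):
        for c in range(COLS):
            if c + 3 < COLS:                      # horizontal
                yield [board[r][c + i] for i in range(4)]
            if r + 3 < ROWS:                      # vertical
                yield [board[r + i][c] for i in range(4)]
            if r + 3 < ROWS and c + 3 < COLS:     # diagonal, down-right
                yield [board[r + i][c + i] for i in range(4)]
            if r + 3 < ROWS and c - 3 >= 0:       # diagonal, down-left
                yield [board[r + i][c - i] for i in range(4)]

def heuristic(board):
    """Positive favors 'R' (the maximizer), negative favors 'B' (the minimizer)."""
    score = 0
    for w in windows(board):
        for player, sign in (('R', 1), ('B', -1)):
            mine, empty = w.count(player), w.count(None)
            if mine == 3 and empty == 1:
                score += sign * 5                 # weigh 3s in a row more heavily
            elif mine == 2 and empty == 2:
                score += sign * 2                 # 2s in a row
    center = [board[r][COLS // 2] for r in range(ROWS)]
    score += 3 * (center.count('R') - center.count('B'))  # reward pieces toward the middle
    return score
```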

Heuristics:

  • involve domain knowledge
  • sometimes handcrafted (Deep Blue)
  • sometimes generated by machine learning (AlphaGo)

Alpha-Beta Pruning

Decreasing the breadth of our game tree.

Let's ignore parts of the game tree that are irrelevant. Basically: don't bother evaluating a possibility when it is already worse than something you know you could get elsewhere.

Based on intermediate (non-terminal) node values, we can keep track of bounds on what each player will accept as we search. Because my rival will always choose a max or min value, I can anticipate their move within those bounds. A maximizing player, for example, will never choose a value BELOW the best value already available to them – so we don't need to evaluate subtrees that we already know have lower values, because the maximizing player is never going to choose them.

  • Keep track of the highest score the maximizing player can guarantee (alpha)
  • Keep track of the lowest score the minimizing player can guarantee (beta)
  • if you are the maximizing player and alpha rises above beta as you search through successor states, nothing more needs to be examined from this node
  • if you are the minimizing player and beta drops beneath alpha as you search through successor states, nothing more needs to be examined from this node

Basically: discount nodes that you already know are worse for the relevant player if you have another node with a better value!
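
A sketch of alpha-beta layered on the earlier minimax (same assumed node interface; the heuristic parameter is a stand-in state evaluation – e.g. something like the Connect 4 sketch above – used when the depth limit is reached):

```python
import math

def alphabeta(node, depth, heuristic, alpha=-math.inf, beta=math.inf, maximizing=True):
    if node.is_terminal():
        return node.terminal_value()
    if depth == 0:
        return heuristic(node)                # estimate the state without looking further ahead
    if maximizing:
        value = -math.inf
        for child in node.children():
            value = max(value, alphabeta(child, depth - 1, heuristic, alpha, beta, False))
            alpha = max(alpha, value)         # highest score the maximizer can guarantee so far
            if alpha >= beta:                 # the minimizer upstream already has a better option
                break                         # prune: skip the remaining children
        return value
    else:
        value = math.inf
        for child in node.children():
            value = min(value, alphabeta(child, depth - 1, heuristic, alpha, beta, True))
            beta = min(beta, value)           # lowest score the minimizer can guarantee so far
            if alpha >= beta:
                break                         # prune
        return value
```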

Machine Learning

Traditional programming: Program --> Function (+ infinite bugs)

Machine learning: Program --> Learn From Data or Experience --> Use Program (+ infinite bugs, + infinite tinkering with ways to learn and experimentation)

Classifications

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

Supervised Learning

Learning from data where you are given examples of an input and a correct output.

You then learn to map from a previously unseen input to the correct output.

  • Most commercial applications use supervised learning
  • Most highly developed right now

Examples:

  • Image classification problems (ImageNet)
  • Move prediction problems (AlphaGo)
  • Stock Market Regression (everybody, everywhere in NYC)

Algorithms

  • K-Nearest Neighbors
  • Feedforward Neural Networks
  • Recurrent Neural Networks
  • Support Vector Machines
  • Decision Trees

Unsupervised Learning

Learning from data where you are not given the correct outputs (no labels) – only the inputs

Examples:

  • Dimensionality Reduction (Eigenfaces)
  • Cluster Analysis (classifying shopping behavior)
  • Building a Probabilistic Model (Voynich Manuscript)

Algos:

  • K-Means Clustering
  • Neural Network Autoencoders
  • Principal Component Analysis

Reinforcement Learning

Learning how to take actions in an environment to maximize a reward

Like minimax, but learns from experience.

Examples:

  • Playing Atari (Google DeepMind)
  • Playing Go (AlphaGo)
  • Learning-to-Learn Models (hybrid supervised/reinforcement learning modules)

Algorithms: often adapted from supervised/unsupervised learning

Hofstadter said if we could recognize characters reliably, we could have intelligence more or less figured out!

K-Nearest Neighbors

We can talk about the size of an ML input as its dimensionality. In the case of our character dataset, it's 784 (a greyscale 28x28 image).

Nearest Neighbor: given a new point, find the labeled example closest to it and use that example's classification. This creates a decision boundary – regions where a point can 'land' and be classified according to its nearest neighbor.

K-Nearest Neighbors: instead of looking only at the single nearest neighbor, we can look at several (k) nearest neighbors and let them vote to determine the classification.

Distance in N Dimensions

Distance makes intuitive sense in a 2D space.

But what if we are thinking of arrays of much longer length?

Distance between two points: sqrt((x1 - x2)^2 + (y1 - y2)^2)

Distance between points in arbitrary Cartesian space: the same idea, but summed over ALL dimensions: sqrt((x1 - x2)^2 + (y1 - y2)^2 + ... + (z1 - z2)^2)
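
A minimal sketch of this distance function and the k-nearest-neighbors classification described above (the (point, label) data layout and the tiny example points are assumptions for illustration):

```python
import math
from collections import Counter

def distance(p, q):
    """Euclidean distance between two points of any dimension
    (e.g. 784-dimensional for a flattened 28x28 greyscale image)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(training_data, new_point, k=5):
    """training_data: list of (point, label) pairs.
    Returns the majority label among the k nearest neighbors of new_point."""
    neighbors = sorted(training_data, key=lambda pair: distance(pair[0], new_point))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# tiny made-up 2D example
data = [((0, 0), 'blue'), ((1, 1), 'blue'), ((8, 9), 'red'), ((9, 8), 'red')]
print(knn_classify(data, (7, 7), k=3))  # -> 'red'
```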

Problems

  • Ignores global structure (looks at local features but does not necessarily understand features in an abstract way)
  • Training (just storing the data) is fast, but classification is expensive (comparing against every point in the data set). ML works best with lots of data, so we end up making a huge number of comparisons...
  • Curse of dimensionality
