annoying.md

# Intro

When using policy gradients (PG) with discrete actions, a softmax distribution is typically used to represent the output distribution. This document describes how the softmax+PG combination can be problematic, even with perfect sampling and a linear model.
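To make the setup concrete, here is a minimal sketch (mine, not part of the original files) of a softmax policy over bandit arms trained with the exact expected policy gradient; the reward values and learning rate are made up for illustration. With a tabular policy like this the process converges to the best arm; the cases discussed below are subtler:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical 3-armed bandit: arm 2 has the highest reward.
rewards = np.array([1.0, 2.0, 3.0])
theta = np.zeros(3)  # one logit per arm

lr = 0.1
for _ in range(1000):
    pi = softmax(theta)
    # Exact expected policy gradient of E[r] = sum_a pi(a) * r(a):
    # dE[r]/dtheta_k = pi_k * (r_k - E[r]).
    theta += lr * pi * (rewards - pi @ rewards)

pi = softmax(theta)
print(pi)  # concentrates on the best arm in this easy case
```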

# Suboptimal asymptotes

Suppose we have a multi-armed bandit problem. We can define our agent as a

huge_graph.py
"""
Generate some second-derivatives that make TF cry.
On my machine, this takes several minutes to run and uses
about 2GB of memory.
"""
import time
import tensorflow as tf
main.py
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('MNIST_data', one_hot=True)
print(len(data.train.labels))
# prints 55000, not 60000 like you'd expect: read_data_sets() holds out
# 5,000 of the training images as data.validation.
softmax_deriv.py
import numpy as np
import tensorflow as tf

sess = tf.Session()
in_vec = tf.constant(np.array([1, 2, 3], dtype='float32'))
one_hot = tf.constant(np.array([0, 1, 0], dtype='float32'))
# The gradient of softmax cross-entropy with respect to the logits...
in_grad = tf.gradients(tf.nn.softmax_cross_entropy_with_logits(labels=one_hot, logits=in_vec), in_vec)[0]
print(sess.run(in_grad))
# ...is softmax(logits) - labels, so both prints show the same vector.
print(sess.run(tf.nn.softmax(in_vec) - one_hot))
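The same identity can be double-checked without TensorFlow; this plain-NumPy sketch (mine) compares a finite-difference gradient of the cross-entropy against softmax(x) - one_hot:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, one_hot):
    # -sum(y * log softmax(x)) == logsumexp(x) - x . y,
    # written via log-sum-exp for numerical stability.
    m = logits.max()
    return np.log(np.exp(logits - m).sum()) + m - logits @ one_hot

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.0])

# Central finite-difference gradient of the loss w.r.t. the logits.
eps = 1e-6
num_grad = np.array([
    (cross_entropy(x + eps * np.eye(3)[i], y) -
     cross_entropy(x - eps * np.eye(3)[i], y)) / (2 * eps)
    for i in range(3)
])

print(num_grad)        # numerical gradient
print(softmax(x) - y)  # analytic gradient: softmax(x) - one_hot
```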
deltas.py
import numpy as np
import tensorflow as tf
def main():
    var1 = tf.Variable(np.array([[1, 2, 3], [4, 5, 6]]), dtype=tf.float32)
    var2 = tf.Variable(np.array([[1, 2, 2], [5, 7, 6]]), dtype=tf.float32)
    loss = tf.losses.mean_squared_error(var1, var2)
    joined_vars = tf.concat([tf.reshape(x, [-1]) for x in tf.trainable_variables()], axis=0)
    joined_backup = tf.Variable(np.zeros([int(x) for x in joined_vars.get_shape()]),
contest.py
"""
An implementation of "probability contests".
In the most basic probability contest, you have two random
variables X and Y and want to know P(X > Y), i.e. the
probability that X "wins".
It would be desirable if either X or Y always won the
probability contest, even if X = Y.
Thus, we can define the win probability as:
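One natural way to make the contest always produce a winner (an assumption of mine, not necessarily the convention contest.py uses) is to award ties half a win each, so the two win probabilities sum to 1. A Monte-Carlo sketch:

```python
import numpy as np

def win_probability(x_samples, y_samples):
    """Estimate the probability that X wins a probability contest
    against Y, counting ties as half a win (an assumed convention)."""
    x = np.asarray(x_samples, dtype=float)
    y = np.asarray(y_samples, dtype=float)
    return np.mean((x > y) + 0.5 * (x == y))

rng = np.random.default_rng(0)
x = rng.integers(0, 6, size=100000)  # fair die, faces 0..5
y = rng.integers(0, 6, size=100000)
print(win_probability(x, y))  # ~0.5 by symmetry, even though P(X > Y) < 0.5
```

With identically distributed X and Y the estimate sits near 0.5, which is the "either X or Y always wins" property the docstring asks for.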
unfinished_async_roller.py
class AsyncRoller(Roller):
    """
    Gather rollouts by running a batch of environments in
    parallel on their own copies of a Model.
    Each environment is run for a fixed, pre-determined
    number of timesteps.
    """
    def __init__(self, env_fns, num_timesteps):
code.go
// cosineTracker is a splitTracker for CosineAlgorithm.
type cosineTracker struct {
	mseTracker
}

func (c *cosineTracker) Quality() float64 {
	sums := []smallVec{c.sumTracker.leftSum, c.sumTracker.rightSum}
	sqSums := []float64{c.leftSquares, c.rightSquares}
	counts := []float64{float64(c.leftCount), float64(c.rightCount)}
main.go
// Test a rolling formula for computing the cosine
// distance between a vector and a repeated mean of chunks
// of that vector.
package main

import (
	"fmt"
	"math"
	"math/rand"
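For reference, the quantity under test can be computed directly. This NumPy sketch is my own restatement, assuming "repeated mean of chunks" means replacing each fixed-size chunk of the vector with that chunk's mean:

```python
import numpy as np

def chunk_mean_cosine_distance(v, chunk_size):
    """Cosine distance between v and the vector obtained by
    replacing each chunk of v with that chunk's mean.
    Assumes len(v) is a multiple of chunk_size."""
    v = np.asarray(v, dtype=float)
    chunks = v.reshape(-1, chunk_size)
    repeated = np.repeat(chunks.mean(axis=1), chunk_size)
    cos = v @ repeated / (np.linalg.norm(v) * np.linalg.norm(repeated))
    return 1 - cos

# Chunks that are already constant give distance ~0.
print(chunk_mean_cosine_distance([1, 1, 2, 2], 2))
print(chunk_mean_cosine_distance([1, 2, 3, 4], 2))
```

Any rolling formula for this quantity can be checked against the direct computation above.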
main.go
// Numerically verify that you should average gradients at
// the leaves in a gradient boosted tree.
package main

import (
	"fmt"
	"github.com/unixpickle/anydiff"
	"github.com/unixpickle/anyvec/anyvec64"
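The fact being verified has a one-line core: under squared error, the best constant prediction for the gradients (residuals) that land in a leaf is their mean. A NumPy sketch of mine (separate from the Go program, which uses anydiff) that confirms this by brute force:

```python
import numpy as np

rng = np.random.default_rng(0)
residuals = rng.normal(size=100)  # per-sample gradients landing in one leaf

# Scan candidate constant leaf values and pick the best under squared loss.
candidates = np.linspace(residuals.min(), residuals.max(), 10001)
losses = ((residuals[None, :] - candidates[:, None]) ** 2).sum(axis=1)
best = candidates[losses.argmin()]

print(best, residuals.mean())  # the best constant matches the mean
```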