@ikbendewilliam
Created June 14, 2018 09:16
Code for the tutorial on Medium: Reinforcement Learning on Reversing Stones
import tensorflow as tf

# BOARD_SIZE is assumed to be defined earlier in the tutorial, e.g. (8, 8) for a Reversi board.
inputs_units = BOARD_SIZE[0] * BOARD_SIZE[1]   # one input per board square
hidden_units = BOARD_SIZE[0] * BOARD_SIZE[1]
output_units = BOARD_SIZE[0] * BOARD_SIZE[1]   # one logit per possible move
def initialise_tf():
    global input_positions, labels, learning_rate, W1, b1, h1, W2, b2, logits, probabilities, cross_entropy, train_step
    # Placeholders: the flattened board state, the index of the chosen move, and the step size
    input_positions = tf.placeholder(tf.float32, shape=(1, inputs_units))
    labels = tf.placeholder(tf.int64)
    learning_rate = tf.placeholder(tf.float32, shape=[])
    # Generate hidden layer
    W1 = tf.Variable(tf.truncated_normal([inputs_units, hidden_units], stddev=0.1 / inputs_units**0.5))
    b1 = tf.Variable(tf.zeros([1, hidden_units]))
    h1 = tf.tanh(tf.matmul(input_positions, W1) + b1)
    # Second layer -- linear classifier for action logits
    W2 = tf.Variable(tf.truncated_normal([hidden_units, output_units], stddev=0.1 / hidden_units**0.5))
    b2 = tf.Variable(tf.zeros([1, output_units]))
    logits = tf.matmul(h1, W2) + b2
    probabilities = tf.nn.softmax(logits)
    # Loss on the chosen move, trained with plain gradient descent
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels, name='xentropy')
    train_step = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)
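
A minimal usage sketch, not part of the original gist: since this is the TensorFlow 1.x graph API, you would call initialise_tf(), open a session, initialise the variables, then feed a flattened board to read out move probabilities. The empty-board encoding and the 0.01 learning rate below are illustrative assumptions.

import numpy as np

initialise_tf()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    board = np.zeros((1, inputs_units), dtype=np.float32)  # illustrative: an empty board
    probs = sess.run(probabilities, feed_dict={input_positions: board})
    move = int(np.argmax(probs[0]))  # pick the highest-probability square
    print('suggested move: row %d, col %d' % (move // BOARD_SIZE[1], move % BOARD_SIZE[1]))
    # One update step reinforcing the chosen move (learning rate is an assumption)
    sess.run(train_step, feed_dict={input_positions: board, labels: [move], learning_rate: 0.01})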