Dimagog / pg-pong.py
Last active January 7, 2019 17:13 — forked from karpathy/pg-pong.py
Training XOR gate Neural Network with Policy Gradients
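The title describes learning XOR through rewards rather than supervised labels. Below is a minimal sketch of how that framing can work. It assumes each episode is a single XOR query. A 2-H-1 ReLU network outputs P(output = 1), the output bit is sampled, and a hypothetical reward of +1 (correct) or -1 (wrong) scales the log-probability gradient, as in REINFORCE. Several choices are assumptions of this sketch and are not taken from the fork: the helper names (`init_model`, `policy_forward`, `policy_backward`, `train`, `greedy_outputs`), the bias terms, the reward values, the plain gradient-ascent update, and the learning rate of 0.1. Because each episode has only one step, `gamma` plays no role here.

```python
import numpy as np

# Hypothetical sketch: XOR as a one-step policy-gradient (REINFORCE) task.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 1, 1, 0])  # XOR truth table


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_model(H, rng):
    # Scaled random init similar to pg-pong.py; the biases are an assumption
    # (without them the input [0, 0] always gives P(output = 1) = 0.5).
    return {
        "W1": rng.standard_normal((H, 2)) / np.sqrt(2),
        "b1": np.zeros(H),
        "W2": rng.standard_normal(H) / np.sqrt(H),
        "b2": 0.0,
    }


def policy_forward(model, x):
    h = np.maximum(0.0, model["W1"] @ x + model["b1"])  # ReLU hidden layer
    p = sigmoid(model["W2"] @ h + model["b2"])  # probability of outputting 1
    return p, h


def policy_backward(model, x, h, dlogit):
    # Backpropagate a gradient on the output logit to every parameter.
    dh = dlogit * model["W2"]
    dh[h <= 0] = 0.0  # no gradient through inactive ReLU units
    return {"W1": np.outer(dh, x), "b1": dh, "W2": dlogit * h, "b2": dlogit}


def train(H=3, batch_size=20, learning_rate=0.1, n_updates=2000, seed=0):
    rng = np.random.default_rng(seed)
    model = init_model(H, rng)
    for _ in range(n_updates):
        grads = {k: np.zeros_like(v) for k, v in model.items()}
        for _ in range(batch_size):  # one episode = one XOR query
            i = rng.integers(4)
            p, h = policy_forward(model, X[i])
            action = 1 if rng.random() < p else 0  # sample the output bit
            reward = 1.0 if action == Y[i] else -1.0  # hypothetical reward
            # d log pi(action | x) / d logit = action - p for a sigmoid policy
            g = policy_backward(model, X[i], h, (action - p) * reward)
            for k in grads:
                grads[k] += g[k]
        for k in model:  # gradient ascent on expected reward, batch-averaged
            model[k] += learning_rate * grads[k] / batch_size
    return model


def greedy_outputs(model):
    # Deterministic readout: answer 1 whenever P(output = 1) > 0.5.
    return [int(policy_forward(model, x)[0] > 0.5) for x in X]
```

For example, compare `greedy_outputs(train())` against `Y`. Convergence is not guaranteed for every seed, because some of the three ReLU units can become inactive and stop learning.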
""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
import numpy as np
import pickle
# import gym
# hyperparameters
H = 3 # number of hidden layer neurons
batch_size = 20 # number of episodes per parameter update
learning_rate = 1e-4
gamma = 0.99 # discount factor for reward