Hans Bouwmeester HansBouwmeester

## ddpg_gym.py
"""
Implementation of DDPG - Deep Deterministic Policy Gradient
Algorithm and hyperparameter details can be found here: http://arxiv.org/pdf/1509.02971v2.pdf
Variance scaling paper: https://arxiv.org/pdf/1502.01852v1.pdf
Thanks to GitHub users yanpanlau, pemami4911, songrotek and JunhongXu for their DDPG examples

Batch normalisation on the actor accelerates learning but has poor long-term stability. Applying it to the critic breaks
training, particularly on the state branch. Not sure why, but I think this issue is specific to this environment.
"""
import numpy as np
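
The docstring above cites a variance scaling paper for weight initialisation. As a rough illustration only, and not the gist's actual code, the sketch below draws a weight matrix from a zero-mean normal distribution with standard deviation sqrt(2 / fan_in), which is the scaling that paper proposes for ReLU layers. The helper name `variance_scaling_init` and the 400 x 300 layer size are assumptions made for this example.

```python
import numpy as np


def variance_scaling_init(fan_in, fan_out, rng=None):
    """Hypothetical helper: sample a (fan_in, fan_out) weight matrix
    from N(0, 2 / fan_in), i.e. std = sqrt(2 / fan_in)."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))


# Illustrative layer size (an assumption, not taken from the gist).
weights = variance_scaling_init(400, 300, rng=np.random.default_rng(0))
print(weights.shape, float(weights.std()))
```

The empirical standard deviation of the sampled matrix should sit close to sqrt(2 / 400), so the printed value is a quick sanity check on the scaling.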

## Q-Table Learning-Clean.ipynb

Created May 1, 2017 00:28, forked from awjuliani/Q-Table Learning-Clean.ipynb

Q-Table learning in OpenAI grid world.
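
The notebook itself is not displayed here, so the following is only a minimal sketch of the standard tabular Q-learning update that "Q-Table learning" usually refers to. The function name `q_update`, the 3-state, 2-action table, and the learning rate and discount values are all assumptions chosen for illustration; no environment library is used.

```python
def q_update(q_table, state, action, reward, next_state, lr=0.8, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the new value.

    Q[s][a] <- Q[s][a] + lr * (reward + gamma * max_a' Q[s'][a'] - Q[s][a])
    """
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += lr * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical table: 3 states, 2 actions, initialised to zero.
q_table = [[0.0, 0.0] for _ in range(3)]

# A rewarded transition from state 1 to state 2, then an unrewarded
# transition from state 0 to state 1 that picks up the discounted value.
q_update(q_table, state=1, action=1, reward=1.0, next_state=2)
q_update(q_table, state=0, action=1, reward=0.0, next_state=1)
print(q_table)
```

The second update shows how value propagates backwards: state 0 receives no reward of its own, but its entry becomes positive because the state it leads to already has a learned value.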
              
          
        
      
        
  
    
    

          
    