@Nov05
Nov05 / 20240322_reinforcement learning_neural network soft update.md
Last active March 22, 2024 12:22
20240322_reinforcement learning_neural network soft update

"deeprl/agent/DDPG_agent.py"

  • trg = trg*(1-τ) + src*τ
  • τ is stored in self.config.target_network_mix
    def soft_update(self, target, source):
        ## trg = trg*(1-τ) + src*τ
        ## τ is stored in self.config.target_network_mix
        for target_param, source_param in zip(target.parameters(), source.parameters()):
            target_param.detach_()
            target_param.copy_(target_param * (1.0 - self.config.target_network_mix)
                               + source_param * self.config.target_network_mix)
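For context, here is a self-contained sketch of the same rule outside the agent class (function and variable names are illustrative, not from the repo); τ is typically a small value such as 1e-3, so the target network trails the online network slowly:

    import torch
    import torch.nn as nn

    def soft_update(target: nn.Module, source: nn.Module, tau: float = 1e-3):
        ## same rule as above: trg = trg*(1-τ) + src*τ
        with torch.no_grad():
            for t, s in zip(target.parameters(), source.parameters()):
                t.mul_(1.0 - tau).add_(tau * s)

    ## usage sketch: the target network starts as a hard copy of the
    ## online network and is nudged toward it after each learning step
    online = nn.Linear(4, 2)
    target = nn.Linear(4, 2)
    target.load_state_dict(online.state_dict())
    soft_update(target, online, tau=1e-3)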

👉 Udacity Deep Reinforcement Learning Python Environment Setup

⚠️ Python 3.11 has to be downgraded to Python 3.10, or `multiprocessing` will raise `TypeError: code() argument 13 must be str, not int` on both Windows and Linux. Google Colab currently uses Python 3.10 as well.
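As a sketch, a minimal guard one could place at the top of a training script to fail fast on an incompatible interpreter (this check is an addition here, not part of the original setup):

    import sys

    ## multiprocessing on Python 3.11 raises the TypeError quoted above,
    ## so refuse to run on anything newer than 3.10
    if sys.version_info >= (3, 11):
        raise RuntimeError("Python %d.%d detected; use Python 3.10 for this project."
                           % sys.version_info[:2])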


(drlnd_p2) PS D:\github\udacity-deep-reinforcement-learning\python\mujoco-py> python examples\body_interaction.py

You appear to be missing MuJoCo.  We expected to find the file here: C:\Users\*\.mujoco\mujoco210

This package only provides python bindings, the library must be installed separately.

Please follow the instructions on the README to install MuJoCo

⚠️ issue: `from gym.wrappers import Monitor` caused `ImportError: cannot import name 'Monitor' from 'gym.wrappers'`.

  • solution (as of 2022):
    import gym
    from gym.wrappers.record_video import RecordVideo
    env = gym.make('CartPole-v1', render_mode="rgb_array")
    env = RecordVideo(env, './video', episode_trigger=lambda episode_number: True)
    env.reset()
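Note that RecordVideo only writes the video once the episode finishes and the environment is closed; a minimal continuation of the snippet above (assuming the gym>=0.26 five-value step API) could look like:

    ## step with random actions until the episode ends, then close the
    ## env so the wrapper flushes the video file to ./video
    done = False
    while not done:
        _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        done = terminated or truncated
    env.close()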

20240218_pong-PPO.ipynb
👉 training log for reference
1000 episodes on T4 GPU, Wall time: 1h 38min 14s

Episode: 20, score: -15.750000
[-16. -16. -16. -16. -16. -16. -16. -14.]
Episode: 40, score: -12.625000
@Nov05
Nov05 / 20240218_reinforcement learning_pong training log 1200e.md
Created February 19, 2024 06:00
20240218_reinforcement learning_pong training log 1200e

20240217_pong_REINFORCE.ipynb
👉 training log for reference
1200 episodes on T4 GPU, Wall time: 2h 12min 12s

Episode: 20, score: -14.500000
[-14. -15. -16. -13. -14. -16. -16. -12.]
Episode: 40, score: -14.500000
@Nov05
Nov05 / 20240218_reinforcement learning_pong training log for reference.md
Last active February 19, 2024 04:05
20240218_reinforcement learning_pong training log for reference

20240217_pong_REINFORCE.ipynb
👉 training log for reference
800 episodes on T4 GPU, Wall time: 1h 17min 44s

Episode: 20, score: -14.000000
[-15. -17. -15. -14. -13. -13. -16.  -9.]
@Nov05
Nov05 / 20240218_python_PyWhatKit_issue_313.md
Created February 19, 2024 00:21
20240218 python PyWhatKit issue 313

Ankit404butfound/PyWhatKit#313

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/Xlib/support/unix_connect.py in get_socket(dname, host, dno)
     75             s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
---> 76             s.connect('/tmp/.X11-unix/X%d' % dno)
     77     except OSError as val:
@Nov05
Nov05 / 20240215_udacity reinforcement learning_DQN project submission.md
Last active February 15, 2024 17:32
👉 Unity ML-Agents `Banana Collectors` Project Submission

👉 Unity ML-Agents Banana Collectors Project Submission

  1. For this toy game, two Deep Q-Network variants were tried out. Since the observations (states) are simple low-dimensional vectors rather than pixels, no convolutional layers are used, and the evaluation results confirm that linear layers are sufficient to solve the problem.
    • Double DQN, with 3 linear layers (hidden dims: 256*64, later tried with 64*64)
    • Dueling DQN, with 2 linear layers + 2 split linear layers (hidden dims: 64*64)
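For the Double DQN variant above, a hedged sketch of how its target differs from vanilla DQN (function and tensor names are illustrative): the online network picks the greedy action and the target network evaluates it, which reduces overestimation bias.

    import torch

    def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
        ## online net selects the action, target net evaluates it
        with torch.no_grad():
            best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
            next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
            return rewards + gamma * next_q * (1.0 - dones)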

▪️ The Dueling DQN architecture is shown below.
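Since the figure itself is not reproduced in this copy, the following PyTorch sketch approximates the described layout (2 shared linear layers plus 2 split streams, 64*64 hidden dims); the state/action sizes are the usual Banana Collectors values and are assumptions here:

    import torch.nn as nn

    class DuelingDQN(nn.Module):
        def __init__(self, state_size=37, action_size=4, hidden=64):
            super().__init__()
            ## 2 shared linear layers (hidden dims 64*64)
            self.feature = nn.Sequential(
                nn.Linear(state_size, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            ## 2 split linear layers: state-value and advantage streams
            self.value = nn.Linear(hidden, 1)
            self.advantage = nn.Linear(hidden, action_size)

        def forward(self, state):
            x = self.feature(state)
            v, a = self.value(x), self.advantage(x)
            ## Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)
            return v + a - a.mean(dim=-1, keepdim=True)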