
@bamos
Created April 7, 2018 19:17
#!/usr/bin/env python3
import numpy as np
import gym
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

env_name = 'Pendulum-v0'
nproc = 8
T = 10


def make_env(env_id, seed):
    # Return a thunk so SubprocVecEnv can construct each env in its own subprocess.
    def _f():
        env = gym.make(env_id)
        env.seed(seed)
        return env
    return _f


# One environment per process, each with a distinct seed.
envs = [make_env(env_name, seed) for seed in range(nproc)]
envs = SubprocVecEnv(envs)

# reset/step operate on all environments at once: xt has shape (nproc, obs_dim),
# and rt/done are length-nproc arrays. Finished envs are reset automatically.
xt = envs.reset()
for t in range(T):
    ut = np.stack([envs.action_space.sample() for _ in range(nproc)])
    xtp1, rt, done, info = envs.step(ut)
@CesMak

CesMak commented May 24, 2020

Hey @bamos, this is very nice.

However, I struggle to use it in my RL PPO algorithm, because with this I also have to parallelize my policy.act(state, memory) method into policy.act(states, memories).

Here is my code snippet:

def playSteps(policy, envs, nproc, steps):
    xt = envs.reset()
    # One buffer per environment; note [Memory()] * nproc would alias a single object.
    memory = [Memory() for _ in range(nproc)]
    for _ in range(steps):
        # Note: reset is done inside the gym environment!
        ut = []
        for i in range(nproc):
            ut.append(policy.act(xt[i], memory[i]))
        xt, rewards, done, info = envs.step(ut)

How did you solve this kind of problem?
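One common pattern for this (a minimal sketch, not necessarily how bamos solved it) is to keep a single policy and run one batched forward pass over the stacked observations, then scatter the results back into per-environment buffers. The sketch below assumes a PyTorch policy whose act method accepts a batch of states and returns a batch of actions, and a Memory class with states/actions/rewards/is_terminals lists, as in typical PPO implementations; both names are carried over from the snippet above.

import numpy as np
import torch

def play_steps_batched(policy, envs, nproc, steps):
    # One trajectory buffer per environment (Memory as in the snippet above).
    memories = [Memory() for _ in range(nproc)]
    xt = envs.reset()  # SubprocVecEnv returns stacked observations, shape (nproc, obs_dim)
    for _ in range(steps):
        # Single batched forward pass instead of nproc separate policy.act calls.
        with torch.no_grad():
            ut = policy.act(torch.as_tensor(xt, dtype=torch.float32))
        ut = ut.cpu().numpy()
        xtp1, rt, done, info = envs.step(ut)
        # Scatter the batched transition back into the per-env buffers.
        for i in range(nproc):
            memories[i].states.append(xt[i])
            memories[i].actions.append(ut[i])
            memories[i].rewards.append(rt[i])
            memories[i].is_terminals.append(done[i])
        xt = xtp1  # SubprocVecEnv auto-resets finished envs, so xt stays valid
    return memories

With this shape of loop, the per-environment state lives in the memories, while the policy itself stays unchanged apart from accepting a batch dimension.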
