@matthiasplappert
Last active December 17, 2022 20:24
import numpy as np
import gym

env = gym.make('FetchReach-v0')
obs = env.reset()
done = False

def policy(observation, desired_goal):
    # Here you would implement your smarter policy. In this case,
    # we just sample random actions.
    return env.action_space.sample()

while not done:
    action = policy(obs['observation'], obs['desired_goal'])
    obs, reward, done, info = env.step(action)

    # If we want, we can substitute a goal here and re-compute
    # the reward. For instance, we can just pretend that the desired
    # goal was what we achieved all along.
    substitute_goal = obs['achieved_goal'].copy()
    substitute_reward = env.compute_reward(
        obs['achieved_goal'], substitute_goal, info)
    print('reward is {}, substitute_reward is {}'.format(
        reward, substitute_reward))
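
The goal substitution above is the core trick behind hindsight experience replay: a failed episode still yields useful training signal once its transitions are relabeled with the goal that was actually achieved. Below is a minimal sketch of how such relabeled transitions might be stored; the replay_buffer list and the store_with_hindsight helper are illustrative assumptions, not part of this gist or the gym API.

# Hypothetical helper (not from the gist): store each transition twice,
# once with the true desired goal and once relabeled with the achieved
# goal, in the style of hindsight experience replay.
replay_buffer = []  # illustrative plain list; a real buffer would be bounded

def store_with_hindsight(prev_obs, action, reward, next_obs, info):
    # Original transition, conditioned on the true desired goal.
    replay_buffer.append((prev_obs['observation'], prev_obs['desired_goal'],
                          action, reward, next_obs['observation']))
    # Relabeled transition: pretend the achieved goal was the target.
    # With the sparse Fetch rewards, the recomputed reward then marks
    # this step as a success.
    substitute_goal = next_obs['achieved_goal'].copy()
    substitute_reward = env.compute_reward(
        next_obs['achieved_goal'], substitute_goal, info)
    replay_buffer.append((prev_obs['observation'], substitute_goal,
                          action, substitute_reward, next_obs['observation']))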
@astier commented Jul 6, 2018

You may want to remove the numpy import because it's never used.

@KOHHHHHA

apikey
