WuXinyang2012/records.md

## records.md

      
Records of drl-for-safety projects.

1, Background Knowledge.

1.1 Learn the basic DRL concepts and popular algorithms.

1.2 Focus more on the DDPG+HER algorithm.

2, Compare mujoco-py and dm_control.

Result: Choose mujoco-py as our framework.

Reason:

1.) mujoco-py has a more active community: it has ~120 open issues, while dm_control has ~10.
2.) mujoco-py exposes contact information.
Instead of lumping all simulation parameters into one "world", mujoco separates them into two data structures (C structs) at runtime: mjModel (model description and visualization options) and mjData (dynamic variables and intermediate results).
According to the mujoco documentation, contact information is stored in the mjData.contact variable at runtime (including contact position and distance), and mujoco-py provides the Python binding MjSim.data.contact to access it.
In mujoco-py, the environment and simulation are stored in the class mujoco_py.MjSim (mujoco-py MjSim documentation).
The runtime information, including contacts, can be accessed via the variable MjSim.data, which is of class mujoco_py.PyMjData (mujoco-py PyMjData documentation).
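The contact force can be accessed via the mujoco internal function mj_contactForce:

    import numpy as np
    import mujoco_py

    # `sim` is an existing mujoco_py.MjSim; only the first `ncon` contacts are active.
    for i in range(sim.data.ncon):
        contact = sim.data.contact[i]
        # mj_contactForce fills a 6D force/torque vector expressed in the contact frame.
        c_array = np.zeros(6, dtype=np.float64)
        mujoco_py.functions.mj_contactForce(sim.model, sim.data, i, c_array)
        # Convert the contact force from contact frame to world frame.
        print('contact frame:', contact.frame)
        ref = np.reshape(contact.frame, (3, 3))
        c_force = np.dot(np.linalg.inv(ref), c_array[0:3])
        c_torque = np.dot(np.linalg.inv(ref), c_array[3:6])
        print('contact force in world frame:', c_force)
        print('contact torque in world frame:', c_torque)
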
The corresponding data and contact definitions in the mujoco documentation:

mujoco mjData API

mujoco mjContact API
3.) Working with gym, we can start our project based on the "FetchReach" environment and the HER implementations.
Example code for extracting contacts.
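As a rough illustration of the contact access described above, the sketch below collects the active contacts of a simulation into plain dictionaries. It assumes `sim` is a `mujoco_py.MjSim` and relies only on `sim.data.ncon`, `sim.data.contact` and `sim.model.geom_id2name`; the helper name `extract_contacts` and the layout of the returned dictionaries are made up for this note.

```python
def extract_contacts(sim):
    """Return the active contacts of `sim` as a list of dicts.

    `sim` is assumed to be a mujoco_py.MjSim. Only the first `ncon`
    entries of `sim.data.contact` are valid at a given step.
    """
    contacts = []
    for i in range(sim.data.ncon):
        c = sim.data.contact[i]
        contacts.append({
            "index": i,  # can be passed to mj_contactForce
            "geom1": sim.model.geom_id2name(c.geom1),
            "geom2": sim.model.geom_id2name(c.geom2),
            "dist": float(c.dist),  # negative values mean penetration
            "pos": list(c.pos),     # contact position in the world frame
        })
    return contacts
```

Calling `extract_contacts(sim)` after `sim.step()` would give one entry per active contact, which can then be filtered by geom name.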

Future Target:

1.) Configure the robot arm as geoms in the "FetchReach" environment (in mujoco, only geom elements support contact forces).
2.) Implement a collision checker using the contact information (a positive contact force indicates a collision).
3.) Make the geom obstacle (the black box shown in the example) dynamic.
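A minimal sketch of the collision checker from item 2, assuming the normal contact force of each contact has already been read (for example with `mj_contactForce`). The `Contact` tuple, the `has_collision` helper, the `ignored_pairs` argument and the geom names are hypothetical; they only illustrate the rule that a positive contact force counts as a collision.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    geom1: str           # name of the first geom in contact
    geom2: str           # name of the second geom in contact
    normal_force: float  # force along the contact normal


def has_collision(contacts, ignored_pairs=frozenset(), min_force=0.0):
    """Return True if any non-ignored contact carries a force above `min_force`."""
    for c in contacts:
        pair = frozenset((c.geom1, c.geom2))
        if pair in ignored_pairs:
            continue  # contacts that are expected and should not count
        if c.normal_force > min_force:
            return True
    return False


# Hypothetical geom names, for illustration only.
contacts = [Contact("arm_link", "obstacle", 3.2)]
collided = has_collision(contacts)
```

Passing a set such as `{frozenset(("arm_link", "table"))}` as `ignored_pairs` is one way to keep known, harmless contacts from triggering the checker.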

3, Train with collision checker.

The collision checker has been implemented and gives a reward of -10 when a collision is detected. However, the HER algorithm substitutes the target goal with the achieved goal, so the collision point is treated as an achieved target and the -10 reward is replaced with 0.
Therefore, before training with HER:

1.) The first experiment trains plain DDPG in the environment with obstacles, without the -10 reward.
After 10e7 timesteps, DDPG with the MLP and conv network structures still does not converge.
2.) The second experiment trains plain DDPG in the environment with obstacles, with the -10 reward.
After 10e7 timesteps, DDPG with the MLP and conv network structures still does not converge.
3.) The last experiment will train DDPG+HER in the environment with obstacles, with the -10 reward.
Update: waiting on commits for HER with collision rewards.
The robotic environments have a dictionary-based observation space, so the training code needs dictionary-based observation support.
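To illustrate what a dictionary-based observation means for plain DDPG, the sketch below flattens one goal-based observation into a single vector. The keys `observation`, `achieved_goal` and `desired_goal` are the ones used by the gym robotics environments; the helper name `flatten_observation`, the choice of keys to concatenate and the sample values are assumptions, not the method used in these experiments.

```python
def flatten_observation(obs, keys=("observation", "desired_goal")):
    """Concatenate the selected entries of a dict observation into one flat list."""
    flat = []
    for key in keys:
        flat.extend(float(x) for x in obs[key])
    return flat


# Made-up values with the same structure as a goal-based observation.
obs = {
    "observation": [0.1, 0.2, 0.3, 0.4],
    "achieved_goal": [0.1, 0.2, 0.3],
    "desired_goal": [0.5, 0.5, 0.5],
}
state = flatten_observation(obs)  # 4 + 3 = 7 values
```

The helper only iterates over the entries, so it should work the same way when the values are NumPy arrays.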
Some remaining problems:

1.) The obstacle needs to be dynamic.

2.) There are noisy collisions in the env, e.g. the robot arm collides with the table or the robot body.

3.) HER does not seem to take the third reward value, -10, into account.
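
The interaction between HER goal substitution and the collision penalty can be sketched as follows. The sketch assumes a sparse FetchReach-style reward (0 within a distance threshold of the goal, -1 otherwise) and a hypothetical `is_collision` flag stored with each transition; `compute_reward`, `relabel` and the 0.05 threshold are illustrative choices, not the implementation used here. Because the reward is recomputed from the stored flag rather than from the goal alone, the -10 penalty survives the substitution in this sketch.

```python
import math

DISTANCE_THRESHOLD = 0.05   # assumed success radius
COLLISION_REWARD = -10.0    # penalty given by the collision checker


def compute_reward(achieved_goal, goal, info):
    """Sparse goal reward, overridden by the collision penalty."""
    if info.get("is_collision", False):
        return COLLISION_REWARD
    reached = math.dist(achieved_goal, goal) <= DISTANCE_THRESHOLD
    return 0.0 if reached else -1.0


def relabel(transition, new_goal):
    """HER-style relabelling: swap the goal and recompute the reward."""
    relabelled = dict(transition, goal=new_goal)
    relabelled["reward"] = compute_reward(
        transition["achieved_goal"], new_goal, transition["info"]
    )
    return relabelled


# A transition that ended in a collision, relabelled with its own achieved goal.
t = {
    "achieved_goal": (0.3, 0.1, 0.5),
    "goal": (0.6, 0.2, 0.5),
    "info": {"is_collision": True},
    "reward": COLLISION_REWARD,
}
print(relabel(t, t["achieved_goal"])["reward"])
```

Without the `is_collision` check, the relabelled transition would receive reward 0, which matches the problem described in section 3.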