The algorithm depicted was programmed in Inkling, a meta-level programming language developed by Bons.ai (https://bons.ai/). The following program, when compiled to neural networks, solved the environment.
simulator lunarlander_simulator (LunarLanderConfig)
    send schema (GameState)
end

schema GameState
    Float32 x_position,
    Float32 y_position,
    Float32 x_velocity,
    Float32 y_velocity,
    Float32 angle,
    Float32 rotation,
    Float32 left_leg,
    Float32 right_leg
end

schema LanderAction
    Int8{0, 1, 2, 3} action
end

schema LunarLanderConfig
    Int8 episode_length,
    Int8 num_episodes,
    Int8 deque_size
end

concept stay_stable is classifier
    predicts (LanderAction)
    follows input(GameState)
end

concept go_center is classifier
    predicts (LanderAction)
    follows input(GameState)
end

concept land is classifier
    predicts (LanderAction)
    follows stay_stable, go_center, input(GameState)
    feeds output
end

curriculum stable_curriculum
    train stay_stable
    with simulator lunarlander_simulator
    objective stable_objective

    lesson stable
        configure
            constrain episode_length with Int8{-1},
            constrain num_episodes with Int8{-1},
            constrain deque_size with UInt8{1}
        until
            maximize stable_objective
end

curriculum center_curriculum
    train go_center
    with simulator lunarlander_simulator
    objective center_objective

    lesson center
        configure
            constrain episode_length with Int8{-1},
            constrain num_episodes with Int8{-1},
            constrain deque_size with UInt8{1}
        until
            maximize center_objective
end

curriculum landing_curriculum
    train land
    with simulator lunarlander_simulator
    objective landing_objective

    lesson landing
        configure
            constrain episode_length with Int8{-1},
            constrain num_episodes with Int8{-1},
            constrain deque_size with UInt8{1}
        until
            maximize landing_objective
end
Finally, besides the standard reward function, we used center_objective and stable_objective, which gave Gaussian rewards (standard deviation 0.09) for being at position x = 0 and at angle Theta = 0, respectively.
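The exact functional form of these shaping rewards is not given in the text; a minimal sketch, assuming an unnormalized Gaussian of the relevant state variable (peaking at 1 when the variable is 0, with the quoted standard deviation of 0.09), is:

```python
import math

SIGMA = 0.09  # standard deviation quoted in the text

def gaussian_reward(value, sigma=SIGMA):
    # Unnormalized Gaussian: equals 1 at value == 0 and decays with |value|.
    # Normalization and any clipping are assumptions; only sigma is given.
    return math.exp(-(value ** 2) / (2.0 * sigma ** 2))

def stable_objective(state):
    # Rewards keeping the lander upright (angle near 0).
    return gaussian_reward(state["angle"])

def center_objective(state):
    # Rewards staying over the pad center (x position near 0).
    return gaussian_reward(state["x_position"])
```

With this form, a deviation of one standard deviation (e.g. angle = 0.09) yields a reward of exp(-1/2), so the shaping signal falls off quickly outside a narrow band around the target.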