OpenAI Gym: Continuous Lunar Lander

Bonsai Multi-Concept Reinforcement Learning: Continuous Lunar Lander

The approach described here was written in Inkling, a meta-level programming language developed by Bons.ai (https://bons.ai/). The following Inkling program, when compiled to neural networks, solved the environment.

simulator lunarlander_simulator (LunarLanderConfig)
    send schema (GameState)
end
schema GameState
    Float32 x_position,
    Float32 y_position,
    Float32 x_velocity,
    Float32 y_velocity,
    Float32 angle,
    Float32 rotation,
    Float32 left_leg,
    Float32 right_leg
end

schema LanderAction
    Int8{0, 1, 2, 3} action
end

schema LunarLanderConfig
    Int8 episode_length,
    Int8 num_episodes,
    Int8 deque_size
end

concept stay_stable is classifier
    predicts (LanderAction)
    follows input(GameState)
end

concept go_center is classifier
    predicts (LanderAction)
    follows input(GameState)
end

concept land is classifier
    predicts (LanderAction)
    follows stay_stable, go_center, input(GameState)
    feeds output
end

curriculum stable_curriculum
    train stay_stable
    with simulator lunarlander_simulator
    objective stable_objective

        lesson stable
            configure
                constrain episode_length with Int8{-1},
                constrain num_episodes with Int8{-1},
                constrain deque_size with UInt8{1}
            until
                maximize stable_objective
end

curriculum center_curriculum
    train go_center
    with simulator lunarlander_simulator
    objective center_objective

        lesson center
            configure
                constrain episode_length with Int8{-1},
                constrain num_episodes with Int8{-1},
                constrain deque_size with UInt8{1}
            until
                maximize center_objective
end

curriculum landing_curriculum
    train land
    with simulator lunarlander_simulator
    objective landing_objective

        lesson landing
            configure
                constrain episode_length with Int8{-1},
                constrain num_episodes with Int8{-1},
                constrain deque_size with UInt8{1}
            until
                maximize landing_objective
end
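
For reference, the GameState and LanderAction schemas above line up with the observation and action spaces of the Gym lunar lander. Below is a minimal sketch of that mapping on the Gym side, assuming the discrete LunarLander-v2 environment; the env id and the Bonsai simulator glue are assumptions, not part of the gist.

import gym

# The GameState fields, in the order of the 8-element Gym observation vector:
# x, y, x velocity, y velocity, angle, angular velocity, leg contact flags.
FIELDS = ("x_position", "y_position", "x_velocity", "y_velocity",
          "angle", "rotation", "left_leg", "right_leg")

def to_game_state(observation):
    # Map the Gym observation onto the GameState schema fields.
    return dict(zip(FIELDS, (float(v) for v in observation)))

env = gym.make("LunarLander-v2")   # assumed env id; the gist does not name one
observation = env.reset()
state = to_game_state(observation)
# A LanderAction is an integer in {0, 1, 2, 3}, matching env.action_space.
observation, reward, done, info = env.step(env.action_space.sample())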

Finally, in addition to the standard reward function, we used center_objective and stable_objective, which gave Gaussian rewards (standard deviation 0.09) for being at position x = 0 and at angle θ = 0, respectively.
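
A sketch of that shaping in Python, assuming an unnormalized Gaussian centred on the target; the function names and the state fields are illustrative, not the gist's actual objective code.

import math

SIGMA = 0.09  # standard deviation quoted above

def gaussian_reward(value, target=0.0, sigma=SIGMA):
    # Peaks at 1 when value == target and falls off as a Gaussian.
    return math.exp(-((value - target) ** 2) / (2.0 * sigma ** 2))

def stable_objective(state):
    # Reward keeping the lander upright (angle near 0).
    return gaussian_reward(state["angle"])

def center_objective(state):
    # Reward staying over the pad centre (x position near 0).
    return gaussian_reward(state["x_position"])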
