Self-Driving Mario Kart using Reinforcement Learning

Presented on 2018.04.12 by Brian Zier (@bzier)

Outline


Reinforcement Learning (adapted from lecture series; see resources section below)

6:23 Reinforcement Learning venn diagram (slide 6)
Reward hypothesis (slide 13):

All goals can be described by the maximisation of expected cumulative reward


Goal: Select actions to maximise total future reward
29:35 Agent and Environment (slide 16)
47:54 Rat Example (slide 22)
57:10 Major Components of an RL Agent (slide 25)

Policy: Map from state -> action
Value: Prediction (expectation) of future reward
Model: Predicts what the environment will do next

Transitions: next state
Reward: next reward
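
To make the agent components above concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: the five-state corridor, the names `model`, `policy`, `value`, `run_episode`, and `discounted_return`, and the discount factor `gamma` are not taken from the lecture slides or from the Mario Kart agent.

```python
from typing import Callable, List, Tuple

State = int
Action = int  # -1 = step left, +1 = step right

GOAL = 4  # hypothetical corridor with states 0..4; reaching state 4 ends the episode


def model(state: State, action: Action) -> Tuple[State, float]:
    """Model: predicts the next state (transition) and the next reward."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def policy(state: State) -> Action:
    """Policy: a map from state to action (this toy policy always steps right)."""
    return +1


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Total future reward, with each later reward discounted by gamma."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def run_episode(
    act: Callable[[State], Action], start: State, max_steps: int = 20
) -> List[float]:
    """Agent-environment loop: observe the state, act, receive a reward, repeat."""
    state = start
    rewards: List[float] = []
    for _ in range(max_steps):
        action = act(state)
        # In this toy example the model doubles as the environment.
        state, reward = model(state, action)
        rewards.append(reward)
        if state == GOAL:
            break
    return rewards


def value(state: State, gamma: float = 0.9) -> float:
    """Value: predicted future reward from `state` when following `policy`."""
    return discounted_return(run_episode(policy, state), gamma)


if __name__ == "__main__":
    for s in range(GOAL):
        print(f"value({s}) = {value(s):.3f}")
```

Because the toy environment is deterministic, a single rollout gives the exact value of each state. In a stochastic environment the value is an expectation, so it would have to be estimated, for example by averaging many rollouts.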


Gym environments & building gym-mupen64plus

Defining the reward function
Progress detection
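
As a rough illustration of how a reward function can be built on top of progress detection, the sketch below turns a progress estimate into a per-step reward. It is not the gym-mupen64plus implementation: the `ProgressReward` class, the assumption that progress arrives as a fraction in [0, 1], and all of the constants are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProgressReward:
    """Hypothetical reward shaping driven by a course-progress estimate in [0, 1]."""

    step_penalty: float = -0.1     # assumed: small cost per step, discourages idling
    progress_scale: float = 100.0  # assumed: reward per unit of course progress
    finish_bonus: float = 50.0     # assumed: one-off bonus for completing the course
    last_progress: float = 0.0     # progress seen on the previous step

    def __call__(self, progress: float) -> float:
        """Return the reward for one step, given the newly detected progress."""
        delta = progress - self.last_progress
        self.last_progress = progress
        reward = self.step_penalty + self.progress_scale * delta
        if progress >= 1.0:
            # The episode is expected to end here, so the bonus is paid once.
            reward += self.finish_bonus
        return reward


if __name__ == "__main__":
    reward_fn = ProgressReward()
    for detected in (0.0, 0.25, 0.25, 0.2, 1.0):
        print(f"progress={detected:.2f} reward={reward_fn(detected):.1f}")
```

Rewarding the change in progress rather than the absolute progress means that standing still costs the step penalty and driving backwards yields a negative reward. A shaping like this depends entirely on the progress signal, which is why the modes listed under Future Work would need a different approach.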


Dependency / setup challenges

Screenshot offsets / emulator position
XVFB
wxPython


Docker solution

Dependencies (including versioning) explicitly defined in the Dockerfile
Dockerfile committed with repo
docker-compose for container run-time configuration (individual dependent processes, networking, commands, volumes, etc.)


Future Work

More/all courses (including random)
Transfer learning from one course to another
Multiplayer (challenging due to current progress detection)
Battle mode (completely different reward function)


Resources


Lecture video series starts here with lecture 1
Lecture slides here
gym-mupen64plus environment repo here
Forked A3C agent here

Get up-and-running

Clone the two repos in the resources section and check out the appropriate branches:

gym-mupen64plus -> dockerize
universe-starter-agent -> mario-kart-agent

Follow the setup instructions in the README files. If you bump into any issues getting up and running, please reach out to me by filing an issue on the GitHub repository. I usually check every day-ish for new notifications and respond fairly quickly. There may be deficiencies or mistakes in the instructions and I'd like to know so I can address them.