Music Li pkumusic

## Policy distribution at evaluation
Policy distribution used at evaluation
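
As a rough illustration of what evaluating with the policy distribution could look like, here is a minimal sketch. It assumes the agent's output is a list of action probabilities and that the action is sampled from it rather than taken greedily. The function names are hypothetical and are not taken from the experiment code.

```python
import random


def sample_action(probs, rng=random):
    """Draw an action index in proportion to the policy probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_action(probs):
    """Pick the most probable action (shown only for comparison)."""
    return max(range(len(probs)), key=lambda a: probs[a])
```

Sampling keeps some randomness in the evaluation episodes, while the greedy variant always repeats the same choice for the same probabilities.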

## Epsilon policy at evaluation
Epsilon-greedy policy (epsilon = 0.05) at evaluation
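
A minimal sketch of an epsilon-greedy choice with epsilon = 0.05, assuming the agent exposes a list of Q-values per action. The function name and signature are hypothetical, not the experiment's own code.

```python
import random


def epsilon_greedy(q_values, epsilon=0.05, rng=random):
    """With probability epsilon pick a uniformly random action, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.05`, roughly one action in twenty is drawn uniformly at random, and that draw can still land on the greedy action.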

## Dueling DQN
Dueling DQN
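
For reference, the dueling architecture combines a state value and per-action advantages as Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). The sketch below shows only that aggregation step on plain Python numbers. It assumes the mean-subtracted variant, and the function name is hypothetical.

```python
def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using mean-subtracted advantages."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]
```

For example, `dueling_q(1.0, [1.0, 2.0, 3.0])` gives `[0.0, 1.0, 2.0]`.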

## Double DQN
Double DQN
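
The difference from plain DQN lies in the bootstrap target: Double DQN selects the next action with the online network and evaluates it with the target network. The sketch below works on plain lists of next-state Q-values. The function names and the default `gamma` are assumptions for illustration only.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """Plain DQN target: bootstrap from the max of the target network."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: online net picks the action, target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]
```

When the two networks disagree about the best next action, the Double DQN target is never larger than the plain DQN one, which is the intended damping of overestimation.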

## DQN
Nature Deep Q-Network

## Linear Double DQN
Linear Double DQN

## Linear DQN
Linear DQN with memory and target network

## Linear DQN
DQN with linear layers, without memory or target network

## A3C with joint density model
A3C with joint density model

- 4.77 M steps.
- Score: 2500, held very stably once reached.
- Outperforms the A3C+ result reported in the paper (142.5).
- Speed: same as the original A3C, fast, 11 iter/s.
- Params: Nature, with these schedules:
  - `ScheduledHyperParamSetter('learning_rate', [(80, 0.0003), (120, 0.0001)])`
  - `ScheduledHyperParamSetter('entropy_beta', [(80, 0.005)])`
  - `ScheduledHyperParamSetter('explore_factor', [(80, 2), (100, 3), (120, 4), (140, 5)])`
- Next step: test other params, which could improve the result.
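
To make the schedules above easier to read, here is a small sketch that looks up which value a step schedule holds at a given epoch. It assumes each `(epoch, value)` pair takes effect at the listed epoch and holds until the next pair. The exact boundary behaviour is defined by tensorpack itself, so treat this as a reading aid rather than a reimplementation. `value_at` and the constant names are hypothetical.

```python
# Schedules copied from the params above.
LEARNING_RATE = [(80, 0.0003), (120, 0.0001)]
EXPLORE_FACTOR = [(80, 2), (100, 3), (120, 4), (140, 5)]


def value_at(schedule, epoch):
    """Return the most recently scheduled value, or None before the first entry."""
    current = None  # None means the parameter still has its initial value
    for start, value in sorted(schedule):
        if epoch >= start:
            current = value
    return current
```

Under that assumption, `value_at(EXPLORE_FACTOR, 110)` returns `3`, and `value_at(LEARNING_RATE, 50)` returns `None` because no scheduled change has happened yet.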

## O-DDQN-I-S
O-DDQN with object image swap input