@MaximeBouton
Created January 31, 2019 05:40
Example of discrepancies between SparseVI and VI when handling terminal states
using POMDPs
using POMDPModelTools
using DiscreteValueIteration
using Parameters
# Two-state MDP: states and actions are both the integers 1 and 2.
@with_kw struct TwoStatesMDP <: MDP{Int, Int}
    γ::Float64 = 0.95  # discount factor
end

POMDPs.n_states(mdp::TwoStatesMDP) = 2
POMDPs.states(mdp::TwoStatesMDP) = 1:2
POMDPs.stateindex(mdp::TwoStatesMDP, s) = s
POMDPs.n_actions(mdp::TwoStatesMDP) = 2
POMDPs.actions(mdp::TwoStatesMDP) = 1:2
POMDPs.actionindex(mdp::TwoStatesMDP, a) = a
POMDPs.discount(mdp::TwoStatesMDP) = mdp.γ

# Deterministic transition: taking action a always leads to state a.
function POMDPs.transition(mdp::TwoStatesMDP, s, a)
    return SparseCat([a], [1.0])
end

# Reward of 1.0 for entering state 2, 0.0 otherwise.
function POMDPs.reward(mdp::TwoStatesMDP, s, a, sp)
    return float(sp == 2)
end

# State 2 is terminal.
POMDPs.isterminal(mdp::TwoStatesMDP, s) = s == 2
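
mdp = TwoStatesMDP()

# Solve with the standard value iteration solver and print its Q-matrix.
solver = ValueIterationSolver(verbose = true)
policy = solve(solver, mdp)
println(policy.qmat)

# Solve the same MDP with the sparse solver and print its Q-matrix for comparison.
sparsesolver = SparseValueIterationSolver(verbose = true)
sparse_policy = solve(sparsesolver, mdp)
println(sparse_policy.qmat)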
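
# Hand-rolled reference for comparing the two printed Q-matrices. This is a
# minimal sketch, not code from either solver. It assumes the convention that
# a terminal state has value 0 and that its Q-values are left at 0.
# `reference_qmat` is a hypothetical helper that hard-codes the dynamics
# defined above: action a leads to state a, entering state 2 pays 1.0, and
# state 2 is terminal.
function reference_qmat(γ; iters = 1000)
    q = zeros(2, 2)                              # q[s, a]
    for _ in 1:iters
        v = [maximum(q[s, :]) for s in 1:2]      # greedy state values from the current Q
        for s in 1:2, a in 1:2
            s == 2 && continue                   # never back up from the terminal state
            sp = a                               # deterministic next state
            q[s, a] = float(sp == 2) + γ * v[sp]
        end
    end
    return q
end

# Under the assumption above: Q(1, 1) = γ, Q(1, 2) = 1, and the terminal row is all zeros.
println(reference_qmat(0.95))  # prints [0.95 1.0; 0.0 0.0]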