Skip to content

Instantly share code, notes, and snippets.

Created April 13, 2015 18:13
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save kanzure/6950227cae0e0615f373 to your computer and use it in GitHub Desktop.
Prototype: stochastic cooperation in the One-Shot Prisoner's Dilemma in probabilistic programming

(The following is from an email that was received on the mailing list...)

Well, I finally managed to sit down and code the entire prototype. This has been refined a couple times and tested via the Church play-space ( The parameters and set-up can probably use some tweaking to make the cooperation more robust (ie: more probable/numerous as a fraction of the histogram used for testing), but oh well.

The paper follows the Goodman & Tenenbaum approach of "commonsense reasoning as conditional simulation (in probabilistic generative models)". Each agent simulates:

  • The other agent simulating it,
  • Simulating the other agent playing naively.

So the recursion levels are: 0 - Naive play 1 - Observing your opponent's naive play 2 - Observing your opponent's observation of your naive play

Since there are only two moves available in the game, there can only be a total of two levels of Sicilian Reasoning.

Really solving the general problem here is coming up with an inference procedure capable of noticing that the two agents will, each either playing naively or observing the other one's naive play, each make the same move, and then conditioning on that.

We'd like a fully general theory that lets us not need to explicitly condition on the sameness of the two moves to get cooperation. My hypothesis is that if we treated naive play and our opponent's move as two separate random variables and ran a causal induction algorithm on them, we would then come up with a smarter model which stochastically guarantees the sameness, and we could invert that distribution to play wisely -- the general principle being that a wise agent treats its own actions as further inputs to causal modeling and notices when its actions, placed in their context, display a meaningful (ie: utility-affecting) pattern.

אלי סנש‎

; Just to start with, code it as a Schelling coordination game.
; Added correct scores taken from the real OSPD wiki page.
; Added the assumption that the agent maximizes utility via planning-as-inference, bunging the reward through a sigmoid to turn it into a probability.
; Added the assumption that the opponent chooses randomly when depth hits zero.
; Then account for the limited levels of Sicilian Reasoning by running each
; agent with a starting depth of 2.
; Cooperate = true, Defect = false
(define (prisoners-dilemma alice bob)
((and alice bob) '(-1 . -1))
((and (not alice) bob) '(0 . -3))
((and alice (not bob)) '(-3 . 0))
((and (not alice) (not bob)) '(-2 . -2))))
(define (sigmoid x) (/ 1 (+ 1 (exp (- x)))))
(define (reward-distribution r) (flip (sigmoid r)))
(define (naive-prisoners-dilemma)
(let ((opponent-move (flip)))
(define move (flip))
(reward-distribution (car (prisoners-dilemma move opponent-move))))))
(define (smart-prisoners-dilemma depth)
(let ((opponent-move (if (= depth 0) (flip) (smart-prisoners-dilemma (- depth 1)))))
(define move (flip))
(and (equal? move opponent-move) (reward-distribution (car (prisoners-dilemma move opponent-move)))))))
(hist (repeat 100 (lambda () (smart-prisoners-dilemma 2))))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment