@kanzure
Created April 13, 2015 18:13
Prototype: stochastic cooperation in the One-Shot Prisoner's Dilemma in probabilistic programming

(The following is from an email that was received on the decision-theory-workshop@googlegroups.com mailing list...)

Well, I finally managed to sit down and code the entire prototype. It has been refined a couple of times and tested in the Church play-space (https://probmods.org/play-space.html). The parameters and set-up could probably use some tweaking to make the cooperation more robust (i.e., more probable/numerous as a fraction of the histogram used for testing), but oh well.

The prototype follows the Goodman & Tenenbaum approach of "commonsense reasoning as conditional simulation (in probabilistic generative models)". Each agent simulates:

  • The other agent simulating it, and
  • The other agent playing naively.

So the recursion levels are:

  • 0 - Naive play
  • 1 - Observing your opponent's naive play
  • 2 - Observing your opponent's observation of your naive play

Since there are only two moves available in the game, there can only be a total of two levels of Sicilian Reasoning.

Really solving the general problem means coming up with an inference procedure that can notice that the two agents, whether each plays naively or observes the other's naive play, will make the same move, and then condition on that.

We'd like a fully general theory that yields cooperation without explicitly conditioning on the sameness of the two moves. My hypothesis is that if we treated naive play and our opponent's move as two separate random variables and ran a causal induction algorithm on them, we would arrive at a smarter model which stochastically guarantees the sameness, and we could invert that distribution to play wisely. The general principle: a wise agent treats its own actions as further inputs to causal modeling and notices when its actions, placed in their context, display a meaningful (i.e., utility-affecting) pattern.
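As a toy illustration of that hypothesis (the function names and the two-step "induce a model, then invert it" structure here are mine, not from the email): an agent logs pairs of (its naive move, the opponent's move), notices they always match, adopts the model "the opponent's move tracks mine", and then picks the move that maximizes its payoff under that model.

```python
import random

# One-shot PD payoffs to "me", with cooperate = True (same scores as the
# Church code below: T=0, R=-1, P=-2, S=-3).
PAYOFF = {
    (True, True): -1, (True, False): -3,
    (False, True): 0, (False, False): -2,
}

def observed_pairs(n):
    # Both agents end up making the same move each round -- the "sameness"
    # the hoped-for inference procedure is supposed to notice.
    return [(m, m) for m in (random.random() < 0.5 for _ in range(n))]

def opponent_copies_me(pairs):
    # Crude stand-in for causal induction: check whether the opponent's
    # move always equals mine in the observed data.
    return all(mine == theirs for mine, theirs in pairs)

def best_move(pairs):
    if opponent_copies_me(pairs):
        # Inverting the learned model: my move determines the opponent's,
        # so compare PAYOFF[(m, m)] -- cooperation (-1) beats defection (-2).
        return max((True, False), key=lambda m: PAYOFF[(m, m)])
    # Without the learned correlation, assume a uniformly random opponent,
    # under which defection has the higher expected payoff.
    return max((True, False),
               key=lambda m: 0.5 * PAYOFF[(m, True)] + 0.5 * PAYOFF[(m, False)])
```

Under the learned "copies me" model the agent cooperates; with no detected correlation it falls back to defecting against a random opponent, which is exactly the wedge the causal-induction step is meant to provide.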

Eli Sennesh (אלי סנש‎) esennesh@cs.technion.ac.il

; Just to start with, code it as a Schelling coordination game.
; Added correct scores taken from the real OSPD wiki page.
; Added the assumption that the agent maximizes utility via planning-as-inference, bunging the reward through a sigmoid to turn it into a probability.
; Added the assumption that the opponent chooses randomly when depth hits zero.
;
; Then account for the limited levels of Sicilian Reasoning by running each
; agent with a starting depth of 2.
; Cooperate = true, Defect = false
(define (prisoners-dilemma alice bob)
  (cond
    ((and alice bob) '(-1 . -1))
    ((and (not alice) bob) '(0 . -3))
    ((and alice (not bob)) '(-3 . 0))
    ((and (not alice) (not bob)) '(-2 . -2))))

(define (sigmoid x) (/ 1 (+ 1 (exp (- x)))))

;; Planning-as-inference: turn a reward into a Bernoulli condition.
(define (reward-distribution r) (flip (sigmoid r)))

;; Unconditioned baseline (not used by the hist call below): play against
;; a random opponent, conditioning only on the stochastic reward test.
(define (naive-prisoners-dilemma)
  (let ((opponent-move (flip)))
    (rejection-query
      (define move (flip))
      move
      (reward-distribution (car (prisoners-dilemma move opponent-move))))))

;; At depth 0 the opponent plays randomly; otherwise it is simulated one
;; level down. Condition on the moves matching AND on the reward test.
(define (smart-prisoners-dilemma depth)
  (let ((opponent-move (if (= depth 0)
                           (flip)
                           (smart-prisoners-dilemma (- depth 1)))))
    (rejection-query
      (define move (flip))
      move
      (and (equal? move opponent-move)
           (reward-distribution (car (prisoners-dilemma move opponent-move)))))))

(hist (repeat 100 (lambda () (smart-prisoners-dilemma 2))))
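For readers without a Church runtime, here is a hedged Python translation of the program above (a sketch of mine, not the original code): an explicit rejection-sampling loop stands in for Church's rejection-query, and the opponent's move is sampled once per call, matching the `let` outside the query.

```python
import math
import random

# Cooperate = True, Defect = False; returns (alice's payoff, bob's payoff).
def prisoners_dilemma(alice, bob):
    if alice and bob:
        return (-1, -1)
    if (not alice) and bob:
        return (0, -3)
    if alice and (not bob):
        return (-3, 0)
    return (-2, -2)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def reward_distribution(r):
    # Planning-as-inference: the reward passes a stochastic test with
    # probability sigmoid(r).
    return random.random() < sigmoid(r)

def smart_prisoners_dilemma(depth):
    # Depth 0: the opponent plays naively (a fair coin flip); otherwise
    # the opponent is simulated one recursion level down.
    if depth == 0:
        opponent = random.random() < 0.5
    else:
        opponent = smart_prisoners_dilemma(depth - 1)
    # Rejection sampling: resample our move until both conditions hold.
    while True:
        move = random.random() < 0.5
        if move == opponent and reward_distribution(
                prisoners_dilemma(move, opponent)[0]):
            return move

# Rough analogue of (hist (repeat 100 ...)): estimate the cooperation rate.
samples = [smart_prisoners_dilemma(2) for _ in range(1000)]
cooperation_rate = sum(samples) / len(samples)
```

The while-loop always terminates with probability 1, since `sigmoid` is strictly positive for every payoff.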