@zackmdavis
Created November 30, 2019 23:42
comments on "Seeking Power is Provably Instrumentally Convergent in MDPs"

(Thinking-out-loud comments on a not-yet-published draft.)

three choices: eat candy, eat chocolate, or hug a friend

(Some authors would consider chocolate a kind of candy?)

by generating triples in [0,1] here

"Here" links are terrible!

If you're linking to an online service, probably better to link something that can run code snippets rather than just a random-number generator? I wrote the appropriate trivial Python program so that you can say something like, "You can test this out for yourself by generating triples in [0,1] and averaging the maximums of the triples."
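Something along these lines, maybe (a minimal sketch of the trivial program I mean, assuming the triples are drawn uniformly at random):

    import random

    def estimated_power(num_samples=100_000):
        """Estimate the power by averaging the maximum entry of random reward triples."""
        total = 0.0
        for _ in range(num_samples):
            # One uniformly-random reward each for candy, chocolate, and hugs.
            rewards = [random.random() for _ in range(3)]
            # The agent attains the best of the three, so the maximum is what matters.
            total += max(rewards)
        return total / num_samples

    print(estimated_power())  # comes out to about 0.75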

The power calculation is the average attainable utility. This calculation breaks down into the sum of the average attainable utility when candy is best, the average attainable utility when chocolate is best, and the average attainable utility when hugs are best.

Really? Doesn't it have to be average attainable utility when candy is best multiplied by the probability that candy is best, &c.?
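A quick numerical sanity check of the decomposition I have in mind (conditional averages weighted by the probability of each case; my phrasing, not the draft's):

    import random

    samples = [[random.random() for _ in range(3)] for _ in range(100_000)]

    # Unconditional average attainable utility (the max of each triple).
    overall = sum(max(triple) for triple in samples) / len(samples)

    # Sum over options of P(option is best) * E[attainable utility | option is best].
    weighted = 0.0
    for option in range(3):
        best_here = [triple for triple in samples if max(triple) == triple[option]]
        probability = len(best_here) / len(samples)
        conditional_average = sum(max(triple) for triple in best_here) / len(best_here)
        weighted += probability * conditional_average

    print(overall, weighted)  # these agree (both come out to about 0.75)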

The most likely action at Start is to go to Wait!; this action is instrumentally convergent (why?)

It's instrumentally useful for chocolate-seekers and hug-seekers, who together make up 2/3 of possible agents (under the uniform distribution over reward triples, each of the three options is most-preferred with probability 1/3 by symmetry).

However, candy is the most likely single possibility, but this advantage disappears in the limit of farsightedness. That's because waiting can be worse than candy, even if we like chocolate or hugs more than candy.

This is confusingly worded. Does it mean the same thing as, "Even if we like chocolate or hugs more than candy, time-discounting can make us prefer candy now rather than chocolate or hugs later"?
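For concreteness (toy numbers of my own, not the draft's), here's the discounting story I'm imagining, assuming candy is one step from Start and chocolate is two steps away via Wait!:

    # The agent genuinely prefers chocolate, but with a small enough discount
    # factor, candy-now beats chocolate-later.
    u_candy, u_chocolate = 0.6, 0.9
    gamma = 0.5  # fairly myopic

    value_of_candy_now = gamma * u_candy               # candy is one step away
    value_of_chocolate_later = gamma**2 * u_chocolate  # chocolate is two steps away

    print(value_of_candy_now, value_of_chocolate_later)  # 0.3 vs. 0.225: candy wins

As gamma approaches 1, the comparison reduces to u_candy versus u_chocolate, which I take to be the sense in which the advantage "disappears in the limit of farsightedness."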

Consider Tic-Tac-Toe [...] Remember, although we keep the rules of Tic-Tac-Toe intact, we're considering the uniform distribution over reward functions.

"Tic-tac-toe except with uniform distribution over reward functions" is not tic-tac-toe, so I would rephrase to put the specification of reward up front rather than in a "Remember, ..." sentence at the end of the paragraph.

as defined in the paper [...] I refer the reader to the paper [...] last theorem in the paper

What paper? Where is the paper?! Not published yet? (I can find "Conservative Agency" on arXiv, but that doesn't seem to match the Figure and Section numbers/titles you give.)

Reading this once was enough to pique my curiosity, but I still don't feel like I understand it. (Probably need to read the actual paper and work out my own examples with pencil/Python.) The "random distribution of preferences over tic-tac-toe end states implies that you probably don't want to end the game yet" example was helpful, but I don't have the general insight yet. Trying to articulate what I'm stuck on, I look back to that equation, and find that I wish it had more words—

power contribution = % of goals ⋅ average control.

Power contribution of what? Percentage of goals that what? Average control over what?
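For whatever it's worth, my best guess at a reading (just a guess extrapolated from the candy/chocolate/hugs decomposition above, not something the draft says): the power contribution of an action is the fraction of goals for which that action is optimal, times the average attainable utility for those goals. In the running example, that guess comes out like this:

    import random

    samples = [[random.random() for _ in range(3)] for _ in range(100_000)]
    CANDY, CHOCOLATE, HUG = 0, 1, 2

    def guessed_contribution(options):
        """Guessed reading: % of goals best served by these options, times the
        average attainable utility over those goals."""
        relevant = [t for t in samples if max(t) in [t[i] for i in options]]
        return (len(relevant) / len(samples)) * (sum(max(t) for t in relevant) / len(relevant))

    print(guessed_contribution([CANDY]))           # about 0.25
    print(guessed_contribution([CHOCOLATE, HUG]))  # about 0.5: waiting contributes more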
