(Thinking-out-loud comments on a not-yet-published draft.)
three choices: eat candy, eat chocolate, or hug a friend
(Some authors would consider chocolate a kind of candy?)
by generating triples in [0,1] here
If you're linking to an online service, probably better to link something that can run code snippets rather than just a random-number generator? I wrote the appropriate trivial Python program so that you can say something like, "You can test this out for yourself by generating triples in [0,1] and averaging the maximums of the triples."
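For concreteness, a minimal sketch of what such a snippet could look like follows. The function name, seed, and sample count are arbitrary illustrative choices, not anything taken from the draft. Since the expected maximum of three independent uniform draws on [0,1] is 3/4, the printed average should land close to 0.75.

```python
import random


def average_max_of_triples(num_triples=100_000, seed=0):
    """Estimate the mean of max(x, y, z) for x, y, z drawn uniformly from [0, 1]."""
    rng = random.Random(seed)  # fixed seed only so that reruns are reproducible
    total = 0.0
    for _ in range(num_triples):
        total += max(rng.random(), rng.random(), rng.random())
    return total / num_triples


if __name__ == "__main__":
    # Should print a value near 0.75 (the exact expectation is 3/4).
    print(average_max_of_triples())
```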
The power calculation is the average attainable utility. This calculation breaks down into the sum of the average attainable utility when candy is best, the average attainable utility when chocolate is best, and the average attainable utility when hugs are best.
Really? Doesn't it have to be average attainable utility when candy is best multiplied by the probability that candy is best, &c.?
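A quick simulation makes the point checkable. The sketch below is my own illustration (the function and variable names are hypothetical, and it assumes the three utilities are independent uniform draws on [0,1], as in the triples example). It splits the samples by which option is best and compares the probability-weighted sum of the conditional averages against the plain, unweighted sum.

```python
import random


def decompose_average_max(num_triples=300_000, seed=0):
    """Split the average maximum by which option (0, 1, or 2) is best.

    Returns (overall, weighted, unweighted):
      overall    - average of max(triple) over all samples
      weighted   - sum over options of P(option is best) * E[max | option is best]
      unweighted - sum over options of E[max | option is best], with no weights
    """
    rng = random.Random(seed)
    counts = [0, 0, 0]        # how often each option is the best one
    sums = [0.0, 0.0, 0.0]    # total utility attained when each option is best
    for _ in range(num_triples):
        triple = [rng.random() for _ in range(3)]
        best = max(range(3), key=triple.__getitem__)
        counts[best] += 1
        sums[best] += triple[best]

    overall = sum(sums) / num_triples
    conditional = [s / c for s, c in zip(sums, counts)]
    probability = [c / num_triples for c in counts]
    weighted = sum(p * m for p, m in zip(probability, conditional))
    unweighted = sum(conditional)
    return overall, weighted, unweighted


if __name__ == "__main__":
    overall, weighted, unweighted = decompose_average_max()
    print(f"overall={overall:.3f} weighted={weighted:.3f} unweighted={unweighted:.3f}")
```

Under these assumptions the weighted sum reproduces the overall average (about 0.75), while the unweighted sum comes out near 2.25, which exceeds the largest utility any single outcome can have. That suggests the sentence needs the "multiplied by the probability" qualifier.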
The most likely action at Start is to go to Wait!; this action is instrumentally convergent (why?)
It's instrumentally useful for chocolate-seekers and hug-seekers, and chocolate-seekers and hug-seekers together make up 2/3 of possible agents.
However, candy is the most likely single possibility, but this advantage disappears in the limit of farsightedness. That's because waiting can be worse than candy, even if we like chocolate or hugs more than candy.
This is confusingly worded. Does it mean the same thing as, "Even if we like chocolate or hugs more than candy, time-discounting can make us prefer candy now rather than chocolate or hugs later"?
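To check whether that reading is at least coherent, here is a toy model of my own, which may not match the draft's actual setup. It assumes candy pays off immediately, chocolate and hugs pay off one step later (after waiting), the later payoff is multiplied by a discount factor, and the three utilities are independent uniform draws on [0,1]. The function name and parameters are hypothetical.

```python
import random


def fraction_preferring_candy_now(discount, num_agents=200_000, seed=0):
    """Fraction of sampled agents who take candy now over the best later option.

    Toy assumption (not from the draft): an agent takes candy immediately
    exactly when candy > discount * max(chocolate, hug).
    """
    rng = random.Random(seed)
    prefer_now = 0
    for _ in range(num_agents):
        candy, chocolate, hug = rng.random(), rng.random(), rng.random()
        if candy > discount * max(chocolate, hug):
            prefer_now += 1
    return prefer_now / num_agents


if __name__ == "__main__":
    for discount in (0.5, 0.9, 1.0):
        print(discount, round(fraction_preferring_candy_now(discount), 3))
```

In this toy model the fraction works out to 1 - 2·discount/3, so it is about 2/3 at a discount of 0.5 and falls to 1/3 as the discount approaches 1. That is consistent with candy's advantage disappearing "in the limit of farsightedness", though the draft's real model may count rewards differently.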
Consider Tic-Tac-Toe [...] Remember, although we keep the rules of Tic-Tac-Toe intact, we're considering the uniform distribution over reward functions.
"Tic-tac-toe except with uniform distribution over reward functions" is not tic-tac-toe, so I would rephrase to put the specification of reward up front rather than a "Remember, ..." sentence at the end of the paragraph.
as defined in the paper [...] I refer the reader to the paper [...] last theorem in the paper
What paper? Where is the paper?! Not published yet? (I can find "Conservative Agency" on arXiv, but that doesn't seem to match the Figure and Section numbers/titles you give.)
Reading this once was enough to pique my curiosity, but I still don't feel like I understand it. (Probably need to read the actual paper and work out my own examples with pencil/Python.) The "random distribution of preferences over tic-tac-toe end states implies that you probably don't want to end the game yet" example was helpful, but I don't have the general insight yet. Trying to articulate what I'm stuck on, I look back to that equation, and find that I wish it had more words—
power contribution = % of goals ⋅ average control.
Power contribution of what? Percentage of goals that what? Average control over what?