Some recent discussion about what Paul Christiano means by "short-term preferences" got me thinking more generally about deliberation as a method of figuring out the human user's or users' "actual preferences". (I can't give a definition of "actual preferences" because we have such a poor understanding of meta-ethics that we don't even know what the term should mean or if they even exist.)
To set the framing of this post: We want good outcomes from AI. To get this, we probably want to figure out the human user's or users' "actual preferences" at some point. There are several options for this:
- Directly solve meta-ethics. We figure out whether there are normative facts about what we should value, and use this solution to clarify what "actual preferences" means and to find the human's or humans' "actual preferences".