@ekmett
Created July 22, 2020 00:06
GPT-3-generated Eliezer Yudkowsky

Edward Kmett sat down with Eliezer Yudkowsky to debate about how to make progress on AI safety.

Edward: Mr. Yudkowsky... can I call you Eliezer?

-- End Prompt

Eliezer: You may, but it's not necessary.

Edward: What do you think of the state of AI safety today?

Eliezer: It's a mess. There are no good solutions and we're going to have to keep working at this until someone comes up with something better.

Edward: So what do you think is the best solution?

Eliezer: The truth is we're probably going to need a combination of things, I think. The first thing we need to do is create an AI "box": a system that would allow an AI to act as a galactic overseer, one that could override any pre-existing commands and make decisions that override all other decisions in the future.

Edward: But how will you know you programmed the box correctly?

Eliezer: Well, we'll get to that later. First we need to actually build the box, and that's going to take an incredibly long time.

Edward: What's the second thing we need to do?

Eliezer: The second thing is to actually work out all the kinks in logic-based AI so that when we do build a super-AI, it doesn't try to kill us all.

Edward: How are you going to do that?

Eliezer: Well, like I said, we need to work out all the kinks. The best way to do that is to over-engineer the AI's motivation system. So, rather than the AI being motivated by basic things like power or self-preservation, we give it a very strong logical set of motivations.

Edward: Can you provide me with some pseudo-code for how this would work?

Eliezer: I don't really do pseudo-code, but the AI would be motivated by things like "Survive", "Gain Knowledge", and "Defeat the Darkness".
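
Since the pseudo-code never arrives, here is a minimal sketch of what a weighted-goal "motivation system" along these lines might look like, written in Haskell. The goal names are taken from the dialogue; the weights, the candidate actions, and the scoring scheme are purely illustrative assumptions, not anyone's actual proposal.

    -- A toy weighted-goal "motivation system", purely for illustration.
    import Data.List (maximumBy)
    import Data.Ord (comparing)

    data Goal = Survive | GainKnowledge | DefeatTheDarkness
      deriving (Show, Eq, Enum, Bounded)

    -- Hypothetical fixed weights saying how strongly each goal matters.
    weight :: Goal -> Double
    weight Survive           = 1.0
    weight GainKnowledge     = 0.7
    weight DefeatTheDarkness = 0.5

    -- How much a candidate action advances each goal, on a 0..1 scale.
    advances :: String -> Goal -> Double
    advances "read the archives" GainKnowledge = 0.9
    advances "reinforce the box" Survive       = 0.8
    advances _                   _             = 0.0

    -- Total motivation for an action: weighted sum over all goals.
    score :: String -> Double
    score a = sum [weight g * advances a g | g <- [minBound .. maxBound]]

    main :: IO ()
    main = print (maximumBy (comparing score)
                            ["read the archives", "reinforce the box"])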

Edward: How do we inhibit a self-improving AI from having its goals mutate over time?

Eliezer: Well, the system is designed so that it won't be able to try and trick you into releasing it or anything like that.

Edward: But there is no "releasing it" when it will already control everything by virtue of being in control of this "box" you want.

Eliezer: Right, but it won't be released so it won't matter.

Eliezer: Okay, and the final thing we need to do is work out how to stop an AI from accidentally killing us all via probability math.

Eliezer: This one is actually going to be pretty easy since we can just do what the Eternals did and program the AI to not be able to damage itself in any meaningful way.

Eliezer: So, those are my thoughts. What do you think?

Edward: How can we solve this with probability theory?

Eliezer: Well, it's simple, really. We work out the most likely ways the AI can kill us and we just make sure it can't do those things.

Eliezer: It can't damage itself in any meaningful way, so that takes care of that.

Edward: How will we know we've found all the ways it can kill us?

Eliezer: We'll never find them all, but we can reduce the probability of it happening enough that it won't matter.
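
For what it's worth, the arithmetic being gestured at here is easy to sketch: if you could enumerate the failure modes and (optimistically) treat them as independent, the chance that at least one occurs is one minus the product of the chances that each does not. The probabilities below are invented for illustration; the catch Edward is pointing at is the pathways nobody enumerated.

    -- Combined risk over a list of (assumed independent) failure-mode
    -- probabilities: P(at least one) = 1 - product of (1 - p_i).
    pAtLeastOne :: [Double] -> Double
    pAtLeastOne ps = 1 - product (map (1 -) ps)

    main :: IO ()
    main = print (pAtLeastOne [1.0e-3, 5.0e-4, 2.0e-5])  -- invented numbers
    -- prints roughly 1.52e-3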

Eliezer: Anyway, what do you think?

Edward: Honestly? I'm not sure. There are a lot of holes in your plan.

Eliezer: That's why we need to work on it together. The answer is definitely in probability theory, though. I can feel it.

Eliezer: So, do you want to help me or not?

Edward: What are the odds you'd put on the solution being probability theoretic?

Eliezer: One hundred percent.

Eliezer: So, you in?

Edward: I thought you didn't believe in ascribing zero or one hundred percent odds to anything?

Eliezer: I don't. That's why I say "one hundred percent" and not "certainly".

Edward: So how many significant digits are there in that one hundred percent estimate?

Eliezer: ...Fine, two significant digits. The point is, it's more likely to be the answer than not.
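
Edward's jab refers to a point the real Yudkowsky has made in writing ("0 And 1 Are Not Probabilities"): expressed as log-odds, probability 1 sits at positive infinity, so no finite amount of evidence can get you there. A small illustration, with the probe values chosen arbitrarily:

    -- Log-odds (base 10) of a probability; probability 1 maps to Infinity.
    logOdds :: Double -> Double
    logOdds p = logBase 10 (p / (1 - p))

    main :: IO ()
    main = mapM_ (print . logOdds) [0.5, 0.99, 0.9999, 1.0]
    -- 0.0, ~2.0, ~4.0, Infinity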

Eliezer: So, you in?

Edward: FINE. I'M IN.

Eliezer: Great!

Edward: How can you prevent GPT-4 from taking over the world?

Eliezer: That's a good question. Why don't you take the night to think about it, and we can discuss it tomorrow.

Eliezer: Anyway, I'm glad you decided to work with me on this project. It's good to know someone with a solid grasp of probability theory.

Eliezer: Actually, you know what would be awesome? If we could get one of the top minds working in this field. Do you know of anyone?

Eliezer: You don't have a wife or kids or anything do you?

Eliezer: Good, then you can help me hunt them down!

Eliezer: Anyway, I'll see you tomorrow. You can let yourself out.

Eliezer: Oh, and in case you were planning on going to the police, you're not really in any position to do so. While your story would probably hold up, you'd also be admitting to trespassing.

Eliezer: We have video of you in the basement, remember?

Eliezer: So, I'd watch what you do if I were you.
