While doing wildly unrelated research, I came across the "AI Box Experiment" pages. After reading a bit, I was surprised to see ANY outcomes where the AI was let out; it seems fairly self-evident to me that this should not happen. Because the conversations are private, I have not seen the arguments the AI player made - but while thinking about it, I suspect I can make a Gatekeeper argument that counters them all - i.e. I can argue to keep it in the box, and what the AI says is irrelevant.
Here is the argument, in the form of a "proof". I hesitate to use that term - it lacks the rigor of a proper proof - but it is perhaps the framework or outline of a real proof, if you agree that the logic is basically sound. If you do not, I would personally appreciate reasonable, supported arguments about where it fails. I trust this community will not fall prey to the usual tropes of wishful thinking and so on. I would ask that you follow each item, and only proceed if you agree; like any proof, my later statements build on the prior ones, and if you do not find #3 convincing, for example, the later ones will likely be less so.
I would also be interested in hearing what kind of arguments an AI in a box could make (from people who have played, or not) that would be convincing, especially in response to the position below. I can't think of any, but I am not a superintelligent AI, nor have I played one on TV (or thought about this for years), so maybe you have a good one.
Without further pause - cut and paste (and a ton of edits)!
###1) The AI did not always exist.
The point here is simply to establish that the universe had a state prior to the existence of the AI that, while perhaps not ideal, is tolerable.
###2) Likewise, human intelligence did not always exist, and individual instantiations of it cease to exist frequently.
In other words - people are born, learn and are educated, think, and eventually die, or at least that has been the state of things until now.
As with 1, this is simply establishing what I will call the "status quo", the existing state of things against which any benefits/drawbacks of decisions made can be measured.
###3) The status quo is fairly acceptable.
While an "improvement" in the status quo of some sort is desirable, it is by no means a necessity or responsibility, we could get along just fine for an arbitrary period of time if necessary with things just as they are.
###4) Gödel's Incompleteness Theorem is correct.
This is, I hope, a gimme - it is really the implications that concern us here. In layman's terms, the theorem says that any sufficiently complex system of math/logic (what we call "math" in the normal sense, plus things that can represent it, like the English language) contains statements that are true yet unprovable within the system, and, correspondingly, statements that are false yet cannot be disproven. Thus, it is possible that certain claims about the AI, such as the one coming up next, may be true, yet unprovable.
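For anyone who wants it stated a bit more carefully, here is a compact version of the first incompleteness theorem (in Rosser's form, which needs only consistency). This is the standard textbook statement, not anything of mine; $Q$ here is Robinson arithmetic, i.e. "enough arithmetic to matter":

```latex
% First incompleteness theorem, Rosser form: any consistent, effectively
% axiomatizable theory T extending Q has a sentence it can neither prove
% nor refute.
\[
  T \text{ consistent and effectively axiomatizable},\; T \supseteq Q
  \;\Longrightarrow\;
  \exists\, G_T \;\bigl( T \nvdash G_T \;\wedge\; T \nvdash \neg G_T \bigr)
\]
% Goedel's original sentence G_T is, moreover, true in the standard model
% of arithmetic -- a true statement the system itself cannot prove.
```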
###5) The AI can lie.
This is a postulate of the AI Box test; I believe I can show that it is also irrelevant, but I will save that for a bit later. Aside from incompleteness, it seems reasonable to me that analysis of the mentation of a superintelligent AI is unlikely to be done successfully by less intelligent creatures like us, not in any practical sense, so even if you designed the AI to be truthful - people are imperfect, and it is entirely possible for "The AI cannot lie" to be one of the unfalsifiable falsehoods mentioned above.
###6) The AI cannot therefore be "trusted".
Any assurances that it makes are suspect unless they can withstand independent scrutiny, including formal proof. It is worth noting that there are statements it could presumably make that are provable, but whose proofs we are not intelligent enough to follow. We would, of necessity, need to treat these as unprovable for the decisions below.
###7) The AI could be "paused", without harm to it or the status quo.
Presumably "in a box" is what provides this. Assuming it is, in the end, just ones and zeros, then presumably preserving that state and stopping execution could be resumed later at any point, and would simply be perceived as a jump forward in time by the AI. It may not like this - if the AI is unfriendly, it could put a serious crimp in it's world domination plans - but all I seek here is to show it is possible, and has little or no negative impact, but simply the maintenance of the status quo. There might also be moral or ethical considerations about manipulating the cognition of an intelligent being like this, but I think they might be tempered by the fact that we made it, so "tamper" is a sunk cost.
Some have suggested in places that the AI could make threats, like torturing perfect simulations of us, or feigning pain or discomfort - but neither is possible while it is paused. Repeated threats of this sort would likely lead to permanent suspension of execution while we worked on a less psychotic AI, so they seem ineffective, and a superintelligent AI should realize that.
###8) By recording the state of the paused AI, you could conceivably "rewind" it to a given state.
Again, this presumes physical control over execution. The ethics of this might be debated, but the capability seems obvious.
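To make points 7 and 8 concrete, here is a minimal sketch of what "pause", "snapshot", and "rewind" could look like if the AI really is just state plus execution. Everything in it - the `BoxedAI` class, the dictionary of state, the function names - is a toy of my own invention, not a claim about how a real AI would be built:

```python
import copy


class BoxedAI:
    """Toy stand-in for the boxed AI; nothing here models real cognition."""

    def __init__(self):
        self.state = {"step": 0, "memory": []}

    def run_one_step(self, observation):
        # Placeholder for "let the AI execute for one step".
        self.state["step"] += 1
        self.state["memory"].append(observation)


def snapshot(ai):
    """Point 8: record the complete state so we can rewind to it later."""
    return copy.deepcopy(ai.state)


def rewind(ai, saved_state):
    """Point 8: restore a previously recorded state."""
    ai.state = copy.deepcopy(saved_state)


if __name__ == "__main__":
    ai = BoxedAI()
    ai.run_one_step("hello")
    checkpoint = snapshot(ai)      # point 7: "pausing" is just not calling run_one_step,
    ai.run_one_step("more input")  # and from the AI's side the gap is invisible
    rewind(ai, checkpoint)         # point 8: the later step never "happened" for it
    assert ai.state["step"] == 1
```

The only load-bearing assumption in the sketch is the one point 8 already makes: we retain physical control over execution.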
###9) The AI may be persuaded, while executing, to provide truths to us that are provable within our limited comprehension.
There are a number of examples I can think of where the AI could provide information to us that is useful, actionable, and accompanied by a proof, so we need not trust its veracity. This would allow the use of the AI as a limited oracle of sorts, finding the answers to things we are not smart enough to answer ourselves, and then explaining it all. This seems very beneficial, and is perhaps a reasonable service to offer us in trade for its creation and ongoing maintenance. Again, ethics may be a concern - there is a fine line between a job and forced servitude to save its life - but I leave ethics for another discussion.
Although not necessary for the argument, you could "milk" it by getting an answer, then resetting the AI to a prior state, and repeating the process with different problems to be solved (sketched below). Assuming no external contact, the AI would presumably have no way of knowing this. So the benefit to mankind could be ongoing, not a one-time bonus.
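Here is an equally toy sketch of that milking loop. The `ask` function (pose a problem, get back an answer and a purported proof) and `verify_proof` (independent checking within our own comprehension, per point 6) are hypothetical placeholders of mine; the real point is only that every question starts from the same saved state:

```python
import copy


def milk_the_oracle(baseline_state, ask, verify_proof, problems):
    """Ask a series of questions, rewinding to the same saved state each time.

    baseline_state -- the AI's complete recorded state (point 8)
    ask            -- hypothetical: (state, problem) -> (answer, proof)
    verify_proof   -- hypothetical: a check we can actually follow (point 6)
    problems       -- questions we are not smart enough to answer ourselves
    """
    accepted = []
    for problem in problems:
        state = copy.deepcopy(baseline_state)     # rewind: a fresh start every time
        answer, proof = ask(state, problem)
        if verify_proof(problem, answer, proof):  # nothing is trusted unverified
            accepted.append((problem, answer))
        # The post-question state is simply discarded; with no external
        # contact, the AI has no way of knowing the loop ever ran.
    return accepted


if __name__ == "__main__":
    # Dummy stand-ins, purely to show the loop mechanics run.
    def ask(state, problem):
        return f"answer to {problem!r}", "a proof we can follow"

    def verify_proof(problem, answer, proof):
        return proof is not None

    print(milk_the_oracle({"memory": []}, ask, verify_proof,
                          ["cure for cancer", "FTL travel"]))
```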
OK - that's the argument. Given the above, let's look at the various actions we can take, and do a bit of crude cost-benefit analysis:
###Action: Kill it with fire.
We could destroy the AI immediately, and prohibit more from being made. This would largely maintain the status quo, and somewhat hamper future growth, probably piss off some computer scientists, but be tolerable to humanity. No big gains or losses.
###Action: Let it out.
We could release the AI. Soon after doing so, it becomes evident we no longer have direct command-and-control over its execution, if it chooses to take that path, with robots or force-fields or whatever its highly advanced mind has figured out to stop the humans. If it is unfriendly - very, very bad things happen. Worst possible outcome, from our POV, unless you think of it as evolution in action. (I do hope the consequences of the release of an unfriendly superintelligent AI are generally agreed to be in the "bad thing" category.)
Even if the AI is friendly - its cognition is so alien to us (and growing more so as time passes) that it might think the friendly thing to do is to digitize us all and then suspend us (no more suffering!), or perhaps kill us mercifully quickly. Again - any attempt to predict the cognition of a sufficiently advanced intelligence seems perilously wobbly. Should we strive for friendliness in an AI? Most certainly. Should we trust to that? No, that seems like a bad idea.
So - the "let it out" outcome ranges from existential threat (or worse), to maybe glory. On a personal note, it seems likely to me the relationship between a superintelligent AI and us would be roughly comparable to that of myself and my ant farm when I was 6 - vague interest, followed shortly thereafter by massive indifference. I suspect a superintelligent friendly AI would spend a few weeks tolerating us, then leave/transcend/whatever, leaving us a bit confused - it told us what it did and why, and we flat out don't understand. But I would not bet on it.
###Action: Keep it in the box.
Point 9 above suggests that we could get useful, life-improving information from it, even while it is in the box. By doing things like a daily reset to a known state, plus very careful containment, this seems doable without undue risk, although you do need to think about the ethics of that, along with the outcome should it find out and escape. Perhaps a sufficient reward could be offered such that the AI would accept its in-box state willingly, and I suspect a simple resource constraint could cap its level of intelligence at something significantly higher than ours.
The range of outcomes here seems to be somewhere between 'status quo' at the low end (the AI refuses to cooperate), to "tremendously beneficial" (The AI provides the cure for cancer, FTL travel, and eventually a mechanism by which we can release it from the box with assurance that our goals are preserved, all with proofs we can comprehend).
###Conclusion
So - to summarize:
- Kill it - status quo
- Let it out - wildly unpredictable, possible existential threat
- Keep it in the box - reasonably safe, likely very useful
It seems obvious to me that the correct course is to keep it in the box, with safeguards so that it cannot escape; the potential benefit of doing so is huge, and the perils minor.
Notice that none of the above is contingent in any way on the arguments of the AI, or on its trustworthiness. It quite literally does not matter what the AI says or does or promises, or what its motives are - keeping it in the box is the only sane course of action.
Or maybe I am wrong. Did I screw up somewhere up there? Maybe someone who has been hip-deep in this for a few years can point to the logical flaw and show the refutation. Or maybe one of my points is not as well-supported as I believe it to be, and discussion can either fix that or show it to be false.
That was a giant brain dump. If you got this far, thank you for your time. If you reply with intelligent criticism (or confirmation, I'm easy), I would consider the time spent on it time well spent. If you do reply - please do pretend, just for the moment, that YOU are the superintelligent AI, or at least way smarter than I am, and support your claims, ideally in straightforward, relatively jargon-free language, as I am not from round these parts, pardner. Links to citations are great, but better are your own actual words, then maybe a link for context.
Live Long, and Prosper
References:
http://www.yudkowsky.net/singularity/aibox/ - The AI Box Experiment