In short: we respond to a recent article arguing that ensuring that “superintelligent” AI systems are safe is an impossible task. We clarify that their argument does not preclude practical work on the foundations of safety in future AI systems.
The paper Superintelligence Cannot be Contained: Lessons from Computability Theory by Alfonseca et al argues that ensuring that a sufficiently complex Artificial Intelligence does not perform an unsafe action is mathematically impossible.
Early versions of the paper date back to at least 2016, but following its recent publication in the Journal of Artificial Intelligence Research it has received a lot of attention in the press.
We have seen that many of the popular summarizations portray the paper as refuting the possibility of designing complex AI systems without unintended consequences. If this were true, it would have huge implications for the well established field of formal verification and the budding field of Artificial Intelligence Safety - we would be compelled to put away resources from these fields given the intractability of the foundational problems they aim to solve.
However a close examination of the argument reveals that this is not the case. In this piece we will outline their core argument, and clarify its implications.
The core of the argument in Alfonseca et al’s paper is not original, and goes back to Henry Gordon Rice in 1953.
One well known result in computability theory is the impossibility of having a general procedure that would, given any program, decide whether the program will eventually stop or whether it would run forever (this is known as “The Halting Problem” in the literature).
Rice showed that an immediate corollary is that there are no procedures that, in general, can identify whether a given program has any particular property, for example whether the program is computing whether its input is a prime number.
Alfonseca et al point out that Rice’s proof also applies to any property about safety. If one had a procedure which, when given the code for any program, could decide whether an unsafe function is eventually executed, then one could use this to decide whether any program stops. One would just have to write a program like the one in table 1 below: this program will execute the unsafe_function() if and only if an arbitrary ‘program2` halts.
program2(input) # program2 is the program we want to know if it halts
unsafe_function() # the unsafe function is executed if and only if program2 halts
Table 1: A program we could use to decide whether a second program halts, if we could decide whether any given program will ever execute a known unsafe function
Alfonseca et al conclude from this that AI safety is impossible when it comes to superintelligent systems; no matter how one defines safety, or how exhaustive one thinks one’s review process is, there will be programs which will fool it. And a “superintelligence”, they argue, ought to be general enough to arbitrarily execute any program. So ensuring the safety of such a general system would require ensuring the safety of arbitrary programs, which we can show it is impossible. Hence, they argue, it is not possible to show that a superintelligent AI system will be safe.
It seems like an argument similar to Alfonseca et al’s could show that all programs are unpredictable. But if I write a program that simply prints on screen “I am not dangerous” and then finishes, then this program is quite predictable. In fact, engineers every day manage to write sophisticated programs, and reason consistently about the consequences of deploying the software. We have also seen many successes in formal verification research; for example guaranteeing that an operative system is correctly implemented.
Where does this application of the argument break down then? The key is that Rice’s theorem prohibits a decision procedure for deciding any property of an arbitrary program. But in application we are not interested in arbitrary programs; we are interested in studying particular programs.
By way of analogy: there is no procedure that will, given any mathematical statement, produce a proof or prove the opposite. But this does not mean that a proof of Pythagoras' theorem is impossible!
To write a program whose outcome is unpredictable in principle takes deliberate effort. In practice, engineers and researchers in AI almost never need to bother thinking about for example whether their program will ever halt.
In short: we will never have a procedure that takes any chunk of code and accurately predicts the consequences of running the code. But we can, and routinely do, predict some consequences of running a specific piece of code.
Furthermore, one can write one’s code in such a way that it is more transparent and amenable to formal (and informal) verification; this is the approach taken by many experts in formal verification.
Alfonseca et al are not ignorant of this pitfall. Instead they add the assumption that what they call a superintelligent machine might be able to execute arbitrary programs as subroutines, in an unpredictable manner. Hence they reasonably suggest that a complete solution to safety would require to solve the impossible Halting problem.
But the AI itself could be in principle engineered to not execute arbitrary subroutines!
Building complex AI, even AI that surpasses human capabilities in all tasks humans routinely face, does not require a system which executes arbitrary subroutines in principle. The problems we face and are interested in are not arbitrary. It is not even clear to what degree could we call a system systematically executing arbitrary programs “superintelligent”; surely the routines it will execute will be ones that are likely to end and will help the system solve the problems it faces.
Another analogy: as pointed out in the paper, “a man provided with paper, pencil, and rubber, and subject to strict discipline, is in effect a universal machine”. That is, a person could in principle execute any arbitrary program. But we hardly could call humans “unpredictable in principle”; the programs that they will simulate follow some logic. We did not arbitrarily choose to write this article; we chose to write for a reason. And sure, humans are not designed to be predictable, so it is hard to anticipate what they will do. But computer programs can be designed to be more transparent and predictable.
This is why research in formal verification is not pointless, and why Alfonseca et al’s paper does not preclude meaningful work being done on practical AI safety challenges that may lay the foundations for safety in future systems, superintelligent or not.
This does not mean that AI systems will have easily predictable consequences. Our argument refutes the computational impossibility of predicting the consequences of running a program. But this does not mean that they will be predictable in a practical sense. The programs can be quite complex, and have unintended consequences; this is in fact a great potential source of risk.
Alfonseca et al’s paper aims to answer an important question, but their conclusions are not clearly explained: while it is impossible to design a procedure that would check any arbitrary program for safety, we can still reason about the safety properties of individual computer programs and design AI systems to behave predictably.
We do appreciate academic engagement with Artificial Intelligence Safety. There are compelling reasons to believe that both 1) future advances in Artificial Intelligence may radically transform society and 2) we know too little about the potential safety issues related to artificial intelligence. This makes research in the area potentially very important.
However, productive discussion on the topic (especially when talking about controversial topics like “superintelligence”) requires open criticism and clear communication of the implications of technical results. This is why we felt important to write this piece: we wish to make clearer the implications of their argument to a more general audience.
For lay people looking to learn more about Artificial Intelligence Safety, we recommend this talk by our colleague Shahar Avin. For researchers looking to engage with recent research we suggest to look over here, here and here for compilations of papers.
We thank Jose Orallo, Shahar Avin and Sonja Amadae for their insightful discussion. Sean O hEigeartaigh improved this post tenfold with his careful editing.