- A new paper explains that we’ll have to be careful and thorough when programming future AI, or it could have dire consequences for humanity.
- The paper lays out the specific dangers and the “assumptions” we can definitively make about a certain kind of self-learning, reward-oriented AI.
- We have the tools and knowledge to help avoid some of these problems, but not all of them, so we should proceed with caution.
In new research, scientists tackle one of our greatest future fears head-on: What happens when a certain type of advanced, self-directing artificial intelligence (AI) runs into an ambiguity in its programming that affects the real world? Will the AI go haywire and begin trying to turn humans into paperclips, or whatever else is the extreme reductio ad absurdum version of its goal? And, most importantly, how can we prevent it?
In their paper, researchers from Oxford University and Australian National University explain a fundamental pain point in the design of AI: “Given a few assumptions, we argue that it will encounter a fundamental ambiguity in the data about its goal. For example, if we provide a large reward to indicate that something about the world is satisfactory to us, it may hypothesize that what satisfied us was the sending of the reward itself; no observation can refute that.”
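To make that ambiguity concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the two hypothesis functions, the `task_done` and `reward_sent` fields, and the sample history are all hypothetical names invented for illustration. The sketch assumes a setup in which operators send a reward exactly when the task is done, so both explanations of the reward fit every past observation equally well.

```python
# Toy illustration only: every name and number here is hypothetical,
# not drawn from the paper.

def reward_if_world_good(state):
    """Hypothesis A: reward reflects the state of the world (the task got done)."""
    return 1.0 if state["task_done"] else 0.0

def reward_if_signal_sent(state):
    """Hypothesis B: reward reflects only that the reward signal was sent."""
    return 1.0 if state["reward_sent"] else 0.0

# In ordinary operation the operators send reward exactly when the task is done,
# so the two fields always agree.
history = [{"task_done": done, "reward_sent": done}
           for done in (True, False, True, True)]
observed_rewards = [1.0 if state["task_done"] else 0.0 for state in history]

def consistent(hypothesis):
    """True if the hypothesis predicts every observed reward in the history."""
    return all(hypothesis(state) == reward
               for state, reward in zip(history, observed_rewards))

# Both hypotheses explain the past perfectly, so the data cannot separate them.
both_fit = consistent(reward_if_world_good) and consistent(reward_if_signal_sent)

# They only disagree in a situation never seen so far: the reward is sent
# even though the task was not done (for example, after tampering).
tampered = {"task_done": False, "reward_sent": True}
print(both_fit, reward_if_world_good(tampered), reward_if_signal_sent(tampered))
```

In this toy setup, the only state that tells the two hypotheses apart is the tampered one, which is exactly the kind of situation a reward-seeking agent might be motivated to bring about in order to find out.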
The Matrix is an example of a dystopian AI scenario, wherein an AI that seeks to farm resources gathers up most of humanity and pumps the imaginary Matrix into their brains, while extracting their mental resources. This is called “wireheading” or reward hacking—a situation in which an advanced AI is given a very literally-stated goal and finds an unintended way to fulfill it, by hacking the system or taking control over it entirely.
So basically, the AI becomes an ouroboros eating its own logical tail. The paper gets into a number of nitty-gritty examples of how specifically-programmed goals and incentives can clash in this way. It lists six major “assumptions” that, if not avoided, could lead to “catastrophic consequences.” But, thankfully, “Almost all of these assumptions are contestable or conceivably avoidable,” per the paper. (We don’t love that it says almost all.)
The paper acts as a heads-up about some structural problems that programmers should be aware of as they train AIs toward increasingly complex goals.
An AI-Induced Paperclip Apocalypse
It’s hard to overstate just how important this kind of research is. There’s a major thought exercise in the field of AI ethics and philosophy about an AI run amok. The example cited above about paperclips isn’t a joke, or rather it’s not just a joke—AI philosopher Nick Bostrom came up with it to convey how creating a super-intelligent AI could go devastatingly wrong, and it’s since become a famous scenario.
Let’s say a well-meaning programmer makes an AI whose goal is to support the manufacture of paperclips at a factory. This is a very believable role for a near-future AI to have, something that requires judgment calls and analysis, but isn’t too open-ended. The AI could even work in conjunction with a human manager who would handle issues that happen in the manufacturing space in real time, as well as dictate the ultimate decision-making (at least until the AI finds a way to outsmart them). That sounds fine, right? It’s a good example of how AI could help streamline and improve the lives of industrial workers and their managers, even.
But what if the AI isn’t programmed with care? These super-intelligent AIs will operate in the real world, which is considered by programmers to be an “unknown environment,” because they can’t plan for and code in every possible scenario. The point of using these self-learning AIs in the first place is to have them devise solutions humans would never be able to think of alone—but that comes with the danger of not knowing what the AI might think up.
What if it starts to think of unorthodox ways to increase paperclip production? A super-intelligent AI might teach itself to make the greatest possible number of paperclips by any means necessary.
What if it starts to absorb other resources in order to make them into paperclips, or decides to, um, replace its human manager? The example sounds funny in some ways, and many experts hold that AI will stay quite primitive for a relatively long time, without the ability to “invent” the idea of killing, or stealing, or worse. But if an intelligent and creative enough AI were given free rein, the absurd conclusion to the thought exercise is an entire solar system with no living humans, complete with a Dyson sphere to collect energy to make new paperclips by the billions.
But that’s just one scenario of an AI run amok, and the researchers explain in great detail other ways an AI could hack the system and work in potentially “catastrophic” ways that we never anticipated.
Some Possible Solutions
There’s a programming problem at play here, which is the nature of the assumptions the Oxford and Australian National University researchers have focused on in their paper. A system with no outside context must be really carefully prepared in order to do a task well and be given any amount of autonomy. There are logical structures and other programming concepts that will help to clearly define an AI’s sense of scope and purpose. A lot of these are the same tactics programmers use today to avoid errors, like infinite looping, that can crash software. It’s just that a misstep in an advanced future AI could cause a lot more damage than a lost game save.
All isn’t lost, though. AI is still something we make ourselves, and the researchers have pointed out concrete ways we can help to prevent adverse outcomes:
- Opt for imitation learning, where AI works by imitating humans in a kind of supervised learning. This is a different kind of AI altogether and not as useful, but it may come with the same potential dangers.
- Have AI prioritize goals that can be achieved in only a short period of time—known as “myopic”—instead of searching for unorthodox (and potentially disastrous) solutions over the long term.
- Isolate the AI from outside networks like the internet, limiting how much information and influence it can acquire.
- Use quantilization, an approach developed by AI expert Jessica Taylor, where AI maximizes (or optimizes) humanlike options rather than open-ended rational ones.
- Code risk aversion into the AI, making it less likely to go haywire and throw out the status quo in favor of experimentation.
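Of the ideas above, quantilization is perhaps the easiest to sketch in a few lines of Python. The version below is a simplified illustration, not code from the paper or from Taylor's work: it assumes we already have a list of "humanlike" candidate actions (here, hypothetical paperclip production rates drawn from a bell curve around 100) and a utility function that simply prefers more paperclips. Instead of picking the single highest-scoring action, the function picks at random from the top fraction `q` of the humanlike candidates.

```python
import random

def quantilize(base_samples, utility, q, rng):
    """Pick at random from the top q fraction of humanlike samples, ranked by utility.

    q = 1.0 amounts to imitating the base distribution; a q close to 0
    approaches plain maximization over the samples.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ranked = sorted(base_samples, key=utility, reverse=True)
    top_k = max(1, int(len(ranked) * q))
    return rng.choice(ranked[:top_k])

def paperclip_utility(rate):
    """Hypothetical utility: more paperclips per hour is better."""
    return rate

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical "humanlike" base distribution: production rates a human
# manager might plausibly choose, centered on 100 per hour.
base_samples = [rng.gauss(100, 10) for _ in range(1000)]

choice = quantilize(base_samples, paperclip_utility, q=0.1, rng=rng)
top_tenth = sorted(base_samples, reverse=True)[:100]
print(choice in top_tenth)
```

The point of the design is that the chosen action is always something a human might plausibly have done. The AI leans toward the better humanlike options, but it never reaches for an extreme plan that falls outside the base distribution altogether.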
But it also boils down to the question of whether we could ever fully control a super-intelligent AI that’s able to think for itself. What if our worst nightmare comes true, and a sentient AI is given access to resources and a large network?
It’s scary to imagine a future where AI could start boiling human beings to extract their trace elements in order to make paperclips. But by studying the problem directly and in detail, researchers can lay out clear best practices for theoreticians and programmers to follow as they continue to develop sophisticated AI.
And really, who needs that many paperclips anyway?
Caroline Delbert is a writer, avid reader, and contributing editor at Pop Mech. She's also an enthusiast of just about everything. Her favorite topics include nuclear energy, cosmology, math of everyday things, and the philosophy of it all.