AI will not be safe by default

I work on AI safety, by which I mean the problem of making sure that out-of-control AI doesn’t cause a disaster. In particular, I am concerned that in the next few decades we may build artificial general intelligence (AGI): an AI system that is more capable than humans across a broad range of domains rather than in one specialized task, including scientific research, engineering, political, business, and military strategy, and social persuasion.

I want to discuss a common (and reasonable) pushback against AI safety concerns from engineers and other technically-minded people, especially in discussions about slowing down AI progress.

The common pushback:

Every technology and every field has safety issues. And yes, there are disasters and safety failures (sometimes a car crashes, or a bridge fails), but those are not reasons to stop building cars or bridges. In fact, they’re a reason to train more civil engineers and mechanical engineers, and to make more progress overall so that we can build safer cars and safer bridges. And the same is true for AI: we need more progress, not less.

When I first encountered news articles on AI risk, I had the same reaction. I’ve changed my mind since then. My response to my past self is: AI safety, unfortunately, is a different kind of problem from those faced by other engineering disciplines. I realize this is an annoying thing to claim, but it’s true.

This post is too short to lay out all the difficulties, but I will mention one very important difference. In engineering we are usually able to work by iterative trial and error: build a prototype, test it for weaknesses, build a version 2, and so forth. Trial and error is how we make progress in all fields of science and engineering, including AI. In AGI safety, however, making systems safe by trial and error will probably be hard or almost impossible. If we get AGI wrong, e.g. by building a sufficiently capable AGI that does not share all our goals, then we may lose control, and the AGI may prevent us from correcting our mistakes.

A better analogy is that building safe AGI is like developing modern airplane technology without ever once crashing a plane. This is not something we know how to do. Progress is usually made by trial and error, by crashing prototypes here and there and gradually figuring things out. By default, we won’t be able to do the same with AI. We are all seated on the first plane ever to fly, and if it crashes once, we don’t get another chance.

In my opinion, the situation is not hopeless: there are things we can do to substantially improve our odds of getting AGI safety right. But we need a different approach.

Lauro Langosco — Writings

June 2023