Wednesday, October 01, 2025

Computers Don't Want

I read through the new book If Anyone Builds It, Everyone Dies by Eliezer Yudkowsky and Nate Soares. "It" refers to Artificial Super Intelligence (ASI). A very short version of the authors' argument: you can view advanced AI as though it has its own desires and agency; its needs will be incompatible with those of the human race; and AI will have the capability to eliminate humans even without killer robots.

I have no doubt that a crafty sentient AI hellbent on destroying humanity could do so. But let's look at the first part of the argument: should we reason about AI as though it has agency and preferences? The authors make a subtle argument in Chapter 3: while AI doesn't have its own wants and desires, we can reason about it as though it does. In the following chapters, the authors go all in, treating ASI as though it has preferences and acts in its own self-interest.

I think of computing as a Turing machine: a device that follows a set of simple instructions, interacting with input and memory, and producing some output. The machine does not have wants or desires; all it does is follow its instructions.
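
To make that concrete, here's a minimal sketch in Python of what "follow its instructions" amounts to. The simulator and the little bit-flipping machine below are my own illustrative examples, not anything from the book: the entire "machine" is a transition table consulted in a loop.

```python
# A minimal Turing machine simulator: the "machine" is just a transition
# table consulted in a loop. Nothing here wants anything; at each step it
# looks up the (state, symbol) pair and does what the table says.

def run_tm(transitions, tape, state="start", blank="_", max_steps=10_000):
    """transitions maps (state, symbol) -> (new_state, new_symbol, move),
    where move is -1 (left) or +1 (right). Halts when no rule applies."""
    cells = dict(enumerate(tape))      # sparse tape, blank by default
    head = 0
    for _ in range(max_steps):
        symbol = cells.get(head, blank)
        if (state, symbol) not in transitions:
            break                      # no rule for this configuration: halt
        state, cells[head], move = transitions[(state, symbol)]
        head += move
    output = "".join(cells[i] for i in sorted(cells)).strip(blank)
    return state, output

# Example machine: scan right, flipping every bit, then halt.
flip_bits = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("done",  "_", +1),
}

print(run_tm(flip_bits, "10110"))      # ('done', '01001')
```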

But we also realize the immense complexity that can arise from such simplicity. Rice's Theorem tells us that, in general, we can't decide anything nontrivial about a Turing machine's behavior from the machine's code. And there's a reason we can't prove P ≠ NP, or even sketch a viable approach: we have no idea how to bound the power of efficient algorithms. But we shouldn't mistake complexity, and our inability to understand an algorithm, for evidence of agency and desires. Even if AI seems to exhibit goal-oriented behavior, it's a property of its training and not evidence of independent agency.
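
For readers who haven't seen it, Rice's Theorem is usually proved by a reduction from the halting problem. Below is a rough Python sketch of that textbook reduction for one particular property, "prints hello"; the decider, the halts wrapper, and the assumed main entry point are all hypothetical, illustrative names, since the whole point is that such a decider cannot exist.

```python
# Sketch of the reduction behind Rice's Theorem. Suppose we had a decider
# for one nontrivial behavioral property, say "prints hello." We could then
# decide the halting problem, which is impossible, so no such decider
# exists. Only the shape of the reduction matters here.

def decides_prints_hello(source: str) -> bool:
    """Hypothetical: True iff the program `source` eventually prints 'hello'."""
    raise NotImplementedError("Rice's Theorem: no such total decider exists")

def halts(program_source: str, program_input: str) -> bool:
    """Would decide halting if decides_prints_hello existed."""
    # Build a program that runs the given program on the given input and
    # only afterwards prints 'hello' (ignoring the detail of suppressing the
    # embedded program's own output). The new program prints 'hello'
    # exactly when the original program halts on that input.
    combined = (
        program_source + "\n"
        + f"main({program_input!r})\n"   # assumes the embedded program exposes main()
        + "print('hello')\n"
    )
    return decides_prints_hello(combined)
```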

I worry less about AI developing its own hostile agency than about how humans will wield it, whether through malicious misuse or misplaced trust. These are serious risks, but they're the kinds of risks we can work to mitigate while continuing to develop transformative technology. The "everyone dies" framing isn't just fatalistic; it's premised on treating computational systems as agents, which substitutes metaphor for mechanism.

11 comments:

  1. This whole discussion is so old.
    Like The Matrix, Terminator 2, 2001: A Space Odyssey, half the classic stories by Philip K. Dick, and so on.
    Lang's Metropolis, R.U.R., all the Golem stories.
    Just because the book has a nonfiction tag, everyone reheats decades-old discussions.

    Replies
    1. "It's a waste of time talking about X happening in the real world, because in the past people have written stories about X." That's why Gilgamesh and Noah's flood mean we don't need to think about rising sea levels due to climate change, Jules Verne's "The World Set Free" meant that we never had to worry about nuclear weapons, and everyone looking for evidence of life elsewhere in the universe is wasting their time because there's been lots of science fiction featuring extraterrestrial life.

      People write fiction (and speculation and myth) about things because there's something at least superficially plausible about them. That doesn't make similar things _less_ likely to happen in reality.

      And it's not as if people are only speculating about highly capable AI because they've read too much science fiction; have you seen what today's AI systems can do compared with those of ten years ago?

      It might turn out that there are big fundamental obstacles between where we are now and the sort of existentially-threatening AI that Yudkowsky and Soares worry about. But you can't just _assume_ that there are because people in the past have written fiction about AI. You need to understand how the AI systems work (and no, stochastically parroting the words "stochastic parrot" does not constitute understanding how the AI systems work) and how they're trained and what avenues of future research there are.

      And it turns out that most of the people who do understand those things are saying: yeah, there's a distinctly nonzero chance that these things will kill us all. Some of them may just be talking up their own investments -- if Sam Altman warns about how we might all be killed by much-smarter-than-us AI systems, he's probably doing it mostly because he thinks that'll encourage people to invest in OpenAI -- but e.g. Geoff Hinton doesn't seem to have much incentive to do that.

      Again, those people are just guessing, and they might be wrong, but on the face of it their guesses are more educated than most people's, and very few of them are saying "nah, nothing to worry about here".

  2. What makes you convinced that humans have this special independent agency?

    Low-level instruction following could be modelled as agency at higher levels of abstraction. And I don't think the distinction between being able to model it in some way and it actually behaving that way is that clear.

  3. Gee, I wish I had read your post before buying the book. More seriously, the writers are very knowledgeable about the field, so I am surprised they make such a fundamental error.

    I'm more worried about AI and the economy: massive unemployment. Yes, people will change jobs, but the transition will be brutal.

    Replies
    1. I'm skeptical that LLMs will lead to large changes in employment. I have to keep skipping over the LLM output when using Google or Bing to search.

  4. This post seems kind of odd to me. Do you think that human brains are doing something fundamentally different from computation? If so, what's the mechanism for that? If not, doesn't that show that computation can give rise to agents which, at the very least, have the strong appearance of having desires and goals?

    The first comment also confuses me a bit. Isn't it natural that, now that we are on the verge of having highly capable artificial intelligence, old debates about AI (and related topics) become more relevant and are discussed by more people and with more energy? Just because people have talked about a topic in the past does not mean it is pointless or irrelevant in the present.

    Replies
    1. +1. Lance, I’d love to know your thoughts on how AI sycophancy and other “agent-like” behaviors push us toward LLMs that are better modelled by the intentional stance (in the sense of Dennett), compared to the design stance we use for most computer programs or the physical stance, which Rice’s theorem sort of makes infeasible.

    2. I'm not worried about supercomputers killing the human race (at least for a while). But computational systems can certainly be agents: we have lots of examples of such organic computational agents.

  5. I'm getting a few comments and social media replies along the lines of "aren't human brains just Turing machines themselves?" That's a tricky question that I'll tackle in next week's post.

  6. I don't think anything in your argument is inconsistent with the following scenario. We train an AI system to have some overarching goal, such as its own survival, and in the service of that goal it develops all sorts of subgoals. Even if the initial goal is in there as a direct result of training, we could have far less connection with the subgoals -- indeed, we might have very little idea what many of them are, and they might turn out to be very unaligned with our interests.

    In such a situation, it would be reasonable to talk about the system having desires and agency (even if one could argue about whether those are consciously felt).

    I would also repeat the argument that others have made. If I were to replace "AI" by "a human" in your sentence, "Even if AI seems to exhibit goal-oriented behavior, it's a property of its training and not evidence of independent agency," how would you argue against that?
