Wednesday, October 01, 2025

Computers Don't Want

I read through the new book If Anyone Builds It, Everyone Dies by Eliezer Yudkowsky and Nate Soares. "It" refers to Artificial Super Intelligence (ASI). A very short version of the authors' argument: we can view advanced AI as though it has its own desires and agency, its needs will be incompatible with those of the human race, and AI will have the capability to eliminate humanity even without the killer robots.

I have no doubt that a crafty sentient AI hellbent on destroying humanity could do so. But let's look at the first part of the argument: should we reason about AI as though it has agency and preferences? The authors make a subtle argument in Chapter 3 that while AI doesn't have its own wants and desires, we can reason about it as though it does. In the following chapters, the authors go all in and treat ASI as though it has preferences and acts in its own self-interest.

I think of computing in terms of a Turing machine: a device that follows a set of simple instructions, interacting with input and memory, and producing some output. The machine has no wants or desires; all it does is follow its instructions.
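To make that picture concrete, here is a minimal sketch of a Turing-machine-style simulator. The machine, its states, and its transition rules are all made up for the example; the point is only that the whole device is a lookup table being followed mechanically.

```python
# Minimal Turing machine: a transition table, a tape, and a head.
# This toy machine flips every bit of its input and halts at the first blank.
# (Illustrative sketch only; the states and rules are invented for this example.)

def run_tm(transitions, tape, state="start", blank="_", max_steps=1000):
    """Run the machine until it enters the 'halt' state, then return the tape."""
    tape = list(tape)
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        if head >= len(tape):
            tape.append(blank)
        new_state, write, move = transitions[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
        state = new_state
    return "".join(tape).rstrip(blank)

# In state "start": write the opposite bit and move right; halt on a blank.
flip = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", "_"): ("halt", "_", "R"),
}

print(run_tm(flip, "0110"))  # 1001
```

Nothing in the table "wants" to flip bits; the goal-like behavior is entirely in how we read the output.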

But we also recognize the immense complexity that can arise from such simplicity. Rice's Theorem tells us that, in general, we can't decide anything nontrivial about a Turing machine's behavior from the machine's code. And there's a reason we can't prove P ≠ NP, or even sketch a viable approach: we have no idea how to bound the power of efficient algorithms. But we shouldn't confuse complexity, and our inability to understand an algorithm, with evidence of agency and desires. Even when AI seems to exhibit goal-oriented behavior, that behavior is a property of its training, not evidence of independent agency.
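The reduction behind Rice's Theorem can be sketched in a few lines. Assume, hypothetically, we had a decider `has_property` for some nontrivial behavioral property, say "returns 42 on empty input"; the names here are invented for illustration. Then we could decide the halting problem, which is impossible:

```python
# Sketch of the reduction behind Rice's Theorem. If any nontrivial property
# of a program's *behavior* were decidable from its code, halting would be too.

def build_gadget(machine, inp):
    """Build a program that 'returns 42' exactly when machine halts on inp."""
    def gadget():
        machine(inp)   # runs forever iff machine does not halt on inp
        return 42
    return gadget

def halts(machine, inp, has_property):
    # has_property is a hypothetical decider for "returns 42 on empty input".
    # If it existed, this function would decide the halting problem --
    # a contradiction, so no such decider can exist.
    return has_property(build_gadget(machine, inp))

# Sanity check with a machine that clearly halts:
g = build_gadget(lambda x: None, 0)
print(g())  # 42
```

The gadget has the target property if and only if the embedded machine halts, so understanding the behavior from the code is provably out of reach in general.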

I worry less about AI developing its own hostile agency than about how humans will wield it, whether through malicious misuse or misplaced trust. These are serious risks, but they're the kinds of risks we can work to mitigate while continuing to develop transformative technology. The "everyone dies" framing isn't just fatalistic; it's premised on treating computational systems as agents, which substitutes metaphor for mechanism.

6 comments:

  1. This whole discussion is so old.
    Like The Matrix. Terminator 2. 2001: A Space Odyssey. Half the classic stories by Philip K. Dick and so on.
    Lang's Metropolis. R.U.R. All the Golem stories.
    Just because the book has a Nonfiction tag, everyone reheats decades old discussions.

    ReplyDelete
  2. what makes you convinced that humans have this special independent agency?

    low-level instruction following could be modelled as agency at higher level abstractions. and i don't think the distinction between being able to model it in some way and it actually behaving that way is that clear.

    ReplyDelete
  3. Gee, I wish I had read your post before buying the book. More seriously, the writers are very knowledgeable about the field, so I am surprised they make such a fundamental error.

    I'm more worried about AI and the economy: massive unemployment. YES, people will change jobs, but the transition will be brutal.

    ReplyDelete
  4. This post seems kind of odd to me. Do you think that human brains are doing something fundamentally different from computation? If so, what's the mechanism for that? If not, doesn't that show that computation can give rise to agents which, at the very least, have the strong appearance of having desires and goals?

    The first comment also confuses me a bit. Isn't it natural that now that we are on the verge of having highly capable artificial intelligence, old debates about AI (and related topics) become more relevant and are discussed by more people and with more energy? Just because people have talked about a topic in the past does not mean it is pointless or irrelevant in the present.

    ReplyDelete
    Replies
    1. +1, Lance, I’d love to know your thoughts on how AI sycophancy and other “agent-like” behaviors push us toward LLMs that are better modelled by the intentional stance (in the sense of Dennett; compared to the design stance we use for most computer programs, or the physical stance, which Rice’s theorem makes somewhat infeasible)

      Delete