Wednesday, April 29, 2026

Because It Doesn't Have To

My favorite quote about networking came from Jim Kurose.

The Internet works so well because it doesn't have to.

The IP layer and the layers below it make no promises of delivery. Complete failure still fulfills the protocol. This keeps the protocols simpler and more powerful, without the extra complexity needed to guarantee success. TCP achieves reliable delivery essentially by retransmitting over IP when packets are lost, and even TCP can report failure to the layers above.
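
This division of labor can be sketched as a toy simulation (the names `unreliable_send` and `reliable_send` are my own, not a real API; the drop rate is made up for illustration):

```python
import random

def unreliable_send(packet, drop_rate=0.5, rng=random):
    """A toy 'IP layer': best-effort delivery that may silently drop the packet.
    Losing the packet still fulfills the protocol."""
    return packet if rng.random() > drop_rate else None

def reliable_send(packet, max_retries=100):
    """A toy TCP-like sender: retry over the unreliable layer until it succeeds,
    and report failure upward if it never does."""
    for attempt in range(1, max_retries + 1):
        delivered = unreliable_send(packet)
        if delivered is not None:
            return delivered, attempt
    raise TimeoutError("gave up; even TCP can report failure to the layers above")

data, attempts = reliable_send("hello")
print(f"delivered {data!r} after {attempts} attempt(s)")
```

The simple layer makes no promises, and the layer above buys reliability with retries; neither layer needs the other's complexity.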

We can say the same about modern artificial intelligence.

Machine learning works so well because it doesn't have to.

With the softmax function that neural nets use to assign probabilities to outputs, a network never completely rules out a possibility; every option gets at least some tiny probability. When a problem is simply too hard, the net spreads nontrivial probability across several possibilities, as I described in my recent post, where a machine learning model trained on a pseudorandom generator would learn to output a uniform distribution. Instead of rigidly forcing the model to give us one specific answer, by looking at distributions we allow the models to make mistakes.
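
A minimal sketch of softmax makes the point concrete: since every score passes through an exponential, no output ever gets probability exactly zero, even when one score dominates.

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into a probability distribution."""
    m = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]  # exp() is always positive
    total = sum(exps)
    return [e / total for e in exps]

# Even with one score far above the rest, the other options keep tiny but
# nonzero probability.
probs = softmax([10.0, 0.0, -5.0])
print(probs)
```

The exponential is strictly positive, so the model can concentrate probability on its best guess without ever being forced to rule the alternatives out entirely.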

Thus a machine learning model can be correct even when it makes probabilistic guesses in situations too complicated to solve directly, which allows it to achieve its best possible performance. Because we allow the models to make mistakes, they have the flexibility to solve complex problems far more often.

4 comments:

  1. What's your opinion on XAI? eXplainable AI.

    1. For most cases I don't think having explainability is worth the trade-offs in capability. That'll be a good topic for a future post.


  2. The current state of AI is like the Internet of the early days, not the secure, reliable systems we use today. A massive amount of work and investment went into making Internet systems reliable and secure over the past three decades. The price point also has to come down significantly.

    Some of this might be addressed over the coming decade, but it is not there yet.

    And a lot of investors will lose money, as they did in the Internet bubble, even big names like Cisco.

  3. Building a probabilistic tree of possible outputs is very expensive, verifying the branches is also expensive, and you eventually need to collapse them to a few fixed outcomes. Users don't get the probabilities as the output; they get a random or heuristic sample from the distribution.

    One important metric that is not being published is the number of failed paths explored when trying to solve math problems. Intuitively, a more intelligent system should explore fewer failed paths.

    A good undergrad student would find the correct solution right away or with very few ideas. A less good student would throw every tool they have at the problem and do a lot of exploration. A bad student might not be very different from a monkey behind a typewriter typing random things.

    The number of false paths explored is a very important metric for intelligence, and it also ties into the amount of compute one needs to solve a problem. Right now they are using the full power of massive datacenters at inference time to solve math problems that a good student could solve at the IMO.

    So there is a lot of progress and a very interesting future, but we are still in the early innings of this AI story.
