Ted Chiang in a recent New Yorker article likened ChatGPT to a blurry JPEG, i.e. a "lossy compression" of the web. It's a good article but the analogy isn't quite right, there's a different kind of compression happening. Think of all human written knowledge as a random example of what could have been generated and we remove the randomness, like water is removed to make concentrated orange juice. We then add water (or randomness) to get back some version of the original.
Lossless compression, like gzip, gives a compressed version of some data with the ability to reconstruct it exactly. It corresponds nicely to Kolmogorov complexity where K(x) is the smallest program p that generates the string x. p is a lossless compression of x.
Lossy compression, like JPEG, often allows much higher compression but with some error. In Kolmogorov terms you are trading off the size of the program p and some error function between x and the output of p. Most compression programs for pictures, music and video use algorithms designed for the specific medium. You can also use machine learning to get lossy compression by training both the compression and decompression algorithms.
Lossy compression tries to recreate the original picture. Generative AI, like ChatGPT, takes a different approach. Let's consider Wikipedia as this is the example used by Chiang. For any specific topic, there are many different ways to write a Wikipedia article, as good as or better than the article that currently exists. ChatGPT doesn't need to recreate anything close to the original article, just one that explains topic well. What we want is a description of a program p that corresponds to a set of possible Wikipedia articles, of which the real article is a random example of this set. An ideal version of ChatGPT would choose a random article from this set. Dall-E, generative AI for art, works a similar way, creating art that is a random example of what art might have been.
In terms of Kolmogorov complexity, this corresponds to the Kolmogorov Structure Function, basically the smallest program p such that p describes a set S of size m that contains x. with |p| + log m ≈ K(x). The string x is just a random element of S, you can get a string like it by picking an element of S at random.
There is no recursive algorithm that will find p and we also need to limit ourselves to p that are computationally efficient, which means that generative AI algorithms may never be ideal and will sometimes make mistakes. That doesn't mean we shouldn't use them just that we need to be wary of their limitations. As the saying goes "All models are wrong, but some are useful".
"Lossy compression tries to recreate the original picture." ChatGPT does too, given a suitable definition of "original picture" and a sufficiently low bar on decompression accuracy. In Chiang's analogy, the original picture is the knowledge and English grammar (as captured in word order statistics) in its training data, not a byte-for-byte representation. I don't see anything fundamentally wrong about the analogy. Not saying that this will be useful though.
ReplyDelete"That doesn't mean we shouldn't use them just that we need to be wary of their limitations"
ReplyDelete^^Do you think people engaging with and discussing ChatGPT and Sydney are currently being wary in this way?