Advances in artificial intelligence have changed the way we think about computing. For today's post: how nonuniformity plays a much larger role than I previously believed.
First, some background. Circuit complexity gives a different, more combinatorial, way to look at computation. Instead of viewing algorithms as step-by-step programs, circuit complexity captures computation as fixed hardware, gates wired to compute a function of their inputs.
Every polynomial-time computable language \(L\) can be computed by a family \(C_0,C_1,\ldots\) of polynomial-size circuits, where \(C_n\) has \(n\) inputs, the number of gates of \(C_n\) is bounded by \(n^c\) for some constant \(c\), and \(x=x_1\ldots x_n\) is in \(L\) iff \(C_n(x_1,\ldots,x_n)\) outputs 1.
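As a toy illustration of this fixed-hardware view, here's a sketch in Python (my own example, nothing beyond the definition above): a circuit family for PARITY, with one fixed circuit \(C_n\) per input length, where evaluation is just a single pass over the wiring.

```python
# Toy circuit family for PARITY: one fixed circuit C_n per input length n.
# Each C_n is a chain of XOR gates; evaluating it is a single pass over the
# fixed wiring, with no control flow that depends on the input bits.

def build_parity_circuit(n):
    """Return C_n as (gates, output_wire); gate i XORs the running parity with bit i."""
    gates = []                     # wires 0..n-1 are the inputs
    prev = 0                       # start from input wire 0
    for i in range(1, n):
        out = n + len(gates)       # fresh wire for this gate's output
        gates.append(("XOR", prev, i, out))
        prev = out
    return gates, prev

def evaluate(circuit, x):
    """Evaluate a fixed circuit on the input bits x."""
    gates, out_wire = circuit
    wires = dict(enumerate(x))     # input wires get the input bits
    for op, a, b, out in gates:
        if op == "XOR":
            wires[out] = wires[a] ^ wires[b]
    return wires[out_wire]

# The "family": one circuit per input length, of size linear in n.
family = {n: build_parity_circuit(n) for n in range(1, 6)}
print(evaluate(family[5], [1, 0, 1, 1, 0]))   # 1, since the input has odd parity
```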
The converse isn't true: there are languages \(L\) that have polynomial-size circuits but that you can't compute in polynomial time, because each \(C_n\) might not have any relation to \(C_m\) for \(n\neq m\). We can get the equivalence by adding a uniformity condition, requiring some easily computable function \(f\) with \(f(1^n)=C_n\). Or we can keep the nonuniformity and define a larger class called P/poly.
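Equivalently, P/poly is often defined via advice strings; the standard characterization, stated here for reference, is
\[
L \in \mathrm{P/poly} \iff \exists\, c,\ \exists\, \{a_n\}_{n\ge 1} \text{ with } |a_n|\le n^c,\ \exists \text{ poly-time } M \text{ such that for all } x,\ x\in L \iff M(x,a_{|x|}) \text{ accepts}.
\]
The advice string \(a_n\) plays the role of the circuit \(C_n\): it depends only on the input length, never on the input itself.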
In practice, nonuniformity rarely helps. We can hide randomness in the nonuniformity, but we could likely use pseudorandom generators to the same effect. Nonuniformity was more a technicality we had to deal with than something that seemed to give circuits significantly more power.
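The classic example of hiding randomness in the advice is Adleman's theorem,
\[
\mathrm{BPP} \subseteq \mathrm{P/poly}:
\]
amplify the error of a BPP algorithm below \(2^{-n}\), and a union bound gives a single random string that works for all \(2^n\) inputs of length \(n\), which we can hardwire as the advice.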
Until this new age of AI. A machine learning model like a neural net goes through a lengthy training process to learn its weights, with the training optimized offline. The learned weights become the nonuniform circuits; as technology improves and we build larger, retrained models, the weights change as well.
We typically use the complexity class TC\(^0\), languages accepted by polynomial-size, constant-depth threshold circuits, as a rough theoretical model for neural nets. For nonuniform TC\(^0\) we cannot prove any significant lower bounds: we can't even show that NEXP, nondeterministic exponential time, cannot be computed by nonuniform TC\(^0\) circuits. And while computing NEXP or even P in TC\(^0\) would be incredibly surprising, these advances in artificial intelligence tell us these nonuniform circuits are far more powerful than we would ever have expected.
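To make the model concrete, here is a toy sketch of a threshold gate, the basic building block of TC\(^0\) (my illustration; real neural nets use real-valued weights and smooth activations, so this is only a rough analogy):

```python
# A threshold gate fires iff a weighted sum of its 0/1 inputs meets a threshold.
# MAJORITY is the simplest example: all weights 1, threshold ceil(n/2).

def threshold_gate(weights, theta, bits):
    """Return 1 iff sum_i weights[i] * bits[i] >= theta."""
    return 1 if sum(w * b for w, b in zip(weights, bits)) >= theta else 0

def majority(bits):
    """MAJORITY_n as a single threshold gate at depth 1."""
    n = len(bits)
    return threshold_gate([1] * n, (n + 1) // 2, bits)

print(majority([1, 0, 1, 1, 0]))   # 1: three of the five bits are set
```

Stacking a constant number of layers of polynomially many such gates gives a TC\(^0\) circuit; a trained network with its weights frozen is, roughly, one such nonuniform circuit per input length.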
Why constant depth? My understanding is that each new, larger frontier model uses more layers than the previous one. The big labs have stopped publishing their exact layer counts, but from publicly available data it seems layers have increased roughly linearly with parameters. Though I'm not sure how parameters have scaled with the context window, which is the "input size n".
It's admittedly hard to match asymptotics with real-world implementations. I get the impression that we're seeing a recent trend towards shallower, wider neural nets, as they can be trained more efficiently in parallel. Shallow still means hundreds of layers.
They are more powerful compared to our earlier, low expectations, but I think that so far they can only solve the same things humans can. So I would rather say that what we're experiencing is: if there is a TC0 circuit for some problem, then there is an algorithm to find one, or a good approximation of it. Is there some theory studying such problems?