Wednesday, November 03, 2021

A Complexity View of Machine Learning?

Complexity is at its best when it models new technologies so we can study them in a principled way. Quantum computing comes to mind as a good, relatively recent example. With machine learning playing an ever-growing role in computing, how can complexity play a role?

The theory community's questions about machine learning typically focus on finding mathematical reasons to explain why models generalize well with little overfitting, or on getting good definitions of privacy, fairness and explainability to mitigate the social challenges of ML. But what about from a computational complexity point of view? I don't have a great answer yet, but here are some thoughts.

In much of structural complexity, we use relativization to understand the relative power of complexity classes. We define an oracle as a set A where a machine can ask questions about membership in A and magically get an answer. Relativization can be used to help us define classes like Σ2P = NP^NP, or to succinctly state Toda's theorem as PH ⊆ P^#P.
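Spelled out, the oracle notation means (standard definitions):

    \Sigma_2^p \;=\; \mathrm{NP}^{\mathrm{NP}} \;=\; \{\, L \;:\; x \in L \iff \exists y\, \forall z\; R(x,y,z) \,\}

for some polynomial-time predicate R with |y| and |z| polynomially bounded in |x|; and Toda's theorem reads \mathrm{PH} \subseteq \mathrm{P}^{\#\mathrm{P}}.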

As I tweeted last week, machine learning feels like an oracle; after all, machine learning models and algorithms are typically accessed through APIs and Python modules. What kind of oracle? Definitely not an NP-complete problem like SAT, since machine learning fails miserably if you try to use it to break cryptography.

The real information in machine learning comes from the data. For a length parameter n, consider a string x, which might be exponential in n. Think of x as a list of labeled or unlabeled examples from some larger set S. Machine learning creates a model M from x that tries to predict whether a given input is in S. Think of M as the oracle, a compressed version of S.
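A minimal sketch of this model-as-oracle view in Python; the function names and the memorizing "learner" are purely illustrative, standing in for whatever training pipeline actually produces M:

    # Hypothetical sketch: a learned model M used as a membership oracle for S.
    from typing import Callable, List, Tuple

    Example = Tuple[str, bool]  # (input string, label: is it in S?)

    def train(data: List[Example]) -> Callable[[str], bool]:
        """Illustrative stand-in for a learning algorithm: compress the
        sample x (= data) into a model M that predicts membership in S."""
        positives = {z for z, label in data if label}
        # A trivially simple "model" that memorizes its positive examples;
        # a real learner would generalize rather than memorize.
        return lambda y: y in positives

    # The rest of the computation treats M as an oracle: it may ask
    # "is y in S?" but never looks inside M.
    M = train([("01", True), ("10", True), ("11", False)])
    print(M("01"), M("00"))  # True False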

Is there a computational view of M? We can appeal to Ockham's razor and consider the simplest model consistent with the data, one for which x, viewed as a set, is random within the set S that M generates. One can formalize this Minimum Description Length approach using Kolmogorov complexity. This model is too ideal: for one, it too can break cryptography, and typical deep learning models are not simple at all, sometimes having millions of parameters.
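One way to make that precise, in the spirit of algorithmic statistics (a sketch; the exact conditions can be tuned): among models M consistent with the data, take

    M^{*} \;=\; \arg\min_{M}\; K(M)
    \quad\text{subject to}\quad
    K(x \mid M) \;\ge\; \log \binom{|S_M|}{m} \,-\, O(1),

where S_M is the set M accepts and x lists m of its elements. The constraint says x looks like a random sample from S_M, so M captures the structure of x without simply encoding x itself.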

This is just a start. One could try time bounds on the Kolmogorov definitions, or try something completely different. Adversarial and foundational learning models might yield different kinds of oracles.
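For instance, the usual time-bounded variant replaces K with

    K^{t}(y) \;=\; \min \{\, |p| \;:\; U(p) \text{ outputs } y \text{ in at most } t(|y|) \text{ steps} \,\}

for a universal machine U; with t polynomial, the cryptography-breaking power of the unbounded definition plausibly goes away, at the cost of a messier theory.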

If we can figure out even a rough complexity view of learning, we can start to get a handle on learning's computational power and limitations, which is the purpose of studying computational complexity in the first place.

6 comments:

  1. A reasonable way to view "ML oracles" is through the lens of non-uniform computations.

    We may think of an ML model as "advice" with the additional requirement that it can be computed from (z, L(z)) pairs, where L is the language we care about and the z's are drawn from some samplable distribution (with additional constraints on |z|, the number of z's, and the time to compute the model from the "training set" {(z, L(z))}).

    In some sense, notions like auto-reducibility, random self-reducibility, etc. already address the question of what it means for a language to be "learnable". We can think of ML as providing a nonadaptive self-reducibility of sorts.
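    Concretely, one could improvise a definition along these lines (the class name and the parameters are just illustrative):

        L \in \text{``ML-advice''} \iff \exists\, \text{poly-time } A, T \text{ s.t. for } z_1,\dots,z_m \sim D_n:\;
        \alpha_n = T\big((z_1, L(z_1)),\dots,(z_m, L(z_m))\big)
        \text{ and } \Pr_{x \sim D_n}\big[A(x,\alpha_n) = L(x)\big] \;\ge\; 1-\epsilon.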

  2. Re: Anon #1 comment:

    ML is not just "nonadaptive" self-reducibility; it needs to be oblivious self-reducibility -- the points on which you're given the right label should be independent of the input itself (though they could depend on the input length).

    Thus a worst-case to average-case reduction wouldn't qualify as "ML" because the points on which the (worst-case) algorithm queries the (average-case) oracle could (and usually, do) depend on the input being decided.
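    In symbols (notation improvised): an oblivious reduction draws its queries as

        (q_1,\dots,q_k) \;=\; Q(1^{|x|}, r),

    depending only on the input length and the randomness, whereas a worst-case to average-case reduction may compute q_i = Q(x, r, \text{earlier answers}), with the input x itself in the picture.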

  3. This is a very interesting discussion! My collaborators at Google Quantum AI and I also had this idea a year ago when we were studying quantum advantage in machine learning. We tried to formalize some of these concepts in https://arxiv.org/abs/2011.01938 by defining a complexity class for problems that can be solved by "algorithms that can learn from data". The definition is probably not the best one, but it enabled us to study some basic implications.

    We then explored some applications to solving quantum many-body problems in https://arxiv.org/abs/2106.12627, showing that classical learning algorithms with experimental data obtained from nature can solve some interesting and challenging quantum physics problems.

  4. @Lance:

    The analogy here is interesting ...

    "machine learning fails miserably if you try to use it to break cryptography."

    Fundamentally this seems like a flawed approach, unless there's a niche area within the process where ML could be useful.

    That said, the same thing can be said about how to use ML algorithms to perform symbolic integration. Having worked in that area while developing programming languages, I've seen some fanfare about how well some ML algos work... Perhaps this is a story to be told another time.

  5. EG: "the same thing can be said about how to use ML algorithms to perform symbolic integration."

    So that means that one question for complexity theory would be to define the class of problems for which ML makes no sense. As an off-hand first guess, perhaps it would be the class of problems for which domain causality is important.

    As a jaundiced observer I'd say something along the lines of "all problems for which you care that the answer is correct"...

    Of course, that latter definition would put ML outside the domain of computational complexity.

  6. Have you taken a look at any of the work on the relative expressive efficiency of shallow vs. deep ConvNets (e.g., https://arxiv.org/abs/1705.02302)? Something about your post reminded me of these results, but I'm not sure about the broader structural insights here.
