Wednesday, September 11, 2024

Natural Proofs is Not the Barrier You Think It Is

If there's a position where I differ from most other complexity theorists, it's that I don't believe natural proofs present a significant barrier to proving circuit results. I wrote about this before, but I buried the lede and nobody noticed.

Let's review natural proofs. I'll give a very high-level description; Li-Yang Tan gives a good technical one, or you could read the original Razborov-Rudich paper. A natural proof of lower bounds against a circuit class \(\cal C\) consists of a collection \(C_n\) of Boolean functions on \(n\) inputs such that

  1. No polynomial-size circuit family from \(\cal C\) can compute an element of \(C_n\) for large enough \(n\). 
  2. \(C_n\) is a large fraction of all the functions on \(n\) inputs.
  3. A large subset of \(C_n\) is constructive--given the truth-table of a function, you can determine whether it sits in the subset in time polynomial in the length of the truth-table. Note: This is different from the usual notion of "constructive proof". 
The natural proof theorem states that if all three conditions hold, then you can break pseudorandom generators and one-way functions.
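
For reference, here is the theorem in schematic form (my paraphrase, using the usual names for the three conditions; the exact hardness parameters are in the Razborov-Rudich paper):

(1) Usefulness \(\wedge\) (2) Largeness \(\wedge\) (3) Constructivity \(\Rightarrow\) pseudorandom generators and one-way functions can be broken

(In the original paper, for \(\cal C\) = P/poly, the conclusion is that no pseudorandom function family is \(2^{n^\epsilon}\)-secure for any \(\epsilon > 0\).)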

My problem is with the third property, constructivity. I haven't seen good reasons why a proof should be constructive. When I saw Rudich give an early talk on the paper, he had to both change the definition of constructivity (allowing subsets instead of requiring an algorithm for \(C_n\) itself) and give heavily modified proofs of old theorems to make them constructive. Nothing natural about it. Compare this to the often-maligned relativization, where most proofs in complexity relativize without any changes.

Even Razborov and Rudich acknowledge they don't have a good argument for constructivity:
"We do not have any formal evidence for constructivity, but from experience it is plausible to say that we do not yet understand the mathematics of \(C_n\) outside exponential time (as a function of \(n\)) well enough to use them effectively in a combinatorial style proof."
Let's call a proof semi-natural if conditions (1) and (2) hold. If you have a semi-natural proof, you get the following implication:

Constructivity \(\Rightarrow\) One-way Functions Fail

In other words, you still get the lower bound, with the caveat that if an algorithm exists for the property, then an algorithm exists to break a one-way function. You are not actually breaking a one-way function; you are just showing that recognizing the property would be as hard as breaking one-way functions. An algorithm begets another algorithm. You don't have to determine constructivity either way to get the lower bound.
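
Spelled out as a formula (my restatement of the post's argument, using the numbered conditions above), a semi-natural proof turns the natural proof theorem into a conditional:

(1) \(\wedge\) (2) \(\Rightarrow\) \(\big(\)(3) Constructivity \(\Rightarrow\) One-way Functions Fail\(\big)\)

The lower bound itself is already delivered by condition (1), whether or not (3) is ever resolved.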

Even if they aren't a great barrier to circuit lower bounds, natural proofs can be an interesting, if badly named, concept in their own right. See, for example, the Carmosino-Impagliazzo-Kabanets-Kolokolova paper Learning Algorithms from Natural Proofs.

So if I don't believe in the barrier, why are circuit lower bounds hard? In recent years, we've seen the surprising power of neural nets, which roughly correspond to the complexity class TC\(^0\), and we simply don't know how to prove lower bounds against powerful computational models. Blame our limited ability to understand computation, not a natural proof barrier that really isn't there.

6 comments:

  1. For a completely opposite viewpoint on the role of constructivity, with formal theorems to back it up, see:

    "Natural Proofs Versus Derandomization". STOC 2013, SICOMP 2016

    Replies
    1. I view Ryan's paper more as a dual viewpoint: you get constructivity if you don't require largeness.

  2. I've been thinking about non-constructive bounds related to detecting cliques [1]. Specifically, consider detecting just some of the k-vertex cliques in an n-vertex graph. Then, modify Shannon's counting argument...

    When I first started thinking about this, I thought:

    1. Someone must have thought about this before, as it's a small tweak of Shannon's argument. (I haven't found it in the complexity literature; then again, I haven't read that much of said literature.)

    2. Whoa -- so detecting half of the cliques is somewhat difficult (at least on average). Now, if only we can show that detecting all of the cliques is harder... That'll be easy to prove, right?

    Narrator: it wasn't. (Yes, I was being naive...)

    This highlights that "$f$ is a buggy clique detector" is a pretty non-constructive property. (It meets the RR definition of "constructive"; to me, that argues that "polynomial in truth table size" is a pretty lenient definition of "constructive".)

    One nice thing is that the number of "buggy clique detectors" (for $m$ input edges) is comfortably smaller than the number of possible functions (with $m$ inputs). Thus, it dodges the "largeness" condition. (And indeed the paper cites Ryan Williams' post here [2], suggesting that avoiding "largeness" might be a useful plan.)

    Unfortunately, as I said, it's pretty non-constructive, and so doesn't seem to get anywhere.

    [1] J. Burdick. How hard is it to detect some cliques? ACM SIGACT News 55(2):38-52, June 2024.
    https://doi.org/10.1145/3674159.3674163

    [2] Ryan Williams. Comment on Computational Complexity blog, 2007. Accessed on July 8, 2023.
    https://blog.computationalcomplexity.org/2007/05/godel-prize-natural-proofs-my-2-cents.html

  3. (part 1)

    My mental model of "natural proofs", especially "constructivity", is the following. (In fact, I feel that this is the property that makes natural proofs, well, natural.) I take constructivity as a measure of the sophistication of the argument we need to prove a lower bound – measured as the computational complexity of the argument.

    Say we wish to show a lower bound against a class C of circuits. This usually goes in two or three stages. We have some intuition about what prevents C from computing interesting functions, why our efforts to find a C-algorithm for problems at the boundary of knowledge are failing, and so on. We turn this into two things:

    (1) an explicit function f that we conjecture that C cannot compute; and
    (2) an explicit reason why C-circuits cannot compute f.

    The first part (identifying f) is often "easier", since we're blessed with a large catalog of interesting problems that we're constantly placing within the hierarchies of complexity classes based on time, space, circuit size/depth, etc.

    The second part (the "reason") is often trickier, and we do this by identifying properties of functions that C-circuits can compute. This process is often highly messy, creative, and technical.

    An early example illustrates this point very well: when the first lower bounds against AC0 happened (Ajtai; Furst-Saxe-Sipser), the intuition was that AC0 computations are too "insensitive" (they remain constant in many large sub-cubes), and a function like Parity cannot, therefore, be computed by AC0 circuits. Translating this into a proper technical statement was the work of the famous Switching Lemmas; a particularly nice way to think about it emerges from the work of Linial, Mansour, and Nisan: functions in AC0 have low average sensitivity (if you pick a random input and count how many of its neighbors' values differ from it, on average, it's bounded by polylog(n), where the exponent of the polylog is linearly related to the circuit depth). And, of course, parity is super-sensitive, across every edge of the hypercube.
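
    As a formula (the bound I have in mind is Boppana's sharpening of these LMN-style estimates; the exact constants don't matter here):

    $AS(f) = O((\log s)^{d-1})$ for $f$ computed by a depth-$d$, size-$s$ AC0 circuit, whereas $AS(\mathrm{Parity}_n) = n$.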

    Now consider the property P(g), where g is a Boolean function on n variables given by its truth table, asserting that g does not have low average sensitivity. The property P can be computed by a circuit of size polynomial in 2^n and depth 3 or 4 that does the straightforward exhaustive check (technically, we would set up P so that it suffices to estimate the average approximately).
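
    To make "polynomial in the truth-table length" concrete, here is a minimal brute-force sketch (mine, not from Razborov-Rudich) that computes the average sensitivity of g from its truth table in O(n * 2^n) time; the comparison against a polylog(n) threshold is omitted.

def average_sensitivity(truth_table):
    """truth_table[x] is g(x), indexed by the integer encoding of x in {0,1}^n."""
    N = len(truth_table)                  # N = 2^n
    n = N.bit_length() - 1
    assert 1 << n == N, "truth-table length must be a power of two"
    total = 0
    for x in range(N):
        for i in range(n):                # flip each coordinate of x in turn
            y = x ^ (1 << i)
            if truth_table[x] != truth_table[y]:
                total += 1
    return total / N                      # average number of sensitive coordinates

# Sanity check: parity on n bits is sensitive to every coordinate at every input.
n = 4
parity = [bin(x).count("1") % 2 for x in range(1 << n)]
print(average_sensitivity(parity))        # -> 4.0

    (Nothing cleverer than reading the whole truth table is needed, which is exactly the sense in which the property is RR-constructive; comparing the result to a polylog(n) threshold completes the check.)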

    This is the constructivity part. It was made possible because, in developing our intuitions about AC0 computations, we instinctively think somewhat algorithmically. Our thinking is limited by our brain power and isn't very sophisticated, hence the algorithm pops out almost naturally.

    That is, I take constructivity as the argument that we're not that smart: our subtle insights about (functions computable by a) complexity class C are actually quite natural, and hence algorithmically verifiable, given the truth table of a function. The foregoing example of "insensitivity" was ultra-computable (by a constant-depth circuit); Razborov and Rudich give other examples that require a bit more work (that is, NC- or P-algorithms) to show constructivity, but the properties themselves are somewhat, erm, natural: "can't be approximated by low-degree polynomial"; "can't be approximated by the sign of a low-degree polynomial"; and so on.

    (to be continued...)

  4. (part 2)

    How much "cleverer" do we need to get in this game against non-trivial classes of computation? That is, what are examples of lower bounds where the computational complexity of the magical property P is super-polynomial (at least on the face of it)?

    Here's one: Ajtai proved (FOCS 1999) an exponential size lower bound on linear-time branching programs: for all positive integers k and for all sufficiently small ε > 0, if n is sufficiently large then there is no Boolean branching program of size less than 2^{εn} which, for all inputs X ⊆ {0,1,...,n−1} (encoded as 0-1 strings of length n), computes in time kn the parity of the number of elements of the set of all pairs (x, y) with the property x ∈ X, y ∈ X, x < y, x+y ∈ X.

    Ajtai's proof is, as usual, a work of art (see the ToC version at https://theoryofcomputing.org/articles/v001a008/v001a008.pdf). Reading through the proof overview (Section 2), you'll see the complexity of Ajtai's reasoning against linear-time sub-exponential-size branching programs: in particular, see Lemma 3.5 (which is from an earlier paper of Ajtai's). Ajtai notes that the proof shows that if a function f can be computed in linear time with the given restrictions on the size, then there are two large disjoint subsets W1, W2 of the set of input variables and an input χ such that, for each i = 1, 2, we may change χ in many different ways by changing only the values of the variables in Wi without changing the output; moreover, these changes can be performed simultaneously on W1 and W2 and the output still does not change. The ratio between the sizes of the sets Wi and the logarithm of the number of changes is of crucial importance in the proofs of the [FOCS 1999 / ToC] paper.

    We may ask: how complex is the above property? It is, after all, a type of insensitivity. Given the truth table of a function f, can we tell whether such subsets W1 and W2 and such an input χ exist? The exhaustive search over these is straightforward in 2^{O(n)} time, but Property (3) [on page 157] of Lemma 3.5, which talks about the "many different ways" alluded to above, seems like a killer. It asserts that there are 2^{Ω(n)} different settings of the variables in W1 and W2, and exhaustively enumerating them is beyond the scope of a polynomial (in 2^n) time algorithm. It's certainly "NP-natural", since the above property can be verified in nondeterministic polynomial time by guessing those inputs.
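
    Just to put a number on the "NP-natural" remark (my own rough count, ignoring whatever extra conditions Lemma 3.5 imposes), the guessed witness is (W1, W2, χ, the list of settings), of total length at most

    2n + n + n·2^n = O(n·2^n) = poly(2^n) bits,

    and each claimed change is checked with a single truth-table lookup, so the verification runs in time polynomial in the truth-table length 2^n; only the deterministic enumeration of candidate collections of settings blows up.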

    Incidentally, largeness is straightforward for this property – a random function almost surely won't have such structure.

    To conclude this comment about constructivity, I'll note that, unlike Lance, I see this as the central difficulty in proving circuit lower bounds: our language for lower bounds (largely combinatorial, some algebraic, over the recent decades) is too crude to reason in, and we might have to develop higher-level abstractions, not just traditional logic, topology, and analysis, but "TCS versions" of those. Certainly there are several bright spots, e.g., Fourier analysis, high-dimensional expanders, meta-complexity, etc., but it feels like there's still plenty of the way to go before we can formulate sophisticated properties about computation naturally (even if the constructivity (computational complexity) of those properties is PSPACE-complete or worse).

  5. Thank you, LapsedTheorist, for adding valuable (historical) context and summarizing the approaches with minimal technical jargon and prerequisites.
