## Monday, October 10, 2005

### Favorite Theorems: Logical Characterization of NP

September Edition

Usually we think of the class NP as either languages accepted in polynomial-time by a nondeterministic Turing machine or languages with polynomial-time verifiable witnesses. Ronald Fagin gives a characterization of NP based on logic without references to Turing machines or polynomial time.

Ronald Fagin, Generalized first-order spectra and polynomial-time recognizable sets. Complexity of Computation, ed. R. Karp, SIAM-AMS Proceedings 7, 1974, pp. 43-73.

In this paper Fagin shows that NP consists of exactly the languages expressible with existential second-order formulas. For example consider a graph G described by an edge relation E(i,j) and we can define whether G is k-colorable by

∃C ∀i,j (1≤C(i)≤k ∧ (E(i,j)→C(i)≠C(j)))

With some more work you can use binary predicates. In general every language in NP has an existential second-order characterization with binary predicates and a universal first-order part.

Stockmeyer generalizes Fagin's result to characterize the polynomial-time hierarchy with second-order formulas.

Fagin's result started the area of descriptive complexity that characterized many common complexity classes in various logics and has connections to the complexity of database queries. Neil Immerman's work in descriptive complexity led him to his proof that nondeterministic space is closed under complement. Robert Szelecpsényi independently came up with a similar proof through a different approach.

Papadimitriou and Yannakakis use Fagin's result to characterize the class MAX-SNP of optimization problems. One of the first corollaries of the PCP Theorem is to show the MAX-SNP hard problems cannot be approximated within an arbitrary constant unless P=NP. In fact the concept of probabilistically checkable proof itself originally comes from a second-order view of NP that originated from Fagin's paper.

Thanks to Lance and Siva for the kind words about my theorem. Let me clarify the story on the arity of the existentially quantified relations.

An existential second-order formula about, say, graphs, is a formula of the form

∃ Q1 ... ∃ Qk S( E, Q1, ..., Qk)

where E represents the edge relation, Q1, ..., Qk are existentially quantified predicates of arbitrary arity, and S(E, Q1, ..., Qk) is a first-order formula that involves only E, Q1, ..., Qk. As an example, 3-colorability of the graph can be expressed by an existential second-order formula

∃ Q1 ∃ Q2 ∃ Q3 S(E, Q1, Q2, Q3),

where Q1, Q2, and Q3 are unary predicates (that represent the 3 colors), and S(E, Q1, Q2, Q3) is a first-order formula that says "Each point has exactly one color, and no two points with the same color are connected by an edge''.

In the case of graphs, my theorem says that if T is a class of graphs that is closed under isomorphism (that is, whenever T contains a graph G, then it contains every graph isomorphic to G) , then T is in NP if and only if T is defined by an existential second-order formula. In the case of graphs, it is an open problem as to whether we really need to allow existentially quantified predicates of arbitrary arity. On the one hand, it is conceivable that there are NP properties of graphs that require existentially quantified predicates of arbitrarily large arity. On the other hand, it is conceivable that we can capture every NP property of graphs by allowing only a single existentially quantified binary predicate.

If we consider NP properties not just of graphs, but of arbitrary structures (such as structures with, say, two ternary relations and five 7-ary relations), then the characterization of NP in my theorem continues to hold, but in this case, it is known that existentially quantified binary predicates do not suffice. In particular, Ajtai proved (in the same amazingly rich 1983 paper [Σ11-formulae on finite structures, Annals of Pure and Applied Logic 24, 1983, pp. 1-48] where, among other things, he proved the Furst-Saxe-Sipser theorem independently of Furst, Saxe and Sipser), that if we consider structures consisting of a single m-ary relation, then the statement "The number of m-tuples in the relation is even" cannot be captured in existential second-order logic with any number of existentially quantified predicates of arity less than m.

A gentle introduction to the work I've mentioned in this note and to some other work in finite model theory appears in my survey paper Finite model theory-a personal perspective.

1. (1) As far as I know, it is an open question whether every NP langauge has an existential second-order characterization with binary predicates only. See, for example, the talk given by Ron Fagin at a BATS meeting .

In fact, it is conceivable AFAIK that every NP language has a second-order characterization with just one binary quantifier.

(2) To me, Fagin's theorem is perhaps the best argument we have for why NP is such an important class (aside from the empirical ones). NP is about searching for "solutions" to "problems", and as humans, we are quite limited in our ability to express what constitutes a "solution". The description "solutions that can be verified by a polynomial-time algorithm" isn't even close to what I believe to be capable of human expressibility. On the other hand, in virtually every real example, the "verifiability" of a purported solution can be distilled to a simple first-order sentence (e.g., if you want to produce a time-table for your university, you're looking for a relation (class, professor, room, timeslot) that satisfies natural constraints like "no professor can be present in two classrooms at same timeslot", "no room can hold two classes as the same time", etc.).

A nice counterexample to my claim above is factoring: I can't think of a simple first-order sentence that will tell us if N = p * q. With the logical structure most natural for the problem (integers, bits, etc.), it is likely that this is provably impossible (by reduction from PARITY). Yet factoring is in NP.

Btw, Fagin's theorem, as stated by Lance, might be incorrect; this business is a bit picky about the structure (graphs, numbers, etc.) underlying the logic you're talking about. (I am sure Ron will point out the subtleties very soon :-)

2. Apologies for the bad URL for the talk by Ron Fagin. It is
here

3. 4. 5. 