## Tuesday, November 11, 2008

### Barrington's Theorem

A few weeks ago Bill complained about how hard it is to get an electronic journal version of Barrington's theorem. Though Barrington's theorem was brand spanking new and exciting when I started graduate school in 1985 and it remains one of my favorite theorems, it doesn't often get taught in many graduate complexity courses these days. So let me talk about one of the great complexity results that, in some sense, lets you count in constant space.

Formally the theorem states that polynomial-size bounded-width branching programs have the same computation power as Boolean formulas. A branching program is a directed acyclic graph where all but two nodes are labelled by a variable name and has out-degree two: one edge labelled true and the other false. The other two nodes are the terminal nodes labelled accept and reject. There is a single root of in-degree zero. From the root, one follows a path following the true node if the labelled input variable is true and false otherwise and accepting or rejecting the input based on the terminal node reached. Layer the branching program based on the distance of each node from the root and the width is the maximum number of nodes in any layer.

Barrington showed that width-five branching program can simulate any Boolean formula and thus the complexity class NC1 which includes the majority function. But the branching program in any layer can only remember one of five nodes, less than three bits of information as it processes through the graph. Yet you still can tell if a majority of input variables are true. Amazing.

The key to Barrington's proof is that the group S5 (permutations on five elements) is non-solvable i.e., there are permuatations σ = (12345) and τ = (13542) such that στσ-1τ-1 is not the identity permutation. There is a simple write-up of the proof on page 60-61 of the Boppana-Sipser survey.

Some neat consequences of Barrington's theorem.

• Regular languages are NC1-complete under projections of the inputs.
• One can compute any language in PSPACE using not much more than a clock and constant bits of memory. (Cai and Furst)

1. Sounds like branching programs are what other people call binary decision diagrams.

2. Does "polynomial-size" mean that the number of nodes and edges is polynomial in the number of variables?

3. the existence of that \sigma and \tau in S_5 just means that S_5 is not commutative. Non-solvability is less trivial than that: first, you generate the commutator subgroup H (i.e. the subgroup generated by all the expressions xyx^{-1}y^{-1}, for x,y in S_5.)
Then you generate the commutator subgroup H' of H, etc. If in the end you get a trivial group, your group is solvable. Otherwise, it is not. In fact, here H=A_5, the group of even permutations of 5 symbols, and H'=H.

4. Anyone aware of a (good) writeup of a proof of Barrington's theorem? I, too, have been unable to get a copy of Barrington's journal version, and I have not yet seen lecture notes or a textbook that covers it.

5. for the paper you can goto my course website for CMSC 752.

Bill Gasarch

6. Sounds like branching programs are what other people call binary decision diagrams.

BDD as commonly used represents only a special case of branching programs. On page 1 of his fascicle (the first page is an unnumbered page 0) Knuth makes the usual assumption that BDDs are both ordered and read-once. Barrington's theorem requires multiple accesses.

The book
Branching Programs and Binary Decision Diagrams
by Ingo Wegener (referred to on page 56 of the Knuth fascicle you cite)
goes into a lot of detail on the relationship.

7. I remember seeing this result proved in the following book:

Boolean Functions and Computation Models by Peter Clote and Evangelos Kranakis

8. Well, this is embarrassing!

Bill's been nagging me about this for a while, and with the question hitting the big time here I finally went to check and it appears that I don't have an electronic copy!. It was my first paper, and I've switched computers a few times since then, and somewhere along the line it seems that I didn't preserve stuff from MIT. I seem to have everything going back to my second paper, Barrington-Therien STOC 1987, but no STOC, JCSS, or thesis version of Barrington 1986.

As Lance pointed out, the proof is in the Boppana-Sipser survey from the Handbook of TCS, and also in the Wegener book that Paul mentioned. And JCSS is in a lot of libraries. There's even a Russian translation of the JCSS paper. But none of this is electronic!

OK, if I write it here then there will be at least one proof on the internet. Define an instruction over S_5 to be a triple (i,sigma,tau) where i is the index of one of the n boolean inputs and sigma and tau are permutations in S_5. The instruction (i,sigma,tau) evaluates to sigma if x_i is true and to tau if x_i is false. A program is a sequence of instructions, and it evaluations to the product of the values of its instructions. The theorem is that if you have a boolean circuit C of fan-in two and depth d, and sigma is any 5-cycle in S_5, there is a program of length at most 4^d that evaluates to sigma if C evaluates to true and to the identity if C evaluates to false. It should be obvious how to convert the program into a width-5 branching program as Lance defined. (My real contribution, I think, was to define this "M-program" model that captures all the interesting power of constant-width branching programs and can be looked at algebraically.)

Base case: C is an input or negated input, you can make a program of one instruction that evaluates to just what you want.

Independence with respect to sigma: If sigma and tau are both five-cycles and you have a program that evaluates to sigma if C is true and to e if C is false, find a permuation gamma so that tau is
gamma times sigma times gamma^{-1}. Then multiply both permutations in the first instruction by gamma on the left, and both permutations in the last instruction by gamma^{-1} on the right. The new program has the same length as the old but produces tau if C is true and e if C is false.

Negation case: If program P evaluates to sigma if C is true and to e if C is false, multiply both permutations of the last instruction on the right by sigma^{-1}. The new program has the same length as the old and evaluates to e if C is true and to sigma^{-1} (a five-cycle) if C is false. By the previous lemma you can replace sigma^{-1} by any desired five-cycle.

Induction case for AND (OR is not needed because you have negation already): Given circuits C and D you want a program that evaluates to sigma if C and D are both true and to e otherwise. By the inductive hypothesis you have programs P, P', Q, and Q' of length at most 4^d each such that:

P evaluates to (12345) if C is true, to e otherwise.

Q evaluates to (13542) if D is true, to e otherwise.

P' evaluates to (54321) if C is true, to e otherwise.

Q' evalautes to (24531) if D is true, to e otherwise.

You can check that PQP'Q' evaluates to (13254) if C and D are both true, and to e otherwise. By the lemma above, this five-cycle is as good as any other, and the length of this program is at most 4^{d+1}.

Commenter dimpase's correction of Lance is right. The specific fact about S_5 used in the proof is that there are two elements that are conjugate to each other and to their commutator. This can only happen in a non-solvable group. The proof easily generalizes to programs over any non-solvable group -- it's probably simplest algebraically if G is a non-abelian simple group, and your inductive hypothesis is that for any circuit C of depth d and for any non-identity element sigma of G, you have a program of length at most c^d that evaluates to sigma if C is true and to e if C is false.

9. The PDF link on Bill's site is actually to the JCSS paper rather than the conference version as Bill says on the linking page. Thanks for putting this up, Bill, and sorry that I've been remiss in dealing with this. Lance's link is also to a PDF of the book chapter. So the proof I just posted is redundant, but it might be useful as a more compressed "elevator conversation" version.

Commenter rgrig: Yes, the size of a branching program is the number of nodes, and in a constant-width program this is linear in the number of edges and in the number of levels. The latter is the "length" of one of my algebraic programs.

Paul is of course right about BDD's versus branching programs. As I said in the paper, the model was originally due to C. Y. Lee in 1959, who called them "binary decision programs". (It's sometimes credited to Masek's 1976 MIT master's thesis, but Masek references Lee, and Lee only references Shannon 1948 as a competing model.)

10. Here is another on-line proof of Barrington's Theorem: http://www.cs.rutgers.edu/~allender/538/notes.html