## Wednesday, January 25, 2006

### A Theorem that should be better known

GUEST BLOGGER: Bill Gasarch

(BEFORE I START TODAY'S BLOG, A REQUEST: EMAIL ME OTHER
LUDDITE QUESTIONS; I WILL POST THE BEST ONES ON FRIDAY.)

If u,v \in \Sigma^* then u is a SUBSEQUENCE OF v if you
can obtain u by taking v and removing any letters you like.

EXAMPLE: if v = 10010 then
e,0,1,00,01,10,11,000,001,010,100,101,110,0010,1000,1001,1010,10010
are all of its subsequences (e denotes the empty string).
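A list like this is easy to get wrong by hand, so here is a minimal Python sketch that enumerates the subsequences of a string by keeping every possible set of positions (equivalently, removing every possible set of letters). The function name `subsequences` is ours, chosen for illustration.

```python
from itertools import combinations
from typing import Set


def subsequences(v: str) -> Set[str]:
    """All distinct subsequences of v, including the empty string.

    Each subsequence is determined by the set of positions of v we keep;
    duplicates (e.g. "100" arising from several position sets) collapse
    in the set.
    """
    return {
        "".join(v[i] for i in kept)
        for r in range(len(v) + 1)
        for kept in combinations(range(len(v)), r)
    }
```

For v = 10010 the 32 position sets collapse to 18 distinct strings, counting the empty string; `sorted(subsequences("10010"), key=lambda s: (len(s), s))` lists them by length.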

Let L be any language, that is, a subset of \Sigma^*. SUBSEQ(L)
is the set of all subsequences of strings in L.

The following three could be easy problems in a
course in automata theory:

a) Show that if L is regular then SUBSEQ(L) is regular

b) Show that if L is context free then SUBSEQ(L) is context free

c) Show that if L is c.e. then SUBSEQ(L) is c.e.
(NOTE: c.e. means computably enumerable, what used to be called
r.e., recursively enumerable.)
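For (a), one standard construction: take an NFA for L and, for every transition p --a--> q, also allow moving from p to q without reading anything. Taking an edge silently corresponds to a letter of the longer string that was removed. Below is a minimal Python sketch that simulates this modified automaton directly; the encoding of the automaton as `(start, accepting, transitions)` is our own assumption for illustration, not anything from the post.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical encoding: (start state, accepting states, transitions),
# where transitions maps (state, symbol) -> set of next states.
NFA = Tuple[int, FrozenSet[int], Dict[Tuple[int, str], Set[int]]]


def subseq_accepts(nfa: NFA, w: str) -> bool:
    """Decide whether w is in SUBSEQ(L(nfa))."""
    start, accepting, delta = nfa

    # Every ordinary transition p -> q also becomes a silent (epsilon) move.
    silent: Dict[int, Set[int]] = {}
    for (p, _symbol), targets in delta.items():
        silent.setdefault(p, set()).update(targets)

    def closure(states: Set[int]) -> Set[int]:
        """All states reachable by zero or more silent moves."""
        seen, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for q in silent.get(p, ()):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    current = closure({start})
    for ch in w:
        step: Set[int] = set()
        for p in current:
            step |= delta.get((p, ch), set())
        current = closure(step)
    return bool(current & accepting)
```

For example, with the three-state DFA for 10*1 (start 0, accept 2, edges 0 -1-> 1, 1 -0-> 1, 1 -1-> 2), the sketch accepts exactly the strings of the form 1?0*1?: it accepts "", "11", "0001", "101" and rejects "110", "010", "111".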

Note that the following is not on the list:

Show that if L is DECIDABLE then SUBSEQ(L) is DECIDABLE.

Is this even true? It's certainly not obvious.

There is a theorem due to Higman (1952) (actually a corollary of
what he proved), which we will call the SUBSEQ THEOREM:

If L is ANY LANGUAGE WHATSOEVER over ANY FINITE ALPHABET
then SUBSEQ(L) is regular.
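Why "regular" in particular? SUBSEQ(L) is closed under taking subsequences, so its complement is closed under taking supersequences. Taking as given the well-quasi-order fact behind Higman's result (no infinite set of strings is pairwise subsequence-incomparable), that complement is exactly the set of strings containing one of finitely many minimal excluded strings x_1, ..., x_k. Each test "w contains x as a subsequence" is done by a DFA with |x|+1 states, so SUBSEQ(L) is a finite intersection of regular languages. A minimal Python sketch, where the obstruction list is an assumed input: the proof is nonconstructive, and the list cannot in general be computed from a machine for L.

```python
from typing import Iterable


def contains_as_subsequence(w: str, x: str) -> bool:
    """Simulate a DFA with states 0..len(x); state q means
    'the first q letters of x have been matched'. Greedy matching suffices."""
    q = 0
    for ch in w:
        if q < len(x) and ch == x[q]:
            q += 1
    return q == len(x)


def in_downward_closed(w: str, obstructions: Iterable[str]) -> bool:
    """w is in the subsequence-closed language iff it contains none of the
    (assumed, finitely many) minimal excluded strings."""
    return not any(contains_as_subsequence(w, x) for x in obstructions)
```

For example, over {0,1} the obstructions 010, 011, 110, 111 cut out exactly the strings of the form 1?0*1?, which is SUBSEQ(10*1).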

This is a wonderful theorem that seems NOT to be well known.
It's in very few automata theory texts, and one rarely hears it mentioned.
It falls out of well quasi order theory, but papers in that
area (is that even an area?) don't seem to mention it much.

This SEEMS to be an INTERESTING theorem that should get more
attention, which is why I wrote this blog.  Also, I should point
out that I am working on a paper (with Steve Fenner and Brian

Why do some theorems get attention and some do not?

1) If a theorem lets you really DO something, it gets attention.
There has never been a case of "OH, how do I prove L is regular?
WOW, it's the SUBSEQ language of L'!!"
By contrast, the Graph Minor Theorem, also part of well quasi
order theory, lets you PROVE things you could not prove before.

2) If a theorem's proof is easy to explain, it gets attention.
The SUBSEQ theorem needs well quasi order theory to explain.
("Needs" is too strong: Steve Fenner has a proof of the |\Sigma|=2
case that does not need wqo theory, but it is LOOOOOOOOOOOOOONG.
He thinks he can do a proof for the |\Sigma|=3 case, but that will be
LOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOONG.
It can be explained to an ugrad, but you are better off going through
wqo theory.)

3) If a theorem CONNECTS to other concepts, it gets attention.
There are no real consequences of the SUBSEQ theorem.
Nor did it inspire new math to prove it.

4) If a theorem has a CHAMPION, it may get attention. For example,
the SUBSEQ Theorem is not in Hopcroft-Ullman's book on automata
theory, one of the earliest books (a chicken-and-egg problem: it's
not well known because it's not in Hopcroft-Ullman, and it's not in HU
because it's not well known). The SUBSEQ theorem had no CHAMPION.

5) Timing.  Higman did not state his theorem in terms of regular
languages, so the CS community (such as it was in 1952) could not
really appreciate it anyway.

Yet, it still seems like the statement of it should be in automata
theory texts NOW.  And people should just know that it is true.

Are there other theorems that you think are interesting and not
as well known as they should be? If so, I INVITE you to post them
SHOULD BE BETTER KNOWN will then become better known and hence
NOT be the winner, or the loser, or whatever.

NOTE: The |\Sigma|=1 case of Higman's theorem CAN be asked in
an automata theory course and answered by a good student.
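For the unary case the whole answer fits in one display, since a^i is a subsequence of a^j exactly when i ≤ j; in every case SUBSEQ(L) is finite or all of {a}^*, hence regular:

```latex
L \subseteq \{a\}^*:\qquad
\mathrm{SUBSEQ}(L) =
\begin{cases}
\emptyset & \text{if } L = \emptyset,\\
\{a^i : 0 \le i \le m\},\quad m = \max\{|w| : w \in L\} & \text{if } L \text{ is finite and nonempty},\\
\{a\}^* & \text{if } L \text{ is infinite}.
\end{cases}
```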

1. Is there any reason why everyone wants to call recursive-anything computable-anything, beyond the hope that sticking in the word computation will make more people interested?

2. "Is there any reason why everyone wants to call recursive-anything computable-anything, beyond the hope that sticking in the word computation will make more people interested?"

Yes indeed. The word "computable" describes the objects referred to much more closely than does "recursive," which historically refers only to a particular model of computation, and nowadays is too often confused by students with a particular strategy for designing algorithms. It's incongruous to talk about Turing machines and call the functions they compute "recursive." In what sense do Turing machines recurse?

3. "Steve Fenner has a proof of the |\Sigma|=2
case that does not need wqo theory, but is LOOOOOOOOOOOONG.
He thinks he can do a proof for the |\Sigma|=3 case, but that will be LOOOOOOOOOOOOOOOOOOOOOOOOONG."

Actually I have a proof of the general case, not using wqo's, that's 4-5 pages. It is at
http://www.cse.sc.edu/~fenner/papers/higman.pdf
The binary case is easier than the general case, but the ternary case probably isn't.

4. Actually, I had written about Higman's lemma in this post, but not using the formulation of Higman's lemma that you state here, which is much more compelling than the variant I used. We even used it in a paper (referenced in the post).

5. Bill, your challenge is awesome.

Given a sequence of real numbers a(1),...,a(n), suppose we want to find the monotonically nondecreasing sequence that best approximates it in the least-squares norm.

It turns out there's a beautiful linear-time algorithm to accomplish this. I was elated to come up with it as a summer student at Bell Labs, until I learned that Kruskal had beaten me by ~35 years.

(1) Create a linked list, where initially the ith element has a "value" of a(i) and a "weight" of 1.

(2) Repeatedly look for adjacent elements i and i+1 such that a(i)>a(i+1). Whenever you find such a pair, replace it by a single element of weight w(i)+w(i+1), and value equal to the weighted average
[w(i)a(i)+w(i+1)a(i+1)]/[w(i)+w(i+1)].
Continue until a(i)<=a(i+1) for all i.

(3) Output a list of n elements, where a(i) in the final list occurs with multiplicity w(i).

Exercises: Why does this work? Why can it be made to run in linear time?
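The three steps can be sketched in Python with a stack of (value, weight) blocks: scan a(1), ..., a(n) left to right and merge while the previous block's value exceeds the current one. This left-to-right scheduling of the merges is our choice (the comment allows any order); it makes the linear bound visible, since each element is pushed once and merged away at most once. The function name `isotonic_fit` is ours.

```python
from typing import List, Tuple


def isotonic_fit(a: List[float]) -> List[float]:
    """Least-squares nondecreasing fit by merging adjacent violators."""
    blocks: List[Tuple[float, int]] = []  # (value, weight); values nondecreasing
    for x in a:
        value, weight = float(x), 1
        # Step (2): while the previous block is larger, replace the pair by
        # one block whose value is the weighted average.
        while blocks and blocks[-1][0] > value:
            prev_value, prev_weight = blocks.pop()
            total = prev_weight + weight
            value = (prev_weight * prev_value + weight * value) / total
            weight = total
        blocks.append((value, weight))
    # Step (3): each block contributes `weight` copies of its value.
    return [v for v, w in blocks for _ in range(w)]
```

For instance, `isotonic_fit([1, 3, 2, 4])` merges the violating pair 3, 2 into two copies of 2.5, and `isotonic_fit([3, 2, 1])` collapses everything to the mean 2.0.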

6. OK, I've got another result that ought to be better-known in our community (though it is well-known in a different community).

Over a Boolean alphabet, what are the largest sets of gates that are not universal? Assuming the constants 0 and 1 come for free, it's easy to show that there are exactly two such sets:

(1) the monotone gates (AND,OR), and

(2) the linear gates (NOT,XOR).

But what if the alphabet has 3 or more elements? Then the problem is much more complicated, but it was solved by Ivo Rosenberg in the early 70's. In particular, Rosenberg showed that for any finite alphabet size, there are only finitely many "maximal but not universal" gate sets.
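Taking the Boolean claim above as given (with constants 0 and 1 free, the only maximal non-universal sets are the monotone gates and the linear gates), a gate set is universal exactly when it contains some non-monotone gate and some non-linear gate. Here "linear" has to mean affine over GF(2), since NOT is in that class. A small Python sketch of the test; the `(arity, function)` representation of a gate and the three-point identity used for the affine check are our own choices for illustration.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical representation: a gate is (arity, f), where f takes
# `arity` bits (0 or 1) and returns a bit.
Gate = Tuple[int, Callable[..., int]]


def is_monotone(gate: Gate) -> bool:
    """Flipping any single input from 0 to 1 never lowers the output."""
    n, f = gate
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0 and f(*x) > f(*(x[:i] + (1,) + x[i + 1:])):
                return False
    return True


def is_affine(gate: Gate) -> bool:
    """f is affine over GF(2) (an XOR of inputs, possibly negated) iff
    f(x ^ y ^ z) == f(x) ^ f(y) ^ f(z) for all input vectors x, y, z."""
    n, f = gate
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            for z in points:
                w = tuple(a ^ b ^ c for a, b, c in zip(x, y, z))
                if f(*w) != f(*x) ^ f(*y) ^ f(*z):
                    return False
    return True


def is_universal_with_constants(gates: List[Gate]) -> bool:
    """Universal iff the set escapes both maximal non-universal classes."""
    return (any(not is_monotone(g) for g in gates)
            and any(not is_affine(g) for g in gates))
```

With this test, {AND, OR} and {NOT, XOR} come out non-universal, while {AND, NOT}, {AND, XOR} and {NAND} come out universal.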


8. (edited post)

Thanks for the cool pointer, Bill.

Am I missing something? I found a proof of Higman's Lemma pretty quickly. It uses Dickson's Lemma, which, now that I've looked up the terms, I guess is part of w.q.o. theory; but that result has an easy, self-contained proof by induction (I learned it in week 2 of an undergrad algebraic geometry course) and is beautiful discrete math. So I'm not sure why Higman's result can't be in more texts.

Dickson's Lemma: Let S be a subset of the set of k-tuples of natural numbers. Suppose that S is 'upwards closed': if v1 is in S and v2 dominates v1 coordinate-by-coordinate, then v2 is in S.

Then there's a finite subset S' of S such that v is in S iff v dominates some element of S'.

Proof is induction on k.
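Dickson's Lemma itself is nonconstructive, but the shape of its conclusion is easy to illustrate when the upward-closed set is generated by finitely many known points: the finite set S' is the set of minimal elements, and membership is "dominates some element of S'". A minimal Python sketch under that assumption; the function names are ours.

```python
from typing import Iterable, List, Tuple

Vec = Tuple[int, ...]


def dominates(v2: Vec, v1: Vec) -> bool:
    """v2 >= v1 coordinate-by-coordinate (tuples of the same length k)."""
    return len(v2) == len(v1) and all(a >= b for a, b in zip(v2, v1))


def minimal_elements(points: Iterable[Vec]) -> List[Vec]:
    """The antichain S' of points not dominating any other given point."""
    pts = set(points)
    return sorted(p for p in pts if not any(q != p and dominates(p, q) for q in pts))


def in_upward_closure(v: Vec, basis: Iterable[Vec]) -> bool:
    """v is in the upward-closed set iff it dominates some basis element."""
    return any(dominates(v, b) for b in basis)
```

For example, the points (1,3), (2,2), (3,1), (2,3), (3,3) have basis (1,3), (2,2), (3,1); then (5,2) is in the upward closure, while (0,5) and (1,1) are not.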

Proof of Higman's lemma:

Let L be a language over {0,1}. If every string is in subseq(L), then subseq(L) = {0,1}^* is regular; so say x is a forbidden subsequence for L.

Insert 0's and 1's into x so that 0's and 1's alternate; the resulting string x' is also forbidden. Let k be the length of x'; then no string in L can have more than k alternations between 0 and 1.

Slice up the (k+1)-alternation-restricted strings according to how many 0-1 alternations (0 <= j < k+1) a string has and which bit (b) it begins with.

Any of the strings in the (j, b) slice can be naturally encoded as a (j+1)-tuple of natural numbers (the lengths of its maximal blocks of equal bits) in a bijective way (for that slice), e.g.

000111011 ---> (3, 3, 1, 2) in the (3, 0) slice;
11011 -----> (2, 1, 2) in the (2, 1) slice;
11------> (2) in the (0, 1) slice.

Then it holds that if a (j+1)-tuple v1 encodes a forbidden (j, b) subsequence of L and v2 dominates v1, then v2 also encodes a forbidden (j, b) string (the string encoded by v1 is a subsequence of the one encoded by v2). Thus by Dickson's Lemma, the forbidden (j, b) strings are exactly those (j, b) strings whose encodings dominate the encoding of one of a finite set of (j, b) strings. This is a finite disjunction of properties easy to test by finite automata; using the closure of regular languages under finite union and complement, and applying the easy check for too many 0-1 alternations, we find subseq(L) is regular. QED
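The key step, that domination of encodings within one (j, b) slice implies the subsequence relation, can be spot-checked by brute force. A minimal Python sketch of the encoding described above, plus an exhaustive check over short binary strings; the function names are ours, and a bounded check is of course not a proof.

```python
from itertools import groupby, product
from typing import Tuple


def encode(s: str) -> Tuple[int, int, Tuple[int, ...]]:
    """Map a nonempty binary string to (j, b, runs): j = number of 0-1
    alternations, b = first bit, runs = lengths of its maximal blocks."""
    runs = tuple(len(list(block)) for _, block in groupby(s))
    return len(runs) - 1, int(s[0]), runs


def dominates(v2: Tuple[int, ...], v1: Tuple[int, ...]) -> bool:
    """Coordinatewise domination of tuples of the same length."""
    return len(v2) == len(v1) and all(a >= b for a, b in zip(v2, v1))


def is_subsequence(u: str, v: str) -> bool:
    """Greedy left-to-right matching; `ch in it` consumes the iterator."""
    it = iter(v)
    return all(ch in it for ch in u)


def domination_implies_subsequence(max_len: int) -> bool:
    """Within each (j, b) slice, if v's encoding dominates u's, check that
    u is a subsequence of v, for all nonempty strings up to max_len."""
    strings = ["".join(t) for n in range(1, max_len + 1)
               for t in product("01", repeat=n)]
    codes = {s: encode(s) for s in strings}
    for u in strings:
        ju, bu, ru = codes[u]
        for v in strings:
            jv, bv, rv = codes[v]
            if (ju, bu) == (jv, bv) and dominates(rv, ru) and not is_subsequence(u, v):
                return False
    return True
```

The encoding reproduces the three examples in the comment: 000111011 maps to slice (3, 0) with tuple (3, 3, 1, 2), and so on.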

This does raise the question of how to provide the finitely many strings we need, given a description of L (Dickson's Lemma is nonconstructive). But of course it's undecidable to do this given just a machine for L, and in any case, as Bill says, who actually cares about subseq(L)?

9. I guess Suresh alludes in his post to a proof technique that is much the same; I just want to argue that it's not arcane.

10. Scott, is the second result you reference (which I agree is very cool) also naturally in the orbit of wqo/Robertson-Seymour type results?


12. (I feel odd commenting on my own post.)
YES, the proof given above of the SUBSEQ Thm. using Dickson's lemma is correct. In fact, the proof of the SUBSEQ theorem is NOT hard. I suspect that your proof and the standard one are the same proof. When I say it needs "wqo theory," that just means that it would take some work to get to in an ugrad automata theory class, but it really could be done. And it could be in the textbooks; it would not take that many pages.

bill g.

13. "Am I missing something? I found a proof of Higman's Lemma pretty quickly. It uses Dickson's Lemma ..."

This is a good concise proof, but only of the binary case of Higman's result. It resembles some sort of hybrid between Higman's proof and mine (see the link in my previous comment). Dickson's Lemma is essentially a restatement of the fact that

(N^k, componentwise-domination)

is a wqo, and that part resembles Higman's proof. The question of whether SUBSEQ(L) has an excluded string is equivalent to that of whether strings in L have unbounded 0-1 alternation. I generalize this idea to prove the general case for a k-ary alphabet.

(Higman's full proof uses the fact that (Sigma*, subseq) is a wqo, for any finite alphabet Sigma. Once this is established, the rest of the proof is easy and straightforward.)

By the way, I wasn't deliberately trying to avoid wqo's or Dickson's Lemma in my own proof. I just didn't know about them at the time (although I knew I was reproving a known result).

Finally, I can imagine a scenario where Higman's result is useful: a language L may be obviously closed downward under the subseq relation, but not obviously regular. Since L is closed downward, L = SUBSEQ(L), and Higman's result then says L is regular. For example,

L = {w in {a,b,c}* | w has at most 5 occurrences of a followed by b, and at most 3 of them have c in between}
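Downward closure here can be sanity-checked by brute force. The sketch below assumes one reading of the example (our interpretation, not stated in the comment): an "occurrence of a followed by b" is a pair of positions i < j with a at i and b at j, and such a pair "has c in between" if some position strictly between them holds a c. Deleting a letter can only destroy pairs or the c's between them, so both counts never increase; the code checks this exhaustively for short strings. Function names are hypothetical.

```python
from itertools import product
from typing import Tuple


def ab_counts(w: str) -> Tuple[int, int]:
    """(number of a...b pairs, number of those with a 'c' strictly between),
    under the assumed reading of the example."""
    pairs = pairs_with_c = 0
    for i, ch in enumerate(w):
        if ch != "a":
            continue
        seen_c = False
        for later in w[i + 1:]:
            if later == "c":
                seen_c = True
            elif later == "b":
                pairs += 1
                if seen_c:
                    pairs_with_c += 1
    return pairs, pairs_with_c


def in_example_language(w: str) -> bool:
    pairs, pairs_with_c = ab_counts(w)
    return pairs <= 5 and pairs_with_c <= 3


def closed_under_deletion_up_to(n: int) -> bool:
    """Deleting one letter from any member of length <= n gives a member.
    Single deletions suffice: every subsequence is reached by a chain of them."""
    for length in range(n + 1):
        for letters in product("abc", repeat=length):
            w = "".join(letters)
            if in_example_language(w):
                for i in range(length):
                    if not in_example_language(w[:i] + w[i + 1:]):
                        return False
    return True
```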

Of course, what is obvious and not obvious is in the eye of the beholder.

14. OK, thanks. I just didn't notice that Bill had actually stated a k-ary generalization of what I proved (typical CS lacuna: expecting that binary alphabets always capture the essential complexity). I'll think about k > 2.

15. "Scott, is the second result you reference (which I agree is very cool) also naturally in the orbit of wqo/Robertson-Seymour type results?"

I don't think so (but I could be wrong).

16. Well, here's what I think I was reaching for.

Let G be a gate-set; let F(G) be the functions computable with G. Form gate-set equivalence classes:
[G] = {G': F(G') = F(G)}.

Partial-order these classes:
[G1] <= [G2] if F(G2) contains F(G1).
(not just quasi- because it's antisym.)

Suppose it turns out to be a well-partial-ordering; then looking at the set of equivalence classes of the maximal non-universal gate sets, they form an antichain. So there must be only finitely many of them.

This falls short of the result you quote, because one of these function classes might have infinitely many maximal basis gate-sets. Still, it's part-way there.

17. To clarify my words: the 'form an antichain' claim is not dependent on the partial ordering being well-. Only the finitude claim is.

18. This theorem appears on page 64 of John Conway's book "Regular Algebra and Finite Machines". I am not an expert in this area, but it seems to me that this text on finite automata contains lots of material which is fundamental to automata theory but has not been explored since the book was written in the early 1970s. This is despite the book being cited in many papers in the computer science literature. I guess what I am trying to say is that if you want results that should be better known, go and read Conway's little book!

19. Imagine my surprise when my co-authors (Cortes and Mohri, "Learning
Linearly Separable Languages") pointed out that another paper in ALT06
-- Fenner and Gasarch, "The Complexity of Learning SUBSEQ(A)" -- was
using Higman's result. Imagine Steve's and my surprise when it turned
out that a third paper at that same conference was also using Higman's
theorem: de Brecht and Yamamoto, "Mind Change Complexity of Inferring
Unbounded Unions of Pattern Languages From Positive Data". Perhaps the
time was ripe for this obscure result to start getting
mileage... or perhaps it wasn't that obscure to begin with!

-Leo

20. There IS a recent text-book that contains Higman's Lemma together with a short proof of it (one page): the book of Reinhard Diestel, "Graph Theory". It is stated there in the context of the graph minor theorem.

21. I believe that I, too, have found a simple proof of Higman's Theorem. My point of view, and the application I have in mind, seem to ask for a different statement and a slightly stronger version: Given a finite alphabet \Sigma, if we add permissible subsequences indefinitely, we must eventually terminate at \Sigma^*. I haven't fully read anyone else's proof, so as to develop my own ideas freely (and also because I'm lazy about reading proofs), but I suspect most or all other proofs actually prove the same thing. I will eventually read at least one other proof, but I have a query: Am I right that other proofs also prove the above stronger statement?

I have a question and a recommendation for Steve specifically. I have a copy of your manuscript. Where has it appeared, or where will it appear? My recommendation is: you avoid well quasi-orders, but you do use ordinary well orders, at least implicitly, and perhaps you should say so in the manuscript.

22. The paper appeared in THEORY OF COMPUTING SYSTEMS Vol 45, No. 3, in 2009.

They made us take out the appendix which had an alternative proof of Higman's lemma; however,
the version on my website has those appendices.

I doubt Steve Fenner will read this blog
from 2006; you should email him directly.
