Thursday, June 04, 2015

Langs with provably bigger CFG's then CSG's

In a prior blog entry HERE I discussed very large differences in the size of machines. I didn't discuss CFG vs CSG so I'll do that now. We assume that the CFGs and CSGs are in Chomsky Normal Form (for CSG this means that RHS is any rule is of length 1 or 2).

The following are known:

1)  Let PERMn be the set of permutations of {1,...,n} (so  PERM3 is {123,132,213,231,312,321}). Then (1) there exists a CSG for PERMn of size O(n2), but (2) every CFG for PERMn requires size 2Ω(n)  . The upper bound is easy, the lower bound is by Ellul, Krawetz, Shallit, Wang Here.

2) Let Wn = { ww : |w|=n}.  Then (1) there exists a CSG for Wn of size O(n), but (2) every CFG for Wn requires size  2Ω(n). The upper bound we leave to the reader.    Filmus HERE for the lower bound.(ADDED LATER: In Comment Below Filmus (yes, the same one!) claims Wn has a CSG of size O(log n)).

3) Let Sn = { w : |w|=3n and the numbers of a's, b's, c's in w are all the same}. Then (1) there exists a CSG for Sn of size O(log n) but (2) every CFG for Wn requires size 2Ω(n). The upper bound is  from Beigel and Gasarch (If you know of an earlier source leave a polite comment.)  The lower bound is from Filmus (the ref above).(ADDED LATER- In Comment Below Filmus (yes, the same one!) corrects me, his paper did not show Sn has exp lower bound, and in fact Sn DOES have a CFG of size O(n^3)).

4)  The languages above are (informally) natural. If one goes to unnatural languages then there is a result where the CFG is GINORMOUS:  For all f  ≤T HALT, for all n, there is a lang Ln such that (1) there exists a CSG for Ln of size n, but (2) every CFG for Ln has  size ≥  f(n). First proven by Meyer, but a new proof by Beigel and Gasarch is in the paper pointed to above. (See that paper for the history.)

The results stated above are suitable for an ugrad formal lang theory course.

Are there natural langs with triple-exp or larger gap between CFG's and CSG's? There are very few techniques to get lower bounds for CFG's (the papers of Ellul et al, and Filmus, are the only ones I know) so  new techniques may be needed. Are  there even any good candidates for natural languages  with small CSG's and really really large CFG's?

1 comment:

  1. Nice results! There is a slight problem with (3), since Sn actually has CFGs of size O(n^3) using "dynamic programming". Perhaps you got confused due to a typo in my paper (in Theorem 9, n should be t).

    On the other hand, I think that Wn has a CSG of size O(log n), so you get a similar separation using the lower bound you quote in (2), which does hold.

    Finally, let me mention that the lower bounds in Ellul et al. and my paper use exactly the same technique, which is a bit surprising since I didn't know of their paper while writing mine.