For this post all Context Free Grammars (henceforth CFGs) are assumed to be in Chomsky Normal Form. The size of a CFG \(G\) is the number of rules. We denote this by \(|G|\).

BILL: In my automata theory class I want to do some lower bounds on the size of CFGs. It is easy to show that if \(w=0^n\) then there is a CFG G such that \(L(G)=\{w\}\) and \(|G|=O(\log n)\). I showed that if \(w\) is a Kolmogorov random string of length \(n\), and G is a CFG such that \(L(G)=\{w\}\), then \( |G|=\Omega(n/\log n\)), though this is surely known. So here is my question: Is there a* natural *such \(w\)? I will blog about that and make an open problems column out of it.

LANCE: Kolmogorov strings are natural!

BILL: Oh yeah. If that was true then spell check would not flag *Kolmogorov* as being misspelled. So there!

LANCE: Can you ask a more rigorous question?

BILL: Okay. We can view the Kolm-result as saying that there is a function \(f\) from \(1^*\) to \(\{0,1\}^*\) such that \(f(1^n)\) is a string of length \(n\) such that any CFG for \( \{w\}\) is large. But the function f is not computable!

LANCE: That shouldn't bother you. You wrote an entire book about how many queries to HALT and other incomputable sets are needed to solve certain problems (see here). Also now that you know you there are such strings, you can simply search for a w and test all small CFGs. So Computable!

BILL: Still not natural. And what is the complexity? Exponential? Poly?

WE DROP THE TOPIC AND PICK IT UP AGAIN A FEW TIMES. WE (meaning mostly Lance) HAVE SOME BRILLIANT INSIGHTS THAT LEAD TO THE FOLLOWING RESULTS:

1) For every \(w \in \{0,1\}^n\) there is a CFG G with \(L(G)=\{w\}\) and \( |G|=O(n/\log n)\)

2) If \(w\) is a de Bruijn sequence of length \(n\) and order \(k=\log n\) (we assume n is a power of 2). Then every CFG G with \(L(G)=\{w\}\) has \( |G|=\Omega(n/\log n)\). There is a known algorithm that will, given \(1^n\), produce a de Bruijn sequence or length n and order \(k=\log n\), in time quasilinear in \(n\).

BILL: That bums me out for two contradictory reasons

a) The problem is NOT solved since *de Bruijn *is flagged by spellcheck, so the sequences are not natural.

b) The problems IS solved, so I can't use it for an open problems column.

LANCE: Do not despair!

a) De Bruijn sequences have a Wikipedia page and therefore are natural.

b) We can post on ArXiv.

WE DID and a day later Markus Lohrey emailed us that, aside from the De Bruijn result, the results are already known using a different terminology, word chains. See his survey here. Then the next day, Giovanni Pighizzini emails us that he had previously published lower bounds for De Bruijn sequences. We have since withdrawn the paper. We revised it by putting in references and history but will not put it on arxiv. The revised paper is here.

LANCE: Bill, are you bummed out? Why did we even write the paper anyway?

BILL: Not at all! My original goal was pedagogical, and the paper we have can still be taught in automata theory next spring. PLUS, we got invited to submit to *Advanced in* *AI and ML* with a 10% discount on publication fees (see here.) Since we are used to getting 100% discount on publication fees we won't be submitting, but it was nice to be asked.

LANCE: Yeah, nice to be asked to be parted from my money. At least I learned about word chains.