## Sunday, June 09, 2024

### CFG-Kolm-complexity is singleton sets with Lance and Bill

For this post all Context Free Grammars (henceforth CFGs) are assumed to be in Chomsky Normal Form. The size of a CFG $$G$$  is the number of rules. We denote this by $$|G|$$.

BILL: In my automata theory class I want to do some lower bounds on the size of CFGs. It is easy to show that if   $$w=0^n$$ then there is a CFG G such that $$L(G)=\{w\}$$ and $$|G|=O(\log n)$$. I showed that if $$w$$ is a Kolmogorov random string of length $$n$$, and G is a CFG such that $$L(G)=\{w\}$$, then $$|G|=\Omega(n/\log n$$), though this is surely known. So here is my question: Is there a natural such $$w$$? I will blog about that and make an open problems column out of it.

LANCE: Kolmogorov strings are natural!

BILL: Oh yeah. If that was true then spell check would not flag Kolmogorov as being misspelled.  So there!

LANCE: Can you ask a more rigorous question?

BILL: Okay. We can view the Kolm-result as saying that there is a function $$f$$ from $$1^*$$ to $$\{0,1\}^*$$ such that  $$f(1^n)$$ is a string of length $$n$$ such that any CFG for $$\{w\}$$ is large. But the function f is not computable!

LANCE: That shouldn't bother you. You wrote an entire book about how many queries to HALT and other incomputable sets are needed to solve certain problems (see here).  Also now that you know you there are such strings, you can simply search for a w and test all small CFGs. So Computable!

BILL: Still not natural. And what is the complexity? Exponential? Poly?

WE DROP THE TOPIC AND PICK IT UP AGAIN A FEW TIMES. WE (meaning mostly Lance) HAVE SOME BRILLIANT INSIGHTS THAT LEAD TO THE FOLLOWING RESULTS:

1) For every $$w \in \{0,1\}^n$$ there is a CFG G with $$L(G)=\{w\}$$ and $$|G|=O(n/\log n)$$

2) If  $$w$$ is a de Bruijn sequence of length $$n$$ and order $$k=\log n$$ (we assume n is a power of 2). Then every CFG G with $$L(G)=\{w\}$$ has $$|G|=\Omega(n/\log n)$$.  There is a known algorithm that will, given $$1^n$$, produce a de Bruijn sequence or length n and order $$k=\log n$$, in time quasilinear in $$n$$.

BILL: That bums me out for two contradictory reasons

a) The problem is NOT solved since de Bruijn is flagged by spellcheck, so the sequences are not natural.

b) The problems IS solved, so I can't use it for an open problems column.

LANCE: Do not despair!

a) De Bruijn sequences have a Wikipedia page and therefore are natural.

b) We can post on ArXiv.

WE DID and a day later Markus Lohrey emailed us that, aside from the De Bruijn result, the results are already known using a different terminology, word chains.  See his survey here. Then the next day, Giovanni Pighizzini emails us that he had previously published lower bounds for De Bruijn sequences. We have since withdrawn the paper. We revised it by putting in references and history but will not put it on arxiv. The revised paper is here.

LANCE: Bill, are you bummed out? Why did we even write the paper anyway?

BILL: Not at all!  My original goal was pedagogical, and the paper we have can still be taught in automata theory next spring. PLUS, we got invited to submit to Advanced in AI and ML with a 10% discount on publication fees (see here.) Since we are used to getting 100% discount on publication fees we won't be submitting, but it was nice to be asked.

LANCE: Yeah, nice to be asked to be parted from my money. At least I learned about word chains.

1. ok, we gotta be careful about these publisher behind "Advanced in AI and ML" and the like. it's most certainly falling into the scam category if not Predatory publisher. Best one-liner so far from them, we are completing issue 5, and short of exactly one paper; your paper seems to be fitting the bill, please submit it, and you will be entitled a 50% discount. Good that you didn't submit even if they offered a 100% discount!

2. But my question is relevance of post. Is it related to some other event that triggered the post? Posting a result known under different terminology … also reminiscent to closed form solution post.

3. (Bill) The result we presented may be new and interesting to the reader, and may be taught in an ugrad autmata theory course. The fact that it is all already known is irrelevant to that point, though needs to be included for accuracy. As for points to be gotten from the history, take your pick (a) don't be shy about getting help on a problem, (b) to find out if something is already known, post to arxiv, (c) we should have asked around about the result before purusnig it,
(d) we are glad we didn't ask around about the result before pursuing it, (e) the notion of natural'' is ill-defined and people can differ on it, (f) I am sure there are others.

4. Natural means of it appears in nature by itself and not constructed by us artificially for a particular purpose.

What Natural numbers are really natural?

Are physical constants like fine-structure constant really natural?

Natural is probably not the word we want for what you want. You want something that _does not look artificial too your audience_, or _is familiar and well-known to your audience_.

5. No, he said and meant the right thing. He didn't say natural. He said "natural", which is to be taken "relativistically".