Tuesday, February 12, 2013

Proving DFA-langs closed under concat and * without using equiv to NDFA's

While teaching the Equiv of DFA, NDFA, Reg exps, I wanted to impress upon my students that the best way to show DFA-langs are closed under concat and * is to first prove DFA=NDFA and then show NDFA's closed under concat and * (which is easy). The question arises: CAN one PROVE that DFA-langs are closed under concat and star without using the equivalence to NDFA's? I emailed this informal question to Richard Beigel (my go-to guy for formal lang theory--- its a good thing he's not my go-to guy for Prog Langs since then I couldn't use go-to's.)

He emailed back the following sketches of proofs of closure that only use DFA's:

Closure under concat: states will be of the form (q,S) where q is a state of M1 and S is a set of states of M2. Start in (q0, {}) where q0 is the start state of M1. Each time a character is read advance q in M1 and advance each element of S in M2. Whenever q is an accepting state, insert M2's start set into S. Accept if S contains an accepting state of M2.

Closure under star: It is easy to modify a DFA so that the start state has no incoming edges. I'll assume that M is already in that form. State of the new machine will be a set S of states of M. Start in state {q0}. Each time a character is read, advance each state in S and then if S contains an accepting state of M insert q0 into S. Accept if S contains q0.

After complementing him on his answes I asked him him about proving Reg-exp-langs closed under complementation without using equiv to DFA's. He doesn't know how to do that, so I assume it can't be done. But is there a rigorous way to even state that?


  1. Sounds like you need a programming language guy.

  2. I would argue that Richard's proofs are using equivalence with NFAs.

  3. In the context of interactive theorem provers, this is actually a serious
    question: since it is really hard to reason formally about automata.
    So my co-authors and I set out to prove the Myhill-Nerode theorem by only
    using regular expressions (never introducing the notion what an automaton
    is). Once you have the Myhill-Nerode theorem, closure under complementation
    is trivial. The short paper appeared at ITP 2012; the longer version is


    So the short answer is it can be done.

  4. See Moshe Vardi's lecture here:


    They had to deal with a similar issue.

  5. hi "UC". your paper is very interesting! was recently trying to convince stasys jukna to attack an important problem of his using an FSM minimization attack & suggested that automated theorem proving may someday crack problems somewhere in this area. your paper shows significant advancement in this area. now, can it be used to break some important open problems? the formalization of the myhill-nerode thm seems like a huge step in this direction.

    his problem is to prove that bellman ford algorithm for shortest path is optimal and interestingly it seems to result in something that looks a lot like finding optimal FSMs. here is the problem on stackexchange.

    Combinatorics of Bellman-Ford or how to make cyclic graphs acyclic?

    1. I think there is a slight confusion. Notice I have not written "automated theorem provers", but "interactive theorem provers". In interactive theorem proving we have the joke that thinking is only "Plan B" ("Plan A" being to first try quite powerful and integrated automated provers), but it is only a joke. The paper does not describe a push-button-result. On the contrary, it makes the case that even with such powerful tools a lot of (human) thinking is needed - in this case what is the right datastructure over which the proof should proceed. So I would not yet hold my breath that current technology in this area can on their own make any significant advances with open problems. But I am hoping too (and develop interactive theorem provers to be of more help in the future).

    2. hi UC ... yes of course did understand the paper is not presenting an automated proof. also the Jukna problem is far from being solved even with a powerful prover and even with lots of human hand holding. am just stating that its in the realm of possibility with a concerted attack. however, the myhill-nerode theorem is all about FSM optimization and leads to an actual algorithm given in Hopcroft & Ullman, intro to automata, languages, and computation. 1978. suspect some important proofs based on induction with FSM optimization are on the near horizon and that Jukna's problem is similar. and someday, maybe push button results will actually be possible. [thats whole other can of worms, I mean, another research program!]

  6. Richard Beigel's constructions have been known for a long time in the "state complexity" literature. The first appearance (with proof sketches only) seems to be in A. Maslov: Estimates of the number of states of finite automata, Soviet Mathematics Doklady, 11 (1970) 1373-1375.
    Later the results were rediscovered and elaborated by S. Yu, Q. Zhuang, K. Salomaa, The state complexity of some basic operations on regular languages, Theoret. Comput. Sci. 125 (1994) 315–328.

    As for the question about regular expressions, there are recent results of Gelade and Neven exhibiting a family of regular expressions for which the smallest regular expression for the complement has a doubly-exponential blowup in size. So any direct construction of the complement from the regular expression will have to be rather complicated. See http://tocl.acm.org/accepted/401neven.pdf .

  7. I agree with Jeff Erickson: those proofs seem like they just take the conversion from NFAs to DFAs and "inline" them into the proof that DFAs are closed under concat and *. I wonder if one can show that at least the resulting blow-up in size is necessary, as in Jeffrey Shallit's answer: is there a regular language L where the size of the smallest DFA for L* is exponential in the size of the smallest DFA for L? (And the analogous idea for concat.)

  8. Yes, Josh, this is in the paper of Yu et al. that I cited. They give exact bounds.

  9. The question `is there a proof of closure under CONCAT or * that only used DFA's' is of course an informal one. What I really wanted to know is `is there a proof I could show my class before they
    saw DFA's' and to that the proof Richard emailed me WOULD suffice.
    That is, it is simple enough to show them. (I did email them a link
    to this post, though that was after they had seen the normal proofs.)

    By contrast, from the comments here, it seems that comp of reg exp
    would not be in the class class-room-teachable.

  10. This brings up the question: Is it in the class LSSL (Limited Success Student Learnable)? Can we use it to separate LSSL from CRT?

  11. You can show pretty easily that regular languages are closed under complement via Brzozowski derivatives.That is, you can directly show that Da(not(r)) = not(Da(r)), and then you can show separately that the set of derivatives (modulo the ACU axioms for +) of a regular expression r is finite, which gives you a DFA for recognizing a regular language. You never even have to mention the existence of NFAs.

    See "Derivatives of Regular Expressions", Janusz A. Brzozowski, Journal of the ACM, Volume 11 Issue 4, Oct. 1964, Pages 481-494.

    Derivatives give a very pretty, algebraic way of proceeding, without forcing you into the weeds of automata constructions. It's a shame they are not better-known.

  12. I agree with the last Anonymous post on Brzozowski derivatives. On a personal note, though, I think that the set difference operator is probably easier for an undergraduate class to grok than the complement operator. Maybe it's my inner category theorist talking, but I don't think that set complement is in any way natural.

  13. ps re working with complements of DFAs, this question popped up on cs.se recently & the answers relate to complementing an NFA without determinizing it 1st. when thinking about it yrs ago, decided there seems to be a natural generalization of an NFA which halting is based on a multistate transition landing in "all of S1" states where S is a set, in contrast to the standard halt condition of landing in "any of S2" states, where S1 and S2 are complementary states on the NFA. but havent seen this considered in the literature. maybe someone else?

    Is there a efficient test for if an NFA accepts a subset of another NFA?

  14. ps re complementing the NFA without determinizing. on pondering this yrs ago, decided that there seems to be a natural generalization of NFAs as follows. instead of the halt state based on a multistate configuration landing in "any of S1" states where S1 is a set (as in determinization algorithm), consider a new concept of a multistate configuration landing in "all of S2" states where S2 is the complement of S1. but this does not seem to be considered in the literature. there is some discussion of this on a recent cs.se question:

    Is there a efficient test for if an NFA accepts a subset of another NFA?