Computational Complexity: why are regular expressions defined the way they are

Monday, March 06, 2017

why are regular expressions defined the way they are

BILL: The best way to prove closure properties of regular languages is to first prove the equiv of DFA's, NDFA's and Reg Expressions. Then, if you want to prove a closure property, choose the definition of regular that makes it easiest. For example, to prove Reg Langs closed under intersection I would use DFA's, NOT Reg Expressions.

STUDENT: I thought reg expressions were

a) finite sets

b) if alpha and beta are reg exp then so are alpha UNION beta, alpha INTERSECT beta, alpha CONCAT beta and alpha*

BILL: No. Regular expressions are defined just using UNION, CONCAT, and *.

STUDENT: Why? Had the defined it my way then closure under INTERSECTION would be easier. For that matter toss in COMPLIMENTATION and you're get that easily also.

BILL: First off, thats not quite right. You compliment a DFA by saying how lovely its states are. I think you mean complement. Second off, GOOD question!- Why are Reg Expressions defined the way they are. I"ll try to look that up and if I can't find anything I'll blog about it.

STUDENT: When will you blog about it?

BILL: I just did. Now, let me ask the question more directly:

The definition of Reg Exp is essentially closure under UNION, CONCAT, STAR. Why not other things? There are very broadly three possibilities:

a) Historical Accident.

b) Some good math or CS reason for it.

c) Something else I haven't thought of.

I hope its (b). Moreover, I hope one of my readers knows and can enlighten me and the other readers.

11 comments:

KWillets1:02 PM, March 06, 2017
Offhand I'd say that UNION, CONCAT and * are O(1) operations while the others are not (on an NFA, which is pretty weak).
ReplyDelete
Replies
Luca Aceto2:51 PM, March 06, 2017
On page 73 of his classic RAND report (available at http://www.rand.org/pubs/research_memoranda/RM704.html), Kleene writes "We did not include the operations & (and) and overline (not) in our definition of regular events, because in the converse theorem (Theorem 6) we do not need to. In this section we will show that net constructions can be managed so that the two operations can be included." Right above Section 7.6 of that landmark paper, Kleene also says: "But, of course, the fact that our three operations are completely general (by Theorem 6) does not settle the question whether they will prove to be a convenient and practical way to deal with events. Possibly some other selection will prove to be more convenient. Or, we may add other operations and express these in turn in terms of our three."

So Kleene's choice of operations appears to be guided by an analysis of the paper by McCulloch and Pitts, and economy of description.

My two-cent worth.
ReplyDelete
Replies
Jakito5:08 PM, March 06, 2017
The answer by KWillets can be translated into an algebraic context by considering an arbitrary monoid M instead of only finitely generated free monoids. One then introduces the rational sets over M inductively starting from finite sets via UNION, CONCAT and *. Then one introduces an appropriate notion of nondeterministic M-automaton, and shows that a set is rational exactly if it is accepted by a finite nondeterministic M-automaton.

At least this question (http://cstheory.stackexchange.com/questions/31458/who-introduced-nondeterministic-m-automata-and-proved-that-the-finite-ones-reco) implicitly claims this, and J.-E. Pin confirms it with a reference to the supposedly original publication. If you care less about references to original publications, then these lecture notes (https://www7.in.tum.de/um/courses/auto/ws1314/script/autonotes.pdf) by Manfred Kufleitner provide easy to follow proofs with many images and applications like Angluin’s learning algorithm for recognizable sets (oh wait, recognizable sets are the other part of the story corresponding to deterministic automata).
ReplyDelete
Replies
Anonymous8:37 PM, March 06, 2017
An ahistorical reason: If your initial model were NFAs rather than DFAs then simulation of intersection of regular languages by NFAs would be just as problematic as simulation of Kleene star by DFAs. At least with the current definition, simulation by NFAs is trivial.
ReplyDelete
Replies
Anonymous1:09 AM, March 07, 2017
I would say: because the advantage you would have in proving some of the closure properties using regular expressions would be negated.
ReplyDelete
Replies
dom3:05 AM, March 07, 2017
In math every object is always defined with the least number of necessary axioms, and then the rest is shown to follow/be equivalent. This is math. Imagine if you defined planar graphs as graphs that can be drawn in the plane AND colored with four colors. It would certainly make the Four Color Theorem a lot easier to prove.
ReplyDelete
Replies
Pseudonym5:57 AM, March 07, 2017
Intersection and difference are O(1) operations if you use Brzozowski derivatives to construct your NFA, of course.

But this raises an interesting question: Consider some set of "interesting" operations that regular languages are closed under: KLEENE CLOSURE, CONCAT, UNION, INTERSECTION, SET DIFFERENCE, maybe INTERLEAVE PRODUCT... we know that the first three will imply the others. Is there some other set of three "interesting" operations that will do the job instead? Like, are the languages closed under *, concat, and intersection also closed under union?
ReplyDelete
Replies
Josh9:19 AM, March 08, 2017
I was hoping that Chomsky's use may have been the origin, but as far as I can tell this isn't the case. According the wikipedia, the paper in which Chomsky introduced his pseudonymous hierarchy of languages is https://chomsky.info/wp-content/uploads/195609-.pdf. But as one can see, he didn't use regular expressions at all, just finite state machines (as the deterministic case of a finite-state Markov process, which is a fairly ubiquitous concept in the sciences).
ReplyDelete
Replies

Add comment