_{σ}(w) is the number of σ's in w.

We often ask our students about languages like { w | n

_{a}(w) = 2n

_{b}(w) } (CFL but not REG). Lets formally define

*languages that are like that*.

A COUNTING DESCRIPTION is a boolean combination of linear equations and inequalities involving n

_{σ}(w). For example

(n

_{a}(w) ≤ 2n

_{b}(w)+3) AND NOT( n

_{c}(w) = n

_{b}(w) ).

We denote a counting description by E. Let L(E) = { w : E(w) is true } A lang L is CD if there exists an E such that L=L(E). What to make of the class CD?

The following items came out of emails between myself and Eric Allender, Dave Barrington, and Neil Immerman.

- CD is contained in 1-way log space and in uniform TC
_{0}.

- CD is incomparable to REG since (1) from above we see there are langs in CD that are not REG, and (2) ba
^{*}is not in CD.

- CD intersect REG is in uniform AC
_{0}

- Parikh's Theorem yields a large class of CD's that produce context free languages.

- If E is a Boolean combination of threshold and mod statement, each about a single n
_{σ}(w), then L(E) is regular

- The following papers may help answer some of the questions one could ask: here and here.

- Given a counting description E, is L(E) regular?

- Given a counting description E, is L(E) context free?

^{1/2}) since that's not natural. Maybe some sublinear class would be interesting. We can also ask variants of these questions involving any combinations of the following variants:

- Do not allow negation.

- Do not allow intersection.

- Do not allow inequalities.

- Do not allow additive constants.

- Do not allow multiplicative constants.

- Only allow a bounded alphabet size.

- Allow other types of equations, for example n_a(w) = n_b(w)
^{2}.

- Allow other primitives such ast n_a(w) == n_b(w) mod 9.

- Have a non-uniform version of Counting Descriptions and bound the size of the formula as a function of n.

In problem 7 if you allow any poly then the problem is undecidable by the solution to Hilbert's tenth problem.

A trivial remark after item 2.(2): Your languages are closed under permutation. The class CD may then be compared to other classes closed under permutation. I am not at all a specialist of these things but I suspect such classes have been studied before!

ReplyDeletesounds somewhat similar to timed automata. maybe some natural bridge thms possible?

ReplyDeleteThe concept of semilinearity is very relevant here -- so all MCFGs/LCFRSs are all semilinear too (permutation equivalent to a regular language.)

ReplyDeleteYou might also look at the use of string kernels to generalise beyond n_a(w) to n_{abc}(w) -- i.e. look st linear constraints on continuous or discontinuous substrings.

Look at "Parikh automata".

ReplyDelete