Jon Awbrey
Registered: Feb 2004
Posts: 558 
Finding Out A Language's Automata
Hi Edward,
Grammar induction is always hard. In my own explorations
of the subject I eventually found it necessary to abandon
all cleverness and proceed empirically. In doing this we
assume that we have a grammarunknown source generating
a formal language L c !A!*, and the best that we can do
by way of capturing its regularities on the fly, that is,
after the fashion of a realtime data stream, is to form
the finite state transition tree for the language L_t
that approximates L up to the date t. One then has
a data basis for comparing language sources based
on their finite state approximations.
One component of a program that I wrote in the 80's
implements this strategy for 2level formal languages,
that is, formal languages that have a "lexical" level
for the "words" and a "literal" level for the "phrases".
I am somewhat tardily documenting this work,
for example, at the following list location:
http://stderr.org/pipermail/inquiry...thread.html#102
Have had it on my slate to try and see if this program
could be redone in Mathematica, but may not get around
to it any time soon. Always interested in talking to
anybody who wants to try it, though.
Jon Awbrey
Report this post to a moderator  IP: Logged
