Archive for the ‘Automata Theory’ Category

Deterministic Finite Automata in Haskell

October 2, 2012

Consider the deterministic finite automaton [1] illustrated below

[ state diagram courtesy of Michael Sipser [1] ]

Henceforth, we will call this automaton M_1. Note that M_1 has three states, labeled q_1, q_2, and q_3. The start state is q_1 (note the arrow coming from nowhere) and the accept state is q_2 (note the double circle). There are two possible inputs, labeled 0 and 1, and, depending on the input, the automaton will jump from one state to another. In the diagram above these state transitions are depicted using arrows connecting two states.

Suppose that the automaton M_1 receives an input string such as 1101 (which the automaton reads from left to right). Since the start state is q_1 and the first input symbol is 1, the automaton will jump to state q_2. The second input symbol is 1, so the automaton will remain at q_2. The automaton reads the third input symbol, which is 0, and then jumps from state q_2 to state q_3. The last input symbol is 1 and thus the automaton jumps from state q_3 to state q_2. Hence, the state sequence corresponding to the input string 1101 is the following

q_1 \xrightarrow{1} q_2 \xrightarrow{1} q_2 \xrightarrow{0} q_3 \xrightarrow{1} q_2

After reading the last symbol in the input string, M_1 produces an output. Since the final state q_2 is the accept state, we have that the automaton produces the output “accept”. Were the final state not q_2, the automaton would produce the output “reject”. We conclude that this automaton accepts the input string 1101. What other input strings does M_1 accept? Michael Sipser answers this question in [1]:

Experimenting with this machine on a variety of input strings reveals that it accepts the strings 1, 01, 11, and 0101010101. In fact, M_1 accepts any string that ends with a 1, as it goes to its accept state q_2 whenever it reads the symbol 1. In addition, it accepts strings 100, 0100, 110000, and 0101000000, and any string that ends with an even number of 0s following the last 1. It rejects other strings, such as 0, 10, 101000.

A set of strings is called a language [1]. The set of all input strings accepted by the deterministic finite automaton M_1 is a language which we denote by L (M_1).

__________

Formal definition

A deterministic finite automaton (DFA) consists of a finite set of states Q, a finite input alphabet \Sigma that tells us what the allowed input symbols are, a transition function \delta : Q \times \Sigma \to Q that tells us how to jump from one state to another, a start state q_0 \in Q, and a set of accept states A \subseteq Q. A deterministic finite automaton (DFA) is thus a 5-tuple of the form

(Q, \Sigma, \delta, q_0, A)

The deterministic finite automaton M_1 which we discussed earlier in this post is defined formally as follows

M_1 := (\{q_1, q_2, q_3\}, \{0,1\}, \delta, q_1, \{q_2\})

where the transition function \delta: \{q_1, q_2, q_3\} \times \{0,1\} \to \{q_1, q_2, q_3\} is defined enumeratively as

\delta (q_1, 0) = q_1,        \delta (q_2, 0) = q_3,        \delta (q_3, 0) = q_2

\delta (q_1, 1) = q_2,        \delta (q_2, 1) = q_2,        \delta (q_3, 1) = q_2

Alternatively, we could view each state transition q_i \xrightarrow{\sigma} q_j as an ordered triple (q_i, \sigma, q_j), which might be easier to implement in some programming languages.

__________

Computing state sequences

Given an input string \sigma = \sigma_1 \sigma_2 \dots \sigma_n over a given alphabet \Sigma, how do we obtain the corresponding state sequence? I solved that problem last January. I will not repeat myself here, but the crux of the matter is that the final state can be obtained using a left-fold

\text{foldl} \, \delta \, q_0\, [\sigma_1, \sigma_2, \dots, \sigma_{n}]

whereas the entire state sequence can be computed using a left-scan

\text{scanl} \, \delta \, q_0\, [\sigma_1, \sigma_2, \dots, \sigma_{n}]

Lacking a better name, I called this procedure the “scanl trick“, as I used the Haskell function scanl to implement it. Please let me know if you find a more sophisticated name for this “trick”.

__________

Haskell implementation of the DFA

Without further ado, here is a Haskell script:

data State = Q1 | Q2 | Q3 deriving (Read, Show, Eq)

type Input = Integer

-- define state-transition function
delta :: State -> Input -> State
delta Q1 0 = Q1
delta Q1 1 = Q2
delta Q2 0 = Q3
delta Q2 1 = Q2
delta Q3 0 = Q2
delta Q3 1 = Q2
delta  _ _ = error "Invalid input!"

-- define initial state
initialstate :: State
initialstate = Q1

-- create list of accept states
acceptstates :: [State]
acceptstates = [Q2]

-- create infinite list of input sequences
inputss :: [[Input]]
inputss = concat $ iterate g [[]]
          where g = concat . map (\xs -> [ xs ++ [s] | s <- [0,1]])

-- create accept predicate
isAccepted :: [Input] -> Bool
isAccepted inputs = finalstate `elem` acceptstates
                    where finalstate = foldl delta initialstate inputs

-- compute language recognized by the DFA
language :: [[Input]]
language = filter isAccepted inputss

Some remarks about this script are in order. Please note that:

  • Sets are represented by lists. Strings are represented by lists, too. The latter is more natural than the former. Sets of strings become lists of lists.
  • A new data type is created to represent the states, which we denote by Q1, Q2, and Q3. The input symbols are integers.
  • Note that Input is a type synonym, inputs is a list of inputs symbols (i.e., an input string), and inputss is a list of lists of inputs (i.e., a list of input strings). Yes, it is a bit confusing.
  • The final state is computed using the higher-order function foldl (left-fold). We check if the final state is an accept state using  the list membership predicate elem.

If you have any objections to this script, please do let me know. Let us now test it! We load it into GHCi and voilà:

*Main> -- check list of input strings
*Main> take 31 $ inputss
[[],[0],[1],
 [0,0],[0,1],[1,0],[1,1],
 [0,0,0],[0,0,1],[0,1,0],[0,1,1],
 [1,0,0],[1,0,1],[1,1,0],[1,1,1],
 [0,0,0,0],[0,0,0,1],[0,0,1,0],[0,0,1,1],
 [0,1,0,0],[0,1,0,1],[0,1,1,0],[0,1,1,1],
 [1,0,0,0],[1,0,0,1],[1,0,1,0],[1,0,1,1],
 [1,1,0,0],[1,1,0,1],[1,1,1,0],[1,1,1,1]]
*Main> -- compute the language of the automaton
*Main> -- (let us extract some 40 input strings only)
*Main> take 40 language
[[1],[0,1],[1,1],
 [0,0,1],[0,1,1],[1,0,0],[1,0,1],[1,1,1],
 [0,0,0,1],[0,0,1,1],[0,1,0,0],[0,1,0,1],
 [0,1,1,1],[1,0,0,1],[1,0,1,1],[1,1,0,0],
 [1,1,0,1],[1,1,1,1],[0,0,0,0,1],[0,0,0,1,1],
 [0,0,1,0,0],[0,0,1,0,1],[0,0,1,1,1],[0,1,0,0,1],
 [0,1,0,1,1],[0,1,1,0,0],[0,1,1,0,1],[0,1,1,1,1],
 [1,0,0,0,0],[1,0,0,0,1],[1,0,0,1,1],[1,0,1,0,0],
 [1,0,1,0,1],[1,0,1,1,1],[1,1,0,0,1],[1,1,0,1,1],
 [1,1,1,0,0],[1,1,1,0,1],[1,1,1,1,1],[0,0,0,0,0,1]]

The list of input strings is exactly the same I posted yesterday. The input strings in the language L (M_1), or, to put it more precisely, the input strings in L (M_1) that are displayed above either end with a 1, or end “with an even number of 0s following the last 1“, as mentioned by Sipser in [1]. Why is that? The last 1 in the input string puts M_1 in state q_2, and an even number of 0s after that 1 lead to transitions from q_2 to q_3 and back to q_2, e.g.,

q_2 \xrightarrow{0} q_3 \xrightarrow{0} q_2

or

q_2 \xrightarrow{0} q_3 \xrightarrow{0} q_2 \xrightarrow{0} q_3 \xrightarrow{0} q_2

or

q_2 \xrightarrow{0} q_3 \xrightarrow{0} q_2 \xrightarrow{0} q_3 \xrightarrow{0} q_2 \xrightarrow{0} q_3 \xrightarrow{0} q_2 \xrightarrow{0} q_3 \xrightarrow{0} q_2

It would also be interesting to take a look at the state sequences corresponding to the input strings in L(M_1). We compute these state sequences using the infamous “scanl trick“:

*Main> -- create list of sequences of states
*Main> let statess = map (\xs -> scanl delta initialstate xs) language
*Main> -- take the "first" 20 state sequences
*Main> take 20 statess
[[Q1,Q2],[Q1,Q1,Q2],[Q1,Q2,Q2],
 [Q1,Q1,Q1,Q2],[Q1,Q1,Q2,Q2],[Q1,Q2,Q3,Q2],
 [Q1,Q2,Q3,Q2],[Q1,Q2,Q2,Q2],[Q1,Q1,Q1,Q1,Q2],
 [Q1,Q1,Q1,Q2,Q2],[Q1,Q1,Q2,Q3,Q2],[Q1,Q1,Q2,Q3,Q2],
 [Q1,Q1,Q2,Q2,Q2],[Q1,Q2,Q3,Q2,Q2],[Q1,Q2,Q3,Q2,Q2],
 [Q1,Q2,Q2,Q3,Q2],[Q1,Q2,Q2,Q3,Q2],[Q1,Q2,Q2,Q2,Q2],
 [Q1,Q1,Q1,Q1,Q1,Q2],[Q1,Q1,Q1,Q1,Q2,Q2]]

Note that all state trajectories end in state Q2, as we expected.

__________

References

[1] Michael Sipser, Introduction to the Theory of Computation (2nd edition), Thomson Course Technology, 2006.


Follow

Get every new post delivered to your Inbox.

Join 77 other followers