santanuchakrabarti/RE_to_Grammar_Algorithm.md

## RE_to_Grammar_Algorithm.md

      
    Raw
  

              RE_to_Grammar_Algorithm.md
            
          
    Some References to Algorithm Resources

My algorithm to convert a Regular Expression to a Regular Grammar is based on mostly two online resources.

Left-Linear and Right-Linear Grammars
Build regular grammar from regular expression

Other similar online resources on the same topic may be found at:

Constructing an Equivalent Regular Grammar from a Regular Expression
Algorithm to convert regular expression to linear grammar

Algorithm to build Regular Grammar for a Regular Expression

Unlike all the algorithm descriptions referenced above this algorithm is readily convertible to a computer program by choosing appropriate data structures for the different constructs in the algorithm. I am not using any stricter mathematical formalism to describe the algorithm; only the constructs that help me explain the algorithm in a easy to understand manner are used.

TS is set of Terminal Symbols and s, s1, s', etc. represents terminal symbols.
NS is set of Non-terminal Symbols and S, S1, S', etc. represents non-terminal symbols.
NSt is run-time stack of non-terminal symbols.
S1 is the start (i.e. top level) non-terminal symbol.
CPr is collection of all Production Rules.
CPr(S) is an ordered set of all production rules for non-terminal S.
CPr(S)(n) is the recent production rule added for S, i.e. CPr(S)(n) belongs to CPr(S) where n = |CPr(S)|.
A regular expression is scanned from left to right consuming one input at a time. Each input is either a member of TS or a member of the set of operators {concat, |, *, +} or a member of {(, )}.
At the start current operator defaults to concat.
Before start scanning a regular expression S1 is pushed into NSt.
Loop through the following steps till end of regular expression.
If an operator is scanned set Current Operator.

Let current operator is + and S is top of NSt.
Let CPr(S)(n) = { S -> [String over TS and NS]Z }, where Z belongs either to TS or NS.
Let S' be a new non-terminal, then replace Z by S', that is CPr(S)(n) = { S -> [String over TS and NS]S' } and CPr(S') = {1. S' -> Z, 2. S' -> ZS'}.
Change current operator to concat. Start of loop.
Let current operator is * and S is top of NSt.
Let CPr(S)(n) = { S -> [String over TS and NS]Z }, where Z belongs either to TS or NS.
Let S' be a new non-terminal, then replace Z by S', that is CPr(S)(n) = { S -> [String over TS and NS]S' } and CPr(S') = {1. S' -> null, 2. S' -> ZS'}.
Change current operator to concat. Start of loop.


If a terminal symbol is scanned set Current Terminal.

Let current terminal is s, current operator concat and S is top of NSt.
If CPr(S) is null set then add { S -> s } to CPr(S) and so CPr(S)(n) = { S -> s }, n = 1.
If CPr(S) is not null then concat s at the right most end of all the RHS of the productions.
Start of loop.
Let current terminal is s, current operator | and S is top of NSt.
Add { S -> s } to CPr(S) and so CPr(S)(n) = { S -> s }, n = nprev + 1.
Change current operator to concat. Start of loop.


If ( is scanned from the regular expression.

Let current operator is concat and S is top of NSt.
Let S' be a new non-terminal.
If CPr(S) is null set then add { S -> S' } to CPr(S) and so CPr(S)(n) = { S -> S' }, n = 1.
If CPr(S) is not null then concat S' at the right most end of all the RHS of the productions.
Push S' to NSt. Start of loop.
Let current operator is | and S is top of NSt.
Let S' be a new non-terminal.
Add { S -> S' } to CPr(S) and so CPr(S)(n) = { S -> S' }, n = nprev + 1.
Push S' to NSt.
Change current operator to concat. Start of loop.


if ) is scanned from the regular expression then pop S from the top of NSt. Start of loop.
Reduction/optimization of production rules.

Order the production rules according to LHS from top-most to the lowest level. Start from the lowest level production.
Let S -> [String over TS]S'[String over TS and NS] is the current reduction candidate where S' is the current unchecked non-terminal.
If S' does not appear in any of the RHS of its production rules then for each { S' -> A } belonging to CPr(S'), replace S -> [String over TS]S'[String over TS and NS] by S -> [String over TS]A[String over TS and NS]. Apply this rule to each S' in production rules from lowest to top-most.
Let S be the start symbol which has only one RHS of the form S -> S'[String over TS and NS], where S' is also a non-terminal.
If S' has at least one RHS whose right-most symbol is not S' then set S' as start symbol S, and RHS(S) = RHS(S': S' is also the right-most symbol) union { w[String over TS and NS] : w belongs to RHS(S': S' is not right-most symbol)}.


Analysis of the algorithm

A formal time complexity and memory complexity analysis of the above algorithm is yet to be done.