Skip to content

Instantly share code, notes, and snippets.

@santanuchakrabarti
Last active December 5, 2021 21:12
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save santanuchakrabarti/7c4c74c7b25a9235fe42 to your computer and use it in GitHub Desktop.
Save santanuchakrabarti/7c4c74c7b25a9235fe42 to your computer and use it in GitHub Desktop.
An algorithm to generate a Regular Grammar from a Regular Expression

Some References to Algorithm Resources

My algorithm to convert a Regular Expression to a Regular Grammar is based on mostly two online resources.

  1. Left-Linear and Right-Linear Grammars
  2. Build regular grammar from regular expression

Other similar online resources on the same topic may be found at:

Algorithm to build Regular Grammar for a Regular Expression

Unlike all the algorithm descriptions referenced above this algorithm is readily convertible to a computer program by choosing appropriate data structures for the different constructs in the algorithm. I am not using any stricter mathematical formalism to describe the algorithm; only the constructs that help me explain the algorithm in a easy to understand manner are used.

  • TS is set of Terminal Symbols and s, s1, s', etc. represents terminal symbols.
  • NS is set of Non-terminal Symbols and S, S1, S', etc. represents non-terminal symbols.
  • NSt is run-time stack of non-terminal symbols.
  • S1 is the start (i.e. top level) non-terminal symbol.
  • CPr is collection of all Production Rules.
  • CPr(S) is an ordered set of all production rules for non-terminal S.
  • CPr(S)(n) is the recent production rule added for S, i.e. CPr(S)(n) belongs to CPr(S) where n = |CPr(S)|.
  • A regular expression is scanned from left to right consuming one input at a time. Each input is either a member of TS or a member of the set of operators {concat, |, *, +} or a member of {(, )}.
  • At the start current operator defaults to concat.
  • Before start scanning a regular expression S1 is pushed into NSt.
  • Loop through the following steps till end of regular expression.
  • If an operator is scanned set Current Operator.
    1. Let current operator is + and S is top of NSt.
    2. Let CPr(S)(n) = { S -> [String over TS and NS]Z }, where Z belongs either to TS or NS.
    3. Let S' be a new non-terminal, then replace Z by S', that is CPr(S)(n) = { S -> [String over TS and NS]S' } and CPr(S') = {1. S' -> Z, 2. S' -> ZS'}.
    4. Change current operator to concat. Start of loop.
    5. Let current operator is * and S is top of NSt.
    6. Let CPr(S)(n) = { S -> [String over TS and NS]Z }, where Z belongs either to TS or NS.
    7. Let S' be a new non-terminal, then replace Z by S', that is CPr(S)(n) = { S -> [String over TS and NS]S' } and CPr(S') = {1. S' -> null, 2. S' -> ZS'}.
    8. Change current operator to concat. Start of loop.
  • If a terminal symbol is scanned set Current Terminal.
    1. Let current terminal is s, current operator concat and S is top of NSt.
    2. If CPr(S) is null set then add { S -> s } to CPr(S) and so CPr(S)(n) = { S -> s }, n = 1.
    3. If CPr(S) is not null then concat s at the right most end of all the RHS of the productions.
    4. Start of loop.
    5. Let current terminal is s, current operator | and S is top of NSt.
    6. Add { S -> s } to CPr(S) and so CPr(S)(n) = { S -> s }, n = nprev + 1.
    7. Change current operator to concat. Start of loop.
  • If ( is scanned from the regular expression.
    1. Let current operator is concat and S is top of NSt.
    2. Let S' be a new non-terminal.
    3. If CPr(S) is null set then add { S -> S' } to CPr(S) and so CPr(S)(n) = { S -> S' }, n = 1.
    4. If CPr(S) is not null then concat S' at the right most end of all the RHS of the productions.
    5. Push S' to NSt. Start of loop.
    6. Let current operator is | and S is top of NSt.
    7. Let S' be a new non-terminal.
    8. Add { S -> S' } to CPr(S) and so CPr(S)(n) = { S -> S' }, n = nprev + 1.
    9. Push S' to NSt.
    10. Change current operator to concat. Start of loop.
  • if ) is scanned from the regular expression then pop S from the top of NSt. Start of loop.
  • Reduction/optimization of production rules.
    1. Order the production rules according to LHS from top-most to the lowest level. Start from the lowest level production.
    2. Let S -> [String over TS]S'[String over TS and NS] is the current reduction candidate where S' is the current unchecked non-terminal.
    3. If S' does not appear in any of the RHS of its production rules then for each { S' -> A } belonging to CPr(S'), replace S -> [String over TS]S'[String over TS and NS] by S -> [String over TS]A[String over TS and NS]. Apply this rule to each S' in production rules from lowest to top-most.
    4. Let S be the start symbol which has only one RHS of the form S -> S'[String over TS and NS], where S' is also a non-terminal.
    5. If S' has at least one RHS whose right-most symbol is not S' then set S' as start symbol S, and RHS(S) = RHS(S': S' is also the right-most symbol) union { w[String over TS and NS] : w belongs to RHS(S': S' is not right-most symbol)}.

Analysis of the algorithm

A formal time complexity and memory complexity analysis of the above algorithm is yet to be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment