
Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge

Introduction

  • Studies the task of translating natural language queries into regular expressions without using domain-specific knowledge.
  • Proposes a methodology for collecting a large corpus of (regular expression, natural language) pairs.
  • Reports a performance gain of 19.6% over state-of-the-art models.
  • Link to the paper

Architecture

  • LSTM-based sequence-to-sequence neural network with attention; a minimal sketch follows this list.
  • Six layers
    • One word-embedding layer
    • Two encoder layers
    • Two decoder layers
    • One dense output layer
  • Attention over the encoder layer.
  • Dropout with probability 0.25.
  • Trained for 20 epochs with a minibatch size of 32 and a learning rate of 1 (decay rate of 0.5).
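
The listed hyperparameters map directly onto a standard attentional sequence-to-sequence model. Below is a minimal PyTorch sketch of such a model, not the authors' implementation: the class name, the embedding and hidden sizes, and the dot-product form of the attention are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class Seq2SeqRegex(nn.Module):
    """Six-layer layout from the notes: one word-embedding layer (per side),
    two encoder LSTM layers, two decoder LSTM layers, one dense output layer,
    with attention over the encoder states and dropout of 0.25."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hid_dim=256, p_drop=0.25):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, num_layers=2,
                               dropout=p_drop, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, num_layers=2,
                               dropout=p_drop, batch_first=True)
        self.dropout = nn.Dropout(p_drop)
        # Dense output layer over [decoder state; attention context].
        self.out = nn.Linear(2 * hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.dropout(self.src_emb(src)))
        dec_out, _ = self.decoder(self.dropout(self.tgt_emb(tgt)), state)
        # Dot-product attention over encoder outputs (one common choice;
        # the exact attention form used in the paper is not specified here).
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)
        return self.out(torch.cat([dec_out, context], dim=-1))

model = Seq2SeqRegex(src_vocab=1000, tgt_vocab=100)
src = torch.randint(0, 1000, (32, 12))  # minibatch of 32 NL queries
tgt = torch.randint(0, 100, (32, 20))   # shifted regex token sequences
logits = model(src, tgt)                # (32, 20, 100)
```

The learning rate of 1 with a 0.5 decay suggests plain SGD with a halving schedule (e.g. `torch.optim.SGD(model.parameters(), lr=1.0)` plus a scheduler), though the optimizer choice is inferred from these hyperparameters rather than stated in the notes.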

Dataset Generation

  • Created a public dataset - NL-RX - with 10K (regular expression, natural language) pairs.
  • Two-step generate-and-paraphrase approach
    • Generate step
      • Use a handcrafted grammar to translate regular expressions into rigid, synthetic natural-language descriptions (toy sketch after this list).
    • Paraphrase step
      • Crowdsource the task of paraphrasing these rigid descriptions into more natural language.
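
A toy illustration of the generate step, assuming a miniature stand-in for the handcrafted grammar (the real grammar and its verbalizations are larger and are not reproduced in these notes): each production pairs a regex fragment with a fixed, stilted English description.

```python
import random

# Toy stand-in for the handcrafted grammar: each terminal pairs a regex
# fragment with a rigid English verbalization. These productions are
# illustrative only.
TERMINALS = [
    ("[A-Z]", "a capital letter"),
    ("[0-9]", "a number"),
    ("[aeiou]", "a vowel"),
]

def sample(depth=2):
    """Recursively sample a (regular expression, rigid description) pair."""
    if depth == 0 or random.random() < 0.4:
        return random.choice(TERMINALS)
    op = random.choice(["star", "concat", "or"])
    r1, d1 = sample(depth - 1)
    if op == "star":
        return f"({r1})*", f"zero or more occurrences of {d1}"
    r2, d2 = sample(depth - 1)
    if op == "concat":
        return f"{r1}{r2}", f"{d1} followed by {d2}"
    return f"({r1}|{r2})", f"{d1} or {d2}"

print(sample())
# e.g. ('([0-9])*', 'zero or more occurrences of a number')
```

The paraphrase step then asks crowdworkers to restate such rigid outputs in everyday language, yielding the natural side of the NL-RX pairs.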

Results

  • Evaluation Metric
    • Functional equality check (called DFA-Equal), since the same regular language can be expressed by many syntactically different regular expressions (a brute-force approximation is sketched after this list).
  • Proposed architecture outperforms both baselines - a Nearest Neighbor classifier using Bag of Words (BoW-NN) and Semantic-Unify.
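
DFA-Equal compiles both regular expressions to DFAs and tests whether they accept the same language. As an illustrative stand-in rather than the paper's implementation, the sketch below approximates the check by comparing the two patterns on every string up to a small length over a small alphabet; the function name, alphabet, and length bound are assumptions.

```python
import re
from itertools import product

def approx_dfa_equal(r1, r2, alphabet="ab01", max_len=4):
    """Approximate functional-equality check: treat two regexes as equal
    if they accept exactly the same strings up to max_len over a small
    alphabet. True DFA-Equal tests full language equivalence instead."""
    p1, p2 = re.compile(r1), re.compile(r2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(p1.fullmatch(s)) != bool(p2.fullmatch(s)):
                return False
    return True

# Syntactically different but functionally equal patterns compare equal:
print(approx_dfa_equal(r"(a|b)*", r"[ab]*"))  # True
print(approx_dfa_equal(r"a+", r"a*"))         # False: a* also accepts ""
```

The brute-force search is exponential in max_len; an exact check converts both patterns to DFAs and tests equivalence directly (e.g. via minimization or a product construction), which is what the metric's name refers to.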