shagunsodhani/SmartReply.md

## SmartReply.md

      
Smart Reply: Automated Response Suggestion for Email

Introduction


Proposes a novel, end-to-end architecture for generating short email responses.
The single most important measure of its success is that it is deployed in Inbox by Gmail and assists with around 10% of all mobile responses.
Link to the paper.

Challenges in deploying Smart Reply in a user-facing product


Responses must always be of high quality. Ensured by constructing a target response set from which all suggestions are selected.
The likelihood that the user picks one of the suggested responses must be maximised. Ensured by normalising the responses and enforcing diversity among them.
The system should not add latency to emails. Ensured by using a triggering model to decide if the email is suitable to undergo the response generation pipeline. Computation time is further reduced by finding an approximate best result instead of the exact best result.
Privacy must be preserved. Ensured by encrypting all the data, which makes it harder to verify the model's quality and to debug the system.

Architecture

Preprocess Email


Perform actions like language detection, tokenization and sentence segmentation on the input email.
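A minimal sketch of the tokenization and sentence segmentation steps, using only regular expressions. The function names and splitting rules are illustrative assumptions, not the production pipeline, and language detection is left out.

```python
import re

def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Lowercase, then emit word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())

def preprocess(email_body):
    """Return the email as a list of sentences, each a list of tokens."""
    return [tokenize(s) for s in split_sentences(email_body)]

print(preprocess("Hi Bob. Are you free tomorrow?"))
# [['hi', 'bob', '.'], ['are', 'you', 'free', 'tomorrow', '?']]
```

A real system would need to handle abbreviations, quoted text and signatures, which this rule-based split ignores.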

Triggering Model


A feed-forward neural network (with an embedding layer and three fully connected hidden layers) that decides if the input email is suitable for suggesting responses.

Data


Training set of pairs (o, y) where o is the incoming message and y is a boolean variable to indicate if the message had a response.

Features


Unigrams, bigrams from the messages.
Social signals, such as whether the sender is in the recipient's address book.
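The shape of the triggering model can be sketched in plain Python as below. Everything concrete here is an assumption made for illustration: the layer sizes, the hashing of n-grams into buckets, the averaging of embeddings, the ReLU activations and the single boolean `contact_signal`. The weights are random, so the output is not a meaningful prediction; in practice they would be trained on the (o, y) pairs.

```python
import math
import random
import zlib

VOCAB_BUCKETS = 1000  # hypothetical hashing-trick vocabulary size
EMBED_DIM = 8         # hypothetical embedding width
HIDDEN_DIM = 16       # hypothetical hidden layer width

def ngram_ids(tokens):
    """Map unigrams and bigrams to bucket ids with a deterministic hash."""
    grams = list(tokens) + [" ".join(p) for p in zip(tokens, tokens[1:])]
    return [zlib.crc32(g.encode("utf-8")) % VOCAB_BUCKETS for g in grams]

def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

class TriggerModel:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.embeddings = init_matrix(VOCAB_BUCKETS, EMBED_DIM, rng)
        in_dim = EMBED_DIM + 1  # +1 for the contact-list signal
        self.hidden = []
        for _ in range(3):  # three fully connected hidden layers
            self.hidden.append((init_matrix(HIDDEN_DIM, in_dim, rng), [0.0] * HIDDEN_DIM))
            in_dim = HIDDEN_DIM
        self.out_w = init_matrix(1, HIDDEN_DIM, rng)
        self.out_b = [0.0]

    def predict(self, tokens, contact_signal):
        """Return P(message gets a response) under the current weights."""
        ids = ngram_ids(tokens)
        if ids:
            avg = [sum(self.embeddings[i][d] for i in ids) / len(ids) for d in range(EMBED_DIM)]
        else:
            avg = [0.0] * EMBED_DIM
        x = avg + [1.0 if contact_signal else 0.0]
        for weights, bias in self.hidden:
            x = relu(dense(x, weights, bias))
        logit = dense(x, self.out_w, self.out_b)[0]
        return 1.0 / (1.0 + math.exp(-logit))

model = TriggerModel()
score = model.predict(["are", "you", "free", "tomorrow"], contact_signal=True)
```

An email would enter the response generation pipeline only if `score` exceeds some chosen threshold.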

Response Selection


An LSTM network predicts the approximate best response for an incoming message o.

Network


Sequence to Sequence Learning.
Reads the input message token by token and encodes it into a vector representation.
Computes a softmax to get the probability of the first output token given the input token sequence.
Keeps feeding in the previous response tokens, along with the input token sequence, to compute the probability of the next output token.
During inference, approximates the most likely response either greedily, by taking the most likely token at each timestep and feeding it back, or by using beam search.
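The difference between the two inference strategies can be illustrated with a toy decoder. `TOY_MODEL` is a hypothetical hand-written table standing in for the LSTM softmax; unlike the real network it conditions only on the previous token and ignores the message. The beam search is a simple variant in which finished hypotheses are set aside rather than competing for beam slots.

```python
import math

EOS = "<eos>"

# Hypothetical next-token probabilities, keyed by the previous token.
TOY_MODEL = {
    "<start>": {"i": 0.6, "yes": 0.4},
    "i": {"can": 0.5, "will": 0.5},
    "can": {EOS: 1.0},
    "will": {EOS: 1.0},
    "yes": {EOS: 1.0},
}

def next_token_probs(message, prefix):
    """Stand-in for the decoder softmax P(token | message, prefix)."""
    last = prefix[-1] if prefix else "<start>"
    return TOY_MODEL[last]

def greedy_decode(message, max_len=10):
    """Pick the single most likely token at every step."""
    prefix, log_prob = [], 0.0
    for _ in range(max_len):
        probs = next_token_probs(message, prefix)
        token = max(probs, key=probs.get)
        log_prob += math.log(probs[token])
        if token == EOS:
            break
        prefix.append(token)
    return prefix, log_prob

def beam_search(message, beam_width=2, max_len=10):
    """Keep the beam_width best partial responses at every step."""
    beams, finished = [([], 0.0)], []
    for _ in range(max_len):
        candidates = []
        for prefix, log_prob in beams:
            for token, p in next_token_probs(message, prefix).items():
                new_log_prob = log_prob + math.log(p)
                if token == EOS:
                    finished.append((prefix, new_log_prob))
                else:
                    candidates.append((prefix + [token], new_log_prob))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    pool = finished if finished else beams
    return max(pool, key=lambda c: c[1])

greedy_tokens, greedy_lp = greedy_decode("Can you join us?")
beam_tokens, beam_lp = beam_search("Can you join us?")
print(greedy_tokens, round(math.exp(greedy_lp), 2))  # ['i', 'can'] 0.3
print(beam_tokens, round(math.exp(beam_lp), 2))      # ['yes'] 0.4
```

Greedy decoding commits to "i" because it is the likeliest first token, while beam search keeps "yes" alive and ends up with the higher-probability full response.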

Response Set Generation


Generate a set of high-quality responses that also capture the variability in the intent of the response.
Canonicalize the email response by extracting the semantic structure using a dependency parser.
Partition all response messages into "semantic" clusters.
These semantic clusters define the response space for scoring and selecting possible responses and for promoting diversity among the responses.

Semantic Intent Clustering


Since a large, labelled dataset is not available, a graph-based, semi-supervised approach is used.

Graph Construction


Manually define a few clusters with a small number of example responses for each cluster.
Construct a graph with frequent response messages (including the labelled nodes) as response nodes (V_R).
For each response node, extract a set of feature nodes (V_F) corresponding to features like skip-grams and n-grams, and add an edge between the response node and each of its feature nodes.
Learn a semantic labelling for all response nodes by propagating semantic intent information (available because of labelled nodes) throughout the graph.
After some iterations, sample some of the unlabeled nodes from the graph, manually label these sample nodes and repeat this algorithm until convergence.
For validation, extract the top k members of each cluster and validate the quality with the help of human evaluators.
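The propagation step can be sketched with a basic label propagation over the bipartite response/feature graph. This is a simplified stand-in, not the paper's actual graph learning algorithm: features are just unigrams and bigrams, scores are plain neighbour averages, seed nodes are clamped, and the manual relabelling loop is omitted. All names are hypothetical, and at least one seed is assumed.

```python
from collections import defaultdict

def features(response):
    """Feature nodes for a response: its unigrams and bigrams."""
    tokens = response.lower().split()
    grams = set(tokens)
    grams.update(" ".join(p) for p in zip(tokens, tokens[1:]))
    return grams

def propagate_labels(responses, seeds, iterations=10):
    """responses: list of strings; seeds: dict response -> cluster label."""
    labels = sorted(set(seeds.values()))
    resp_feats = {r: features(r) for r in responses}
    feat_resps = defaultdict(set)
    for r, fs in resp_feats.items():
        for f in fs:
            feat_resps[f].add(r)

    # Seed nodes start with all mass on their label, others with none.
    scores = {
        r: {l: 1.0 if seeds.get(r) == l else 0.0 for l in labels}
        for r in responses
    }
    for _ in range(iterations):
        # Feature node score: average over adjacent response nodes.
        feat_scores = {
            f: {l: sum(scores[r][l] for r in rs) / len(rs) for l in labels}
            for f, rs in feat_resps.items()
        }
        # Response node score: average over adjacent feature nodes.
        for r in responses:
            fs = resp_feats[r]
            if r in seeds or not fs:
                continue  # seeds stay clamped to their manual label
            scores[r] = {l: sum(feat_scores[f][l] for f in fs) / len(fs) for l in labels}

    result = {}
    for r in responses:
        best = max(labels, key=lambda l: scores[r][l])
        result[r] = best if scores[r][best] > 0 else None  # None: still unlabelled
    return result

responses = ["thanks a lot", "thanks so much", "sounds good",
             "sounds great to me", "see you then"]
seeds = {"thanks a lot": "thanks", "sounds good": "agree"}
clusters = propagate_labels(responses, seeds)
```

Here "thanks so much" picks up the "thanks" label through the shared feature node "thanks", while "see you then" shares no feature with any seed and stays unlabelled, which makes it a candidate for the manual labelling round.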

Suggestion Diversity


Provide users with a varied set of responses by omitting redundant responses (never selecting more than one response from any semantic cluster) and by enforcing negative (or positive) responses.
If the top two responses contain at least one positive (negative) response and none of the top three responses is negative (positive), the third response is replaced with a negative (positive) one.
This is done by performing a second LSTM pass where the search is restricted to only positive (or negative) responses in the target set.
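The two diversity rules can be sketched as a post-processing step over a ranked candidate list. The `cluster_of` and `polarity_of` mappings and the `fallback` callable are hypothetical; `fallback` stands in for the second LSTM pass restricted to responses of one polarity.

```python
def diversify(ranked, cluster_of, polarity_of, fallback):
    """Return up to three suggestions from `ranked` (best first).

    cluster_of:  response -> semantic cluster id
    polarity_of: response -> "positive", "negative" or "neutral"
    fallback:    polarity -> best response of that polarity, or None
    """
    top, seen_clusters = [], set()
    for response in ranked:
        cluster = cluster_of[response]
        if cluster in seen_clusters:
            continue  # at most one response per semantic cluster
        seen_clusters.add(cluster)
        top.append(response)
        if len(top) == 3:
            break
    if len(top) < 3:
        return top

    polarities = [polarity_of[r] for r in top]
    for have, need in (("positive", "negative"), ("negative", "positive")):
        if have in polarities[:2] and need not in polarities:
            replacement = fallback(need)
            if replacement is not None:
                top[2] = replacement  # swap in a response of the missing polarity
            break
    return top

ranked = ["Yes, I can.", "Sure, I can.", "Sounds good.", "See you then.", "Sorry, I can't."]
cluster_of = {
    "Yes, I can.": "can_do", "Sure, I can.": "can_do", "Sounds good.": "agree",
    "See you then.": "see_you", "Sorry, I can't.": "cannot",
}
polarity_of = {
    "Yes, I can.": "positive", "Sure, I can.": "positive", "Sounds good.": "positive",
    "See you then.": "neutral", "Sorry, I can't.": "negative",
}

def fallback(polarity):
    return next((r for r in ranked if polarity_of[r] == polarity), None)

suggestions = diversify(ranked, cluster_of, polarity_of, fallback)
print(suggestions)  # ['Yes, I can.', 'Sounds good.', "Sorry, I can't."]
```

"Sure, I can." is dropped because its cluster is already represented, and "See you then." is replaced because the first three picks contained no negative option.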

Strengths


The system is already in production and assists with around 10% of all mobile responses.