A Persona-Based Neural Conversation Model

Introduction

Persona

  • Defined as the character that an agent performs during conversational interactions.
  • Combination of identity, language, behaviour and interaction style.
  • May be adapted during the conversation itself.
  • The Neural Conversational Model fails to maintain a consistent persona throughout a conversation, resulting in incoherent responses.

Persona Based Models

Speaker Model

  • Models the personality of only the respondent.
  • Represents each speaker as a vector (embedding) that encodes speaker-specific information (e.g. age, gender).
  • The model learns to cluster users along these traits using the data alone.
  • The vector v, corresponding to the given speaker, is used together with the context vector and the target representation generated at the previous time step to produce the current output representation (see the sketch after this list).
  • v is learnt along with other model parameters.
  • Since the model learns a representation for each speaker, it can infer answers to certain questions about a given speaker even when that speaker has never answered them explicitly, by drawing on the answers of similar speakers.
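
As a rough sketch of how v can enter the decoder (all names and sizes here are illustrative; the paper uses a deeper LSTM and initializes the decoder from the encoder's context), the speaker embedding can be concatenated with the previous word embedding at every decoding step:

```python
import torch
import torch.nn as nn

class SpeakerDecoder(nn.Module):
    """Minimal sketch of the Speaker Model decoder: the speaker
    embedding v is injected at every decoding step alongside the
    previous target-word embedding."""

    def __init__(self, vocab_size, n_speakers, emb_dim=512, hid_dim=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        # v is learnt jointly with the other model parameters
        self.speaker_emb = nn.Embedding(n_speakers, emb_dim)
        self.lstm = nn.LSTM(2 * emb_dim, hid_dim, batch_first=True)
        self.proj = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_words, speaker_ids, state=None):
        # prev_words: (batch, seq); speaker_ids: (batch,)
        # state would come from the encoder (context) in a full model
        w = self.word_emb(prev_words)                  # (batch, seq, emb)
        v = self.speaker_emb(speaker_ids)              # (batch, emb)
        v = v.unsqueeze(1).expand(-1, w.size(1), -1)   # one copy per step
        out, state = self.lstm(torch.cat([w, v], dim=-1), state)
        return self.proj(out), state                   # next-word logits
```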

Speaker-Addressee Model

  • Models the personality of both the speaker and addressee.
  • Associates a representation V_{i,j} with each (speaker, addressee) pair to capture the style of user i towards user j.
  • V_{i,j} = tanh(W_1 · v_i + W_2 · v_j) (sketched below)
  • V_{i,j} is used exactly as v was used in the Speaker Model.
  • The Speaker-Addressee model inherits generalization capabilities from the speaker embeddings.
  • For example, even if two speakers have never conversed with each other, conversations between similar speakers can be used to estimate their interaction representation.
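
A minimal sketch of the interaction embedding under the same illustrative assumptions (sizes and names are placeholders):

```python
import torch
import torch.nn as nn

n_speakers, emb_dim = 1000, 512          # illustrative sizes
speaker_emb = nn.Embedding(n_speakers, emb_dim)
W1 = nn.Linear(emb_dim, emb_dim, bias=False)
W2 = nn.Linear(emb_dim, emb_dim, bias=False)

def interaction_embedding(i, j):
    # V_{i,j} = tanh(W1 · v_i + W2 · v_j): the style of speaker i
    # towards addressee j; used wherever the Speaker Model used v
    v_i = speaker_emb(torch.tensor([i]))
    v_j = speaker_emb(torch.tensor([j]))
    return torch.tanh(W1(v_i) + W2(v_j))
```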

Decoding and Reranking

  • Generate N-best lists with beam size B = 200 and a maximum candidate length of 20.
  • At each time step, examine all B × B possible next-word candidates and add all hypotheses ending with an EOS token to the N-best list (see the sketch after this list).
  • Rerank the generated N-best list using the scoring function from Li et al. to avoid generic and commonplace responses.
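
A sketch of the procedure above, assuming a model hook `step_fn(tokens)` that returns the top-B `(log_prob, word_id)` continuations; the reranking weights are placeholders (the paper tunes them on held-out data):

```python
def generate_n_best(step_fn, bos_id, eos_id, beam=200, max_len=20):
    """Collect finished hypotheses into an N-best list while keeping
    the top-`beam` unfinished hypotheses alive at each step."""
    hyps = [(0.0, [bos_id])]                 # (cumulative log-prob, tokens)
    n_best = []
    for _ in range(max_len):
        # each of the B hypotheses proposes its top-B next words -> B x B
        candidates = [(s + lp, toks + [w])
                      for s, toks in hyps
                      for lp, w in step_fn(toks)]
        candidates.sort(key=lambda c: c[0], reverse=True)
        hyps = []
        for s, toks in candidates:
            if toks[-1] == eos_id:
                n_best.append((s, toks))     # finished -> N-best list
            elif len(hyps) < beam:
                hyps.append((s, toks))       # keep top-B unfinished
        if not hyps:
            break
    return n_best

def rerank_score(log_p_r_given_m, log_p_m_given_r, resp_len,
                 lam=0.5, gamma=0.5):
    # Diversity-promoting score in the spirit of Li et al.:
    # log p(R|M, v) + lam * log p(M|R) + gamma * |R|
    return log_p_r_given_m + lam * log_p_m_given_r + gamma * resp_len
```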

Datasets

  • Twitter Persona Dataset

    • Dataset of tweet sequences from the Twitter FireHose, restricted to users with frequent engagement (at least 60 interactions).
  • Twitter Sordoni Dataset

    • Similar to the Twitter Persona Dataset but with more references per message (up to 10).
  • Television Transcripts

    • Since this dataset alone is too small to train an open-domain dialogue model, a standard SEQ2SEQ model is first trained on the OpenSubtitles dataset and then fine-tuned on the transcripts.

Experiments

  • The proposed models yield performance improvements in both perplexity and BLEU score over baseline SEQ2SEQ models.
  • Similar gains are observed in speaker consistency, as measured by human judges.

Open Questions

  • There is no evaluation of what the speaker embeddings actually map to. The paper suggests that the embeddings should capture aspects like age and gender, but the learnt embeddings are not explored.