A Persona-Based Neural Conversation Model

Introduction

Persona

  • Defined as the character that an agent performs during conversational interactions.
  • Combination of identity, language, behaviour and interaction style.
  • May be adapted during the conversation itself.
  • The Neural Conversational Model fails to maintain a consistent persona throughout a conversation, resulting in incoherent responses.

Persona Based Models

Speaker Model

  • Models the personality of only the respondent.
  • Represents each speaker as a vector (embedding) that encodes speaker-specific information (e.g. age, gender).
  • The model learns to cluster users along these traits using the data alone.
  • The vector v, corresponding to the given speaker, is used together with the context vector and the target representation generated at the previous time step to produce the current output representation (see the sketch after this list).
  • v is learnt along with other model parameters.
  • Since the model learns a representation for each speaker, it can infer answers to certain questions about a given speaker even when that speaker has never answered them explicitly, by drawing on the answers of similar speakers.
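
As a rough sketch of how v can enter the decoder (all names and sizes here are illustrative; the paper uses a deeper LSTM and initializes the decoder from the encoder's context), the speaker embedding can be concatenated with the previous word embedding at every decoding step:

```python
import torch
import torch.nn as nn

class SpeakerDecoder(nn.Module):
    """Minimal sketch of the Speaker Model decoder: the speaker
    embedding v is injected at every decoding step alongside the
    previous target-word embedding."""

    def __init__(self, vocab_size, n_speakers, emb_dim=512, hid_dim=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        # v is learnt jointly with the other model parameters
        self.speaker_emb = nn.Embedding(n_speakers, emb_dim)
        self.lstm = nn.LSTM(2 * emb_dim, hid_dim, batch_first=True)
        self.proj = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_words, speaker_ids, state=None):
        # prev_words: (batch, seq); speaker_ids: (batch,)
        # state would come from the encoder (context) in a full model
        w = self.word_emb(prev_words)                  # (batch, seq, emb)
        v = self.speaker_emb(speaker_ids)              # (batch, emb)
        v = v.unsqueeze(1).expand(-1, w.size(1), -1)   # one copy per step
        out, state = self.lstm(torch.cat([w, v], dim=-1), state)
        return self.proj(out), state                   # next-word logits
```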

Speaker-Addressee Model

  • Models the personality of both the speaker and addressee.
  • Associates a representation V_{i,j} with each (speaker, addressee) pair to capture the style of user i towards user j.
  • V_{i,j} = tanh(W_1 · v_i + W_2 · v_j) (sketched below)
  • V_{i,j} is used exactly as v was used in the Speaker Model.
  • The Speaker-Addressee model inherits generalization capabilities from the speaker embeddings.
  • For example, even if two speakers have never conversed with each other, conversations between similar speakers can be used to estimate their interaction representation.
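
A minimal sketch of the interaction embedding under the same illustrative assumptions (sizes and names are placeholders):

```python
import torch
import torch.nn as nn

n_speakers, emb_dim = 1000, 512          # illustrative sizes
speaker_emb = nn.Embedding(n_speakers, emb_dim)
W1 = nn.Linear(emb_dim, emb_dim, bias=False)
W2 = nn.Linear(emb_dim, emb_dim, bias=False)

def interaction_embedding(i, j):
    # V_{i,j} = tanh(W1 · v_i + W2 · v_j): the style of speaker i
    # towards addressee j; used wherever the Speaker Model used v
    v_i = speaker_emb(torch.tensor([i]))
    v_j = speaker_emb(torch.tensor([j]))
    return torch.tanh(W1(v_i) + W2(v_j))
```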

Decoding and Reranking

  • Generate N-best lists with beam size B = 200 and a maximum candidate length of 20.
  • At each time step, examine all B × B possible next-word candidates and add all hypotheses ending with an EOS token to the N-best list (see the sketch after this list).
  • Rerank the generated N-best list using the scoring function from Li et al. to avoid generic and commonplace responses.
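
A sketch of the procedure above, assuming a model hook `step_fn(tokens)` that returns the top-B `(log_prob, word_id)` continuations; the reranking weights are placeholders (the paper tunes them on held-out data):

```python
def generate_n_best(step_fn, bos_id, eos_id, beam=200, max_len=20):
    """Collect finished hypotheses into an N-best list while keeping
    the top-`beam` unfinished hypotheses alive at each step."""
    hyps = [(0.0, [bos_id])]                 # (cumulative log-prob, tokens)
    n_best = []
    for _ in range(max_len):
        # each of the B hypotheses proposes its top-B next words -> B x B
        candidates = [(s + lp, toks + [w])
                      for s, toks in hyps
                      for lp, w in step_fn(toks)]
        candidates.sort(key=lambda c: c[0], reverse=True)
        hyps = []
        for s, toks in candidates:
            if toks[-1] == eos_id:
                n_best.append((s, toks))     # finished -> N-best list
            elif len(hyps) < beam:
                hyps.append((s, toks))       # keep top-B unfinished
        if not hyps:
            break
    return n_best

def rerank_score(log_p_r_given_m, log_p_m_given_r, resp_len,
                 lam=0.5, gamma=0.5):
    # Diversity-promoting score in the spirit of Li et al.:
    # log p(R|M, v) + lam * log p(M|R) + gamma * |R|
    return log_p_r_given_m + lam * log_p_m_given_r + gamma * resp_len
```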

Datasets

  • Twitter Persona Dataset

    • Dataset of tweet sequences from the Twitter FireHose, restricted to users with frequent engagement (at least 60 interactions).
  • Twitter Sordoni Dataset

    • Similar to the Twitter Persona Dataset but with more references per message (up to 10).
  • Television Transcripts

    • Since this dataset alone is too small to train an open-domain dialogue model, a standard SEQ2SEQ model is first trained on the OpenSubtitles dataset and then fine-tuned on the transcripts.

Experiments

  • The proposed models yield performance improvements in both perplexity and BLEU score over baseline SEQ2SEQ models.
  • Similar gains are observed in speaker consistency, as measured by human judges.

Open Questions

  • There is no evaluation of what the speaker embeddings actually map to. The paper suggests that the embeddings should capture aspects like age and gender, but the learnt embeddings are not explored.