@nudles
Last active June 4, 2020 07:47
FAQ Bot

Updated at 3:46PM, June 4

We are going to implement this model using SINGA: https://github.com/jojonki/QA-LSTM. Please ignore the text below for now.

Task

Create a question answering model for customer support.

The dataset

https://www.kaggle.com/thoughtvector/customer-support-on-twitter/data

Preprocess the data to:

  1. build a table in which each row contains both the query tweet and the response tweet, as in https://www.kaggle.com/psbots/customer-support-meets-spacy-universe
  2. delete non-English tweets (https://www.kaggle.com/psbots/customer-support-meets-spacy-universe), replace some words (https://www.kaggle.com/sudalairajkumar/getting-started-with-text-preprocessing), etc.
  3. store the cleaned table in a pandas DataFrame with the query text, response text, company, time, and one additional column for the emoji (if the tweet has any).
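
The pairing in step 1 could be sketched roughly as follows. This is a minimal sketch, not the final pipeline: it assumes the Kaggle CSV has been loaded into a DataFrame with the columns `tweet_id`, `author_id`, `inbound`, `created_at`, `text`, and `in_response_to_tweet_id`, that `inbound` is boolean, and that the company is identified by the `author_id` of the response tweet. The function name `pair_queries_with_responses` is hypothetical. Language filtering, word replacement, and the emoji column are left out.

```python
import pandas as pd


def pair_queries_with_responses(tweets: pd.DataFrame) -> pd.DataFrame:
    """Join each customer tweet with the company tweet that replies to it."""
    # Assumption: inbound == True marks customer tweets (the queries),
    # inbound == False marks company tweets (the responses).
    queries = tweets[tweets["inbound"]]
    responses = tweets[~tweets["inbound"]].dropna(
        subset=["in_response_to_tweet_id"]
    )

    # A response points at its query through in_response_to_tweet_id.
    pairs = queries.merge(
        responses,
        left_on="tweet_id",
        right_on="in_response_to_tweet_id",
        suffixes=("_query", "_response"),
    )

    # Keep only the columns of the clean table described above.
    return pd.DataFrame(
        {
            "query": pairs["text_query"],
            "response": pairs["text_response"],
            "company": pairs["author_id_response"],
            "time": pairs["created_at_query"],
        }
    )
```

Customer tweets that never received a reply are dropped by the inner join, which is what we want for training pairs.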

The Model

  1. Information-retrieval based. Search the tweets of the same company for queries similar to the incoming one, and return their response tweets as candidates for matching. The matching is done by a trained DL model: Bi-LSTM + average pooling over time + linear layer, with cosine similarity as the matching score. If we cannot find a Python library for the retrieval step, we can use Elasticsearch. Reference: https://cloud.tencent.com/developer/article/1196826

  2. Generation based. Generate the response with a seq2seq model. Reference: https://www.kaggle.com/soaxelbrooke/twitter-basic-seq2seq.

Option 1 is easier for DL model development, but it requires the retrieval step.
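
The scoring part of option 1 could look roughly like the sketch below. It is written in NumPy rather than SINGA and covers only average pooling over time and cosine-similarity ranking. The per-token vectors are assumed to come from the Bi-LSTM and linear layer, which are not implemented here, and the function names are hypothetical.

```python
import numpy as np


def mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Average per-token vectors of shape (time, dim) into one (dim,) vector."""
    return hidden_states.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Cosine of the angle between a and b; eps guards against zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def rank_responses(query_vec, response_vecs):
    """Return (candidate index, score) pairs, best match first."""
    scores = [cosine_similarity(query_vec, r) for r in response_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]
```

At inference time, the retrieval step would supply the candidate responses for the same company, each candidate would be encoded and pooled the same way as the query, and the top-ranked candidate would be returned.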

Platform

Development is on the panda cluster; the tutorial runs on Colab. Note that we may not be able to upload a large dataset to Colab; in that case, we need to select a subset of the original dataset.
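
One simple way to select such a subset is uniform random sampling of the cleaned table, sketched below. It assumes the table is already a pandas DataFrame; the function name `select_subset`, the fraction, and the seed are placeholders, not values we have agreed on.

```python
import pandas as pd


def select_subset(pairs: pd.DataFrame, fraction: float, seed: int = 0) -> pd.DataFrame:
    """Return a reproducible random sample holding `fraction` of the rows."""
    return pairs.sample(frac=fraction, random_state=seed).reset_index(drop=True)
```

Fixing the seed keeps the Colab subset identical across runs, so tutorial results stay comparable.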
