Dataset

English

  1. CNN/Daily Mail

    • 2015-Hermann et al. - Teaching machines to read and comprehend
    • 2016-Nallapati et al.-Abstractive text summarization using sequence-to-sequence rnns and beyond
    • Nallapati et al. define the evaluation procedure; later work that uses this dataset can follow their setup.
    • The dataset contains 287,113 training examples, 13,368 validation examples, and 11,490 test examples. After limiting the input length to 800 tokens and the output length to 100 tokens, the average input and output lengths are 632 and 53 tokens, respectively.
  2. The New York Times dataset (NYT)

    • 2008 - Evan Sandhaus - The new york times annotated corpus.
    • So far, only Paulus et al. (2017) have used this dataset for abstractive summarization.
  3. DUC-2004

    • 2003 - Dorr et al. - Hedge trimmer: A parse-and-trim approach to headline generation.
    • The DUC-2004 task only covers very short summaries of up to 75 characters, and it is usually used with one or two input sentences.
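
The length limits mentioned for CNN/Daily Mail (800 input tokens, 100 output tokens) can be sketched as a simple clipping step. This is a minimal illustration, assuming the articles and summaries are already tokenized into lists; the helper names `truncate_pair` and `average_length` and the toy data are hypothetical and not taken from any of the cited papers.

```python
def truncate_pair(article_tokens, summary_tokens, max_input=800, max_output=100):
    """Clip a tokenized (article, summary) pair to fixed maximum lengths."""
    return article_tokens[:max_input], summary_tokens[:max_output]


def average_length(sequences):
    """Mean number of tokens per sequence (0.0 for an empty collection)."""
    return sum(len(s) for s in sequences) / len(sequences) if sequences else 0.0


# Toy data standing in for real examples (assumption: whitespace-style tokens).
pairs = [
    (["tok"] * 1000, ["sum"] * 120),  # longer than both limits, gets clipped
    (["tok"] * 500, ["sum"] * 40),    # shorter than both limits, unchanged
]

clipped = [truncate_pair(article, summary) for article, summary in pairs]
avg_in = average_length([article for article, _ in clipped])
avg_out = average_length([summary for _, summary in clipped])
print(avg_in, avg_out)
```

Because many examples are shorter than the limits, the averages after clipping land below 800 and 100, which is consistent with the 632 and 53 reported above.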

Chinese

  1. CIRB010-Chinese Information Retrieval Benchmark

Evaluation

Automatic evaluation methods

  1. ROUGE

    • Widely used; currently the most mainstream evaluation method

    • ROUGE-n

    • ROUGE-L

    • ROUGE-SU

    • Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004.

    • sumeval

    • pyrouge

  2. Edmundson

  3. METEOR metric

  4. pyramid

  5. BE (Basic Elements) method

  6. N-gram
    Chin-Yew Lin and Eduard Hovy. Automatic Evaluation of Summaries Using N-gram Co-Occurrence Statistics. In Proceedings of the Human Language Technology Conference 2003 (HLT-NAACL 2003).

  7. Automatic Evaluation of Machine Translation
    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a Method for Automatic Evaluation of Machine Translation.
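
To make the ROUGE variants listed above more concrete, here is a minimal sketch of ROUGE-N recall and the ROUGE-L F-measure for a single candidate and a single reference. It assumes pre-tokenized input and skips stemming, stopword removal, and multi-reference handling, so its numbers will not necessarily match pyrouge or sumeval; the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    ref_counts = ngrams(reference, n)
    cand_counts = ngrams(candidate, n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref_counts & cand_counts).values())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """LCS-based F-measure; beta weights recall relative to precision."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()
print(rouge_n_recall(candidate, reference, n=1))  # 5 of 6 reference unigrams match
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 5 reference bigrams match
print(rouge_l_f(candidate, reference))            # LCS has length 5
```

For reported results, an established package such as pyrouge or sumeval is the safer choice, since published scores depend on preprocessing details that this sketch leaves out.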

Human evaluation methods

References:

  1. http://blog.csdn.net/lcj369387335/article/details/69845385
  2. https://github.com/mathsyouth/awesome-text-summarization