@xinzhel
Last active March 26, 2023 05:39

arXiv and PubMed

https://github.com/armancohan/long-summarization

CNN Daily Mail

$ wget https://storage.googleapis.com/allennlp-public-data/cnndm-combined-data-2020.07.13.tar.gz
$ tar -xzf cnndm-combined-data-2020.07.13.tar.gz
$ mv cnndm-combined-data-2020.07.13 cnn_dm
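The three commands above can be wrapped so they are safe to re-run. This is a minimal sketch, assuming bash and wget are available and that the archive unpacks to a top-level `cnndm-combined-data-2020.07.13` folder (which is what the `mv` step implies); the function name `fetch_cnndm` is hypothetical.

```shell
#!/usr/bin/env bash
# Re-runnable version of the CNN/Daily Mail steps: download the archive,
# unpack it, and rename the folder to cnn_dm. Steps whose output already
# exists are skipped.
set -euo pipefail

CNNDM_URL="https://storage.googleapis.com/allennlp-public-data/cnndm-combined-data-2020.07.13.tar.gz"

fetch_cnndm() {
  local archive="${CNNDM_URL##*/}"      # cnndm-combined-data-2020.07.13.tar.gz
  local extracted="${archive%.tar.gz}"  # assumed top-level folder inside the archive
  if [ -d cnn_dm ]; then
    echo "cnn_dm already exists; nothing to do"
    return 0
  fi
  # Only download if the archive is not already on disk.
  [ -f "$archive" ] || wget "$CNNDM_URL"
  tar -xzf "$archive"
  mv "$extracted" cnn_dm
}
```

Run `fetch_cnndm` from the directory where `cnn_dm` should live. The `.tar.gz` is kept, so a re-run after a failed extraction skips the download; delete the archive first if the download itself was interrupted.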

SQuAD

https://rajpurkar.github.io/SQuAD-explorer/
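The explorer page links to the train and dev JSON files, so they can also be fetched from the command line. A sketch, assuming the files are still served under `https://rajpurkar.github.io/SQuAD-explorer/dataset/` as `train-v1.1.json`/`dev-v1.1.json` and `train-v2.0.json`/`dev-v2.0.json`; the `fetch_squad` function and the `squad/<version>/` folder layout are hypothetical.

```shell
#!/usr/bin/env bash
# Download the SQuAD train/dev JSON files for one version into squad/<version>/.
set -euo pipefail

SQUAD_BASE="https://rajpurkar.github.io/SQuAD-explorer/dataset"

fetch_squad() {
  local version="${1:-v2.0}"   # "v1.1" or "v2.0"; defaults to v2.0
  local dir="squad/${version}"
  local split file
  mkdir -p "$dir"
  for split in train dev; do
    file="${split}-${version}.json"
    # Skip files that were already downloaded.
    [ -f "$dir/$file" ] || wget -O "$dir/$file" "$SQUAD_BASE/$file"
  done
}
```

`fetch_squad v1.1` fetches SQuAD 1.1; with no argument it fetches 2.0. Files already present are not downloaded again.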

SST-2

Train, dev, and test splits.

AG-News

 + download from https://www.kaggle.com/amananandrai/ag-news-classification-dataset
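Instead of downloading through the web page, the dataset can be fetched with the Kaggle CLI, the same tool used in the DBpedia steps. A minimal sketch, assuming the `kaggle` package is installed, an API token is in `~/.kaggle/kaggle.json`, and the dataset slug is the `owner/name` part of the Kaggle URL; `fetch_kaggle_dataset` and the target folder names are hypothetical.

```shell
#!/usr/bin/env bash
# Download and extract a Kaggle dataset into a target folder.
set -euo pipefail

# Usage: fetch_kaggle_dataset OWNER/DATASET TARGET_DIR
fetch_kaggle_dataset() {
  local slug="${1:?usage: fetch_kaggle_dataset OWNER/DATASET TARGET_DIR}"
  local target="${2:?usage: fetch_kaggle_dataset OWNER/DATASET TARGET_DIR}"
  mkdir -p "$target"
  # -p sets the download folder; --unzip extracts the downloaded archive there.
  kaggle datasets download -d "$slug" -p "$target" --unzip
}
```

For AG-News:

$ fetch_kaggle_dataset amananandrai/ag-news-classification-dataset ag_news

The same call should work for the Reuters (`nltkdata/reuters`) and Twitter gender (`crowdflower/twitter-user-gender-classification`) pages listed here.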

Reuters

 + download from https://www.kaggle.com/nltkdata/reuters
 + unzip the files
 + run the script to prepare JSON files for training and test:

$ wget https://gist.githubusercontent.com/xinzhel/1bdd7b3f94539f83ce0d7beed320020a/raw/f7fd42bd643be75d48ba325629b8e86f11fca68c/reuters-json.py
$ python reuters-json.py

Gigaword

$ wget https://data.deepai.org/gigaword.zip
$ unzip gigaword.zip
$ mv sumdata gigaword

Twitter Gender Prediction

 + download from https://www.kaggle.com/crowdflower/twitter-user-gender-classification

DBpedia

$ kaggle datasets download -d danofer/dbpedia-classes
$ mkdir dbpedia_csv
$ mv dbpedia-classes.zip dbpedia_csv/
$ cd dbpedia_csv/
$ unzip dbpedia-classes.zip
$ mv DBPEDIA_test.csv test.csv
$ mv DBPEDIA_train.csv train.csv