@muatik
Last active December 10, 2015 02:38
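These are the JSON formats for The Project Open Corpus, one per record type: book, news article, and tweet. The // comments are annotations only, since strict JSON does not allow comments.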
Book:
{
  "at": 1356360832,
  "authors": [
    "J. M. Barrie"
  ],
  "name": "Peter Pan",
  "text": "All children, except one, grow up. They soon know that they will grow up, and the way Wendy knew ...",
  "raw": {} // bibliographic record
}
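A minimal Python sketch of reading a book record. It assumes that "at" is a Unix timestamp in seconds (the sample value decodes to 24 Dec 2012 UTC) and that the // annotation has been dropped so the input is strict JSON. BookRecord and book_from_json are hypothetical names, not part of the project.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BookRecord:
    at: datetime          # assumed: "at" is Unix seconds, stored here as a UTC datetime
    authors: list[str]
    name: str             # book title
    text: str             # body text of the book
    raw: dict = field(default_factory=dict)  # bibliographic record


def book_from_json(s: str) -> BookRecord:
    """Parse one book record; the input must be strict JSON (no // comments)."""
    d = json.loads(s)
    return BookRecord(
        at=datetime.fromtimestamp(d["at"], tz=timezone.utc),
        authors=list(d["authors"]),
        name=d["name"],
        text=d["text"],
        raw=d.get("raw", {}),
    )


sample = '{"at": 1356360832, "authors": ["J. M. Barrie"], "name": "Peter Pan", "text": "All children, except one, grow up.", "raw": {}}'
book = book_from_json(sample)
print(book.at.isoformat())  # 2012-12-24T14:53:52+00:00
```

Converting "at" to a timezone-aware datetime at parse time avoids ambiguity about the server's local time zone.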

News article:
{
  "at": 1356360832,
  "source": "http://www.bbc.co.uk/turkce/haberler/2012/11/121107_obama_new_term.shtml",
  "title": "Turkey grows fast.",
  "text": "IMF has just reported that Turkey is the fastest count....",
  "raw": {} // raw news-source object, without the article text
}
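A sketch of the news format in Python, again assuming "at" is Unix seconds and the input has no // comments. The site property is a hypothetical convenience that takes the host from the "source" URL with the standard urllib.parse.urlparse, for example to group articles by outlet.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from urllib.parse import urlparse


@dataclass
class NewsRecord:
    at: datetime          # assumed: Unix seconds converted to a UTC datetime
    source: str           # URL of the original article
    title: str
    text: str             # article text (kept out of "raw")
    raw: dict = field(default_factory=dict)  # raw news-source object without text

    @property
    def site(self) -> str:
        # Host part of the source URL (hypothetical helper)
        return urlparse(self.source).netloc


def news_from_json(s: str) -> NewsRecord:
    d = json.loads(s)
    return NewsRecord(
        at=datetime.fromtimestamp(d["at"], tz=timezone.utc),
        source=d["source"],
        title=d["title"],
        text=d["text"],
        raw=d.get("raw", {}),
    )


sample = json.dumps({
    "at": 1356360832,
    "source": "http://www.bbc.co.uk/turkce/haberler/2012/11/121107_obama_new_term.shtml",
    "title": "Turkey grows fast.",
    "text": "IMF has just reported that Turkey is the fastest count....",
    "raw": {},
})
article = news_from_json(sample)
print(article.site)  # www.bbc.co.uk
```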

Tweet:
{
  "at": 1356360832,
  "userId": "muatik2",
  "text": "Hello, this is a tweet.",
  "raw": {} // raw Twitter object
}
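All three formats share "at", "text" and "raw", and each adds one key the others lack ("authors", "source", "userId"). The hypothetical record_kind helper below relies on that observation to tell records apart; treating those keys as unique to one format is an assumption, not a rule the project states.

```python
import json

COMMON_KEYS = {"at", "text", "raw"}   # present in all three formats
# Assumption: each of these keys occurs in exactly one format, so it identifies the type
DISTINGUISHING_KEYS = {"authors": "book", "source": "news", "userId": "tweet"}


def record_kind(record: dict) -> str:
    """Return "book", "news" or "tweet" for a parsed record (hypothetical helper)."""
    missing = COMMON_KEYS.difference(record)  # iterating a dict yields its keys
    if missing:
        raise ValueError(f"missing common keys: {sorted(missing)}")
    for key, kind in DISTINGUISHING_KEYS.items():
        if key in record:
            return kind
    raise ValueError(f"unrecognized record with keys {sorted(record)}")


tweet = json.loads('{"at": 1356360832, "userId": "muatik2", "text": "Hello, this is a tweet.", "raw": {}}')
print(record_kind(tweet))  # tweet
```

An explicit "type" field in each record would make this detection unnecessary; the sketch only works with the formats as given.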
muatik (author) commented on Dec 24, 2012:

Every language will have its own .json file.
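One possible reading of "one .json file per language", sketched in Python: each language's records go into a file named after its language code, stored as a single JSON array. The file naming, the array layout and write_per_language are all assumptions; the records above carry no language field, so the caller supplies the language as the dictionary key.

```python
import json
import tempfile
from pathlib import Path


def write_per_language(corpus: dict[str, list[dict]], out_dir: Path) -> list[Path]:
    """Write each language's records to its own <lang>.json file as one JSON array.

    Hypothetical layout: the language code comes from the dictionary key,
    because the records themselves have no language field.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for lang, records in sorted(corpus.items()):
        path = out_dir / f"{lang}.json"
        # ensure_ascii=False keeps non-ASCII characters (e.g. Turkish letters) readable
        path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
        written.append(path)
    return written


corpus = {
    "en": [{"at": 1356360832, "userId": "muatik2", "text": "Hello, this is a tweet.", "raw": {}}],
    "tr": [{"at": 1356360832, "userId": "muatik2", "text": "Merhaba, bu bir tweet.", "raw": {}}],  # made-up sample
}
with tempfile.TemporaryDirectory() as tmp:
    for path in write_per_language(corpus, Path(tmp)):
        print(path.name, len(json.loads(path.read_text(encoding="utf-8"))))
```

A single JSON array is simple but must be loaded whole; for large corpora, one record per line (JSON Lines) would allow streaming.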
