@benarch
Last active July 10, 2023 09:00
OpenAI GPT versions gist
| GPT Version | Training Data Size | Datasets & Data Sources | Parameter Count | Release Date |
|---|---|---|---|---|
| GPT-1 | 4.5 GB | BookCorpus: 4.5 GB of text from ~7,000 unpublished books | 117 million | 2018 |
| GPT-2 | 40 GB | WebText: 40 GB of text from 8 million documents, drawn from 45 million webpages upvoted on Reddit | 1.5 billion | 14.02.2019 |
| GPT-3 | 570 GB | 570 GB of plaintext (~0.4 trillion tokens): mostly Common Crawl, WebText, English Wikipedia, and two book corpora (Books1 and Books2) | 175 billion | 2020 |
| GPT-3.5 | 45 TB | Fine-tuned version of GPT-3 | 175 billion | 15.03.2022 |
| GPT-4 | Undisclosed | Training data and parameter count are officially undisclosed; rumors suggest the figures shown | 100 trillion (rumored) | 14.03.2023 |
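For quick programmatic comparison, the parameter counts above can be encoded in a small sketch like the following (the model names and figures come from the table; GPT-4's count is only a rumor, so it is excluded, and `growth_factor` is an illustrative helper, not an official API):

```python
# Parameter counts taken from the table above.
# GPT-4 is omitted: its parameter count is officially undisclosed.
MODELS = {
    "GPT-1": 117e6,    # 117 million
    "GPT-2": 1.5e9,    # 1.5 billion
    "GPT-3": 175e9,    # 175 billion
    "GPT-3.5": 175e9,  # 175 billion (fine-tuned GPT-3)
}

def growth_factor(older: str, newer: str) -> float:
    """Ratio of parameter counts between two model generations."""
    return MODELS[newer] / MODELS[older]

if __name__ == "__main__":
    # Parameter count grew ~12.8x from GPT-1 to GPT-2,
    # and ~116.7x from GPT-2 to GPT-3.
    print(f"GPT-1 -> GPT-2: {growth_factor('GPT-1', 'GPT-2'):.1f}x")
    print(f"GPT-2 -> GPT-3: {growth_factor('GPT-2', 'GPT-3'):.1f}x")
```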