GPT Version | Training data size | Datasets & data sources used | Parameter count | Release date
---|---|---|---|---
GPT-1 | 4.5 GB | BookCorpus: ~4.5 GB of text from ~7,000 unpublished books | 117 million | 2018
GPT-2 | 40 GB | WebText: 40 GB of text, 8 million documents from links upvoted on Reddit (45 million webpages) | 1.5 billion | 14.02.2019
GPT-3 | 570 GB | 570 GB of filtered plaintext (~0.4 trillion tokens): mostly Common Crawl and WebText, plus English Wikipedia and two books corpora (Books1 and Books2) | 175 billion | 2020
GPT-3.5 | 45 TB | Fine-tuned version of GPT-3 | 175 billion | 15.03.2022
GPT-4 | undisclosed | Training data and parameter count have not been officially disclosed; circulating figures are unconfirmed rumors | rumored 100 trillion (unconfirmed) | 14.03.2023