The classification of Large Language Models (LLMs) is not strictly formalized, but several common methods are used in the AI community:
LLMs are frequently ranked based on performance in benchmarks such as MMLU, HELM, or various standardized NLP and reasoning tasks.
Top-tier Models (strongest general-purpose performance):
- GPT-4 (OpenAI)
- Claude 3 Opus (Anthropic)
- Gemini Ultra (Google)
Mid-tier Models (high quality but not best-in-class across all tasks):
- GPT-3.5
- Claude 2
- Gemini Pro
- Earlier Claude models
Emerging or Specialized Models (fast-developing or domain-specific):
- Mistral AI's models (e.g., Mixtral)
- Meta's LLaMA 2
- Code-specific models like CodeLlama
LLMs are also categorized by the number of parameters they contain:
- Large: Roughly 100 billion parameters and up (e.g., GPT-3 with 175B parameters)
- Medium: Tens of billions of parameters
- Small: Under 10 billion parameters; often optimized for edge devices or fast inference
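The tiers above can be sketched as a simple bucketing function. The 100B and 10B cutoffs follow the rough ranges listed here; they are illustrative, not an industry standard, and the function name is hypothetical:

```python
def size_tier(num_params: int) -> str:
    """Bucket a model by parameter count using the rough tiers above.

    Thresholds (100B, 10B) are illustrative, not a formal standard.
    """
    if num_params >= 100_000_000_000:
        return "large"
    if num_params >= 10_000_000_000:
        return "medium"
    return "small"


print(size_tier(175_000_000_000))  # GPT-3 at 175B -> "large"
print(size_tier(7_000_000_000))    # a 7B model -> "small"
```

In practice the boundaries shift over time; what counted as "large" in 2020 is midsized today, so treat any such function as a snapshot.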
Access and openness are additional axes of classification:
- Closed-source Models: Commercial, proprietary (e.g., GPT-4, Claude)
- Open-source Models: Freely released (e.g., LLaMA, BLOOM, Mistral)
- API-only Models: Served exclusively through a hosted API, with no downloadable weights (e.g., GPT-4, Claude)
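Note that these axes are independent flags rather than mutually exclusive categories: a model can be both closed-source and API-only. A minimal sketch of that idea, with illustrative field and function names (not any standard schema):

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    """One record per model; field names are illustrative."""
    name: str
    weights_released: bool  # open-source axis
    api_hosted: bool        # API-only axis


def access_label(m: ModelProfile) -> str:
    """Collapse the two access axes into a single human-readable label."""
    if m.weights_released:
        return "open-source"
    return "API-only (closed-source)" if m.api_hosted else "closed-source"


print(access_label(ModelProfile("LLaMA 2", weights_released=True, api_hosted=False)))
print(access_label(ModelProfile("GPT-4", weights_released=False, api_hosted=True)))
```

Treating openness and access as separate booleans avoids the overlap in the bullet list above, where "closed-source" and "API-only" describe largely the same commercial models.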
Though informal, terms like "next-gen" or "current-gen" describe newer models with architectural and training improvements over their predecessors.
"Up-and-coming" is a fair term to describe models that are gaining traction due to novel techniques, strong community support, or rapid iteration:
- Anthropic’s Claude 3 series (significant performance leap)
- Open-source models from Mistral AI and Meta
- Domain-specialized models like DeepMind's AlphaCode (coding)
The landscape of LLMs is evolving rapidly. New models, updates, and benchmarks are continuously reshaping what qualifies as top-tier or emerging.
This gist was generated with the help of OpenAI based on information provided by the user.