Llama 3 8B and 70B pretrained and instruction-tuned models are available today. Based on benchmarks, the 8B and 70B models are not quite GPT-4 class, but the 400B+ model (still in training) is expected to approach GPT-4 level. Llama 3 sets a new standard for state-of-the-art performance and efficiency among openly available LLMs.
Key highlights:
- 8k context length
- New capabilities: enhanced reasoning and coding
- Big change: a new tokenizer that expands the vocabulary to 128K tokens (from 32K in Llama 2) for better multilingual performance and more efficient encoding
- Trained on 15 trillion tokens (7x+ more data than Llama 2) using two clusters of 24K GPUs each
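One concrete cost of the larger vocabulary is the embedding table. As a back-of-the-envelope sketch (assuming a hidden size of 4096 for the 8B model and a full vocab of 128,256 entries, vs. 32,000 in Llama 2; both figures are assumptions here, not stated above):

```python
# Rough estimate of embedding-table growth from the vocab expansion.
# hidden=4096 and vocab sizes are assumed illustrative values.
def embedding_params(vocab_size: int, hidden: int = 4096) -> int:
    """Number of parameters in a token-embedding matrix of shape (vocab, hidden)."""
    return vocab_size * hidden

llama2_emb = embedding_params(32_000)    # 131,072,000 params
llama3_emb = embedding_params(128_256)   # 525,336,576 params

extra = llama3_emb - llama2_emb
print(f"Extra embedding params: {extra:,}")  # ~394M additional parameters
```

The trade-off: the larger table adds roughly 0.4B parameters (doubled again if input and output embeddings are untied), but each token encodes more text on average, so sequences get shorter and the 8k context stretches further.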