Model | Back-and-Forth Conversations | Tools (Functions) | Polyglotism | MMLU | ENEM | Streaming | Latency | Pricing |
---|---|---|---|---|---|---|---|---|
Cohere Command | 67.25% | 0.00% | 13.33% | 51.31% | 30.56% | 95.19% | 35.72% | 22.50% |
Cohere Command Light | 61.25% | 0.00% | 13.33% | 38.86% | 11.39% | 91.84% | 99.91% | 63.55% |
Google Gemini Pro | 39.50% | 65.00% | 100.00% | 63.98% | 58.33% | 11.53% | 50.31% | 25.30% |
Maritaca MariTalk | 80.25% | 0.00% | 66.67% | 60.45% | 56.39% | 8.34% | 21.48% | 57.45% |
Mistral Medium | 85.25% | 0.00% | 83.33% | 71.99% | 66.67% | 87.54% | 24.82% | 15.10% |
Mistral Small | 71.75% | 0.00% | 70.00% | 68.86% | 60.56% | 79.33% | 52.88% | 33.08% |
Mistral Tiny | 80.50% | 0.00% | 56.67% | 56.36% | 45.28% | 74.24% | 78.94% | 95.55% |
OpenAI GPT-3.5 Turbo | 86.75% | 82.00% | 100.00% | 63.75% | 64.44% | 74.37% | 74.87% | 37.19% |
OpenAI GPT-4 Turbo | 87.00% | 90.50% | 100.00% | 85.91% | 88.89% | 93.97% | 13.41% | 10.24% |
Data extracted from the LBPE Score Report 1.0.0.
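For a quick overall comparison, the table above can be aggregated programmatically. The sketch below ranks each model by its unweighted mean across the eight metrics; this assumes every column is a normalized 0-100% score where higher is better (including Latency and Pricing, which the report expresses as scores rather than raw milliseconds or dollars), and it does not reproduce any weighting the actual LBPE report may apply.

```python
# Scores copied from the LBPE Score Report 1.0.0 table above, in column order:
# Back-and-Forth, Tools, Polyglotism, MMLU, ENEM, Streaming, Latency, Pricing.
scores = {
    "Cohere Command":       [67.25, 0.00, 13.33, 51.31, 30.56, 95.19, 35.72, 22.50],
    "Cohere Command Light": [61.25, 0.00, 13.33, 38.86, 11.39, 91.84, 99.91, 63.55],
    "Google Gemini Pro":    [39.50, 65.00, 100.00, 63.98, 58.33, 11.53, 50.31, 25.30],
    "Maritaca MariTalk":    [80.25, 0.00, 66.67, 60.45, 56.39, 8.34, 21.48, 57.45],
    "Mistral Medium":       [85.25, 0.00, 83.33, 71.99, 66.67, 87.54, 24.82, 15.10],
    "Mistral Small":        [71.75, 0.00, 70.00, 68.86, 60.56, 79.33, 52.88, 33.08],
    "Mistral Tiny":         [80.50, 0.00, 56.67, 56.36, 45.28, 74.24, 78.94, 95.55],
    "OpenAI GPT-3.5 Turbo": [86.75, 82.00, 100.00, 63.75, 64.44, 74.37, 74.87, 37.19],
    "OpenAI GPT-4 Turbo":   [87.00, 90.50, 100.00, 85.91, 88.89, 93.97, 13.41, 10.24],
}

# Rank models by the simple mean of their eight metric scores.
ranking = sorted(scores.items(), key=lambda kv: sum(kv[1]) / len(kv[1]), reverse=True)

for model, vals in ranking:
    print(f"{model}: {sum(vals) / len(vals):.2f}%")
```

Note that an unweighted mean is only one possible summary: with these numbers, GPT-3.5 Turbo's strong Latency and Pricing scores lift its average above GPT-4 Turbo's, even though GPT-4 Turbo leads on the capability metrics.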