@hagope
Created July 27, 2023 18:25

Today on Hacker News, the top article was *LLaMA2 Chat 70B outperformed ChatGPT*, linking to a leaderboard of LLMs. As of today, July 27, 2023, the top of the leaderboard is as follows:

| Model Name | Win Rate | Length |
|---|---|---|
| GPT-4 | 95.28% | 1365 |
| LLaMA2 Chat 70B | 92.66% | 1790 |
| Claude 2 | 91.36% | 1069 |
| ChatGPT | 89.37% | 827 |
| WizardLM 13B V1.2 | 89.17% | 1635 |
| Vicuna 33B v1.3 | 88.99% | 1479 |
| Claude | 88.39% | 1082 |
| OpenChat V2-W 13B | 87.13% | 1566 |
| WizardLM 13B V1.1 | 86.32% | 1525 |
| OpenChat V2 13B | 84.97% | 1564 |
| Vicuna 13B v1.3 | 82.11% | 1132 |
| LLaMA2 Chat 13B | 81.09% | 1513 |

Incidentally, I have been playing around with inference on my own Mac, randomly trying different models; the leaderboard has been a good place to focus my experimentation.

## Running LLMs on a MacBook Pro M1 Max

I'm currently running a MacBook Pro with an M1 Max chip (64 GB RAM). Much to my surprise, it can run inference on most open-source LLMs using llama.cpp. Here's the setup:
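A minimal sketch of the usual llama.cpp workflow, assuming a quantized GGML model file downloaded from Hugging Face; the model filename below is a hypothetical example, not a specific path from this post:

```shell
# Hypothetical model filename -- substitute whichever q4_K_M file you downloaded.
MODEL=models/llama-2-13b-chat.ggmlv3.q4_K_M.bin

# Build from source (Metal acceleration is enabled by default on Apple Silicon):
#   git clone https://github.com/ggerganov/llama.cpp
#   cd llama.cpp && make

# Run inference against the quantized model:
#   ./main -m "$MODEL" -p "Write a SQL query that ..." -n 256
echo "model: $MODEL"
```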

Here's a list of the models I'm experimenting with:

Note: I'm using the q4_K_M.bin versions of these models; the model cards on Hugging Face and the llama.cpp docs have a more detailed discussion of the different quantization levels.
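To see why quantization matters on a 64 GB machine, here's a rough back-of-envelope calculation; the ~4.5 bits/weight figure for q4_K_M is an approximation, not an exact spec:

```shell
# Approximate weight sizes in GiB, assuming ~4.5 bits/weight for q4_K_M.
fp16_13b=$(awk 'BEGIN { printf "%.1f", 13e9 * 16  / 8 / 1073741824 }')  # fp16 13B
q4_13b=$(awk 'BEGIN { printf "%.1f", 13e9 * 4.5 / 8 / 1073741824 }')    # q4_K_M 13B
q4_70b=$(awk 'BEGIN { printf "%.1f", 70e9 * 4.5 / 8 / 1073741824 }')    # q4_K_M 70B
echo "13B fp16: ${fp16_13b} GiB, 13B q4: ${q4_13b} GiB, 70B q4: ${q4_70b} GiB"
```

A 70B model at fp16 would need well over 64 GB, but at roughly 4-bit quantization its weights fit in RAM, which is what makes local inference on this machine possible at all.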

Typically, for 13B models I'm achieving 5-7 tokens/sec, whereas with larger models I'm getting 1-2 tokens/sec. I have yet to dive deeper into parameter tuning for performance.
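As a quick sanity check on what those rates mean in practice, here is the wall-clock time for a hypothetical ~1500-token completion at the low end of each range:

```shell
# Rough wall-clock time for a ~1500-token completion (integer arithmetic).
tokens=1500
t_13b=$((tokens / 5))   # at 5 tokens/sec
t_70b=$((tokens / 2))   # at 2 tokens/sec
echo "13B: ~${t_13b}s, larger models: ~${t_70b}s"
```

So a long answer from a 13B model lands in about five minutes, while the bigger models can take ten minutes or more.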

As for the results: my main use case is data engineering tasks such as parsing SQL, reformatting code, and converting unstructured data to structured formats. So far, I've been pleasantly surprised with the results compared to OpenAI's models. I will continue to experiment with these models and find new tasks for these free and open AIs!
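To make the kind of task concrete, here is a hypothetical prompt of the sort one might pass to llama.cpp's main binary; both the model path and the prompt are illustrative examples, not taken from this post:

```shell
# Hypothetical data-engineering prompt; the model path is illustrative only.
MODEL=models/wizardlm-13b-v1.2.ggmlv3.q4_K_M.bin
PROMPT='Convert to JSON with keys name and city: "Alice lives in Paris. Bob lives in Oslo."'
#   ./main -m "$MODEL" -p "$PROMPT" -n 256
echo "$PROMPT"
```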
