Skip to content

Instantly share code, notes, and snippets.

View cedrickchee's full-sized avatar
⚒️
⚡ 🦀 🐿️ 🐘 🐳 ⬡ ⚛️ 🚢 🚀 🦄 🍵

Cedric Chee cedrickchee

⚒️
⚡ 🦀 🐿️ 🐘 🐳 ⬡ ⚛️ 🚢 🚀 🦄 🍵
View GitHub Profile
@cedrickchee
cedrickchee / llama-home.md
Created July 12, 2023 05:22 — forked from rain-1/llama-home.md
How to run Llama 13B with a 6GB graphics card
View llama-home.md

This worked on 14/May/23. The instructions will probably require updating in the future.

llama is a text prediction model similar to GPT-2, and the version of GPT-3 that has not been fine tuned yet. It is also possible to run fine tuned versions (like alpaca or vicuna with this. I think. Those versions are more focused on answering questions)

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is possible to run LLama 13B with a 6GB graphics card now! (e.g. a RTX 2060). Thanks to the amazing work involved in llama.cpp. The latest change is CUDA/cuBLAS which allows you pick an arbitrary number of the transformer layers to be run on the GPU. This is perfect for low VRAM.

  • Clone llama.cpp from git, I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.
@cedrickchee
cedrickchee / ai-plugin.json
Created March 25, 2023 14:22 — forked from danielgross/ai-plugin.json
ChatGPT Plugin for Twilio
View ai-plugin.json
{
"schema_version": "v1",
"name_for_model": "twilio",
"name_for_human": "Twilio Plugin",
"description_for_model": "Plugin for integrating the Twilio API to send SMS messages and make phone calls. Use it whenever a user wants to send a text message or make a call using their Twilio account.",
"description_for_human": "Send text messages and make phone calls with Twilio.",
"auth": {
"type": "user_http",
"authorization_type": "basic"
},
@cedrickchee
cedrickchee / example.sh
Created March 6, 2023 13:28 — forked from shawwn/example.sh
How I run 65B using my fork of llama at https://github.com/shawwn/llama
View example.sh
mp=1; size=7B; # to run 7B
mp=8; size=65B; # to run 65B
for seed in $(randint 1000000)
do
export TARGET_FOLDER=~/ml/data/llama/LLaMA
time python3 -m torch.distributed.run --nproc_per_node $mp example.py --ckpt_dir $TARGET_FOLDER/$size --tokenizer_path $TARGET_FOLDER/tokenizer.model --seed $seed --max_seq_len 2048 --max_gen_len 2048 --count 0 | tee -a ${size}_startrek.txt
done
@cedrickchee
cedrickchee / LLMs.md
Last active January 24, 2024 06:16 — forked from yoavg/LLMs.md
Fix typos and grammar of the original writing.
View LLMs.md

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you heard of ChatGPT, maybe played with it a little, and was impressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective of my thoughts of this (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labor costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

@cedrickchee
cedrickchee / chatgpt.md
Last active January 11, 2023 10:53 — forked from veekaybee/chatgpt.md
Everything I understand about chatgpt
View chatgpt.md

ChatGPT Resources

Context

ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowehre. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?

I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.

Model Architecture

@cedrickchee
cedrickchee / ffmpeg_mkv_mp4_conversion.md
Created November 28, 2022 09:14 — forked from jamesmacwhite/ffmpeg_mkv_mp4_conversion.md
Easy way to convert MKV to MP4 with ffmpeg
View ffmpeg_mkv_mp4_conversion.md

Converting mkv to mp4 with ffmpeg

Essentially just copy the existing video and audio stream as is into a new container, no funny business!

The easiest way to "convert" MKV to MP4, is to copy the existing video and audio streams and place them into a new container. This avoids any encoding task and hence no quality will be lost, it is also a fairly quick process and requires very little CPU power. The main factor is disk read/write speed.

With ffmpeg this can be achieved with -c copy. Older examples may use -vcodec copy -acodec copy which does the same thing.

These examples assume ffmpeg is in your PATH. If not just substitute with the full path to your ffmpeg binary.

Single file conversion example

@cedrickchee
cedrickchee / not_too_clever.md
Created September 1, 2022 16:47 — forked from raphlinus/not_too_clever.md
Translation of grugbrain.dev into English
View not_too_clever.md

The not-too-clever programmer

This is a translation of grugbrain.dev into clear English. All props to the original author.

Introduction

This is a collection of thoughts on software development, originally written by an pseudonymous author styling themselves the "grug brain developer," but then translated into clear English by Raph Levien.

I am not an extremely smart developer, but I have many years of experience and have learned some things, although still don't know everything.

@cedrickchee
cedrickchee / latency.markdown
Created December 3, 2021 16:14 — forked from hellerbarde/latency.markdown
Latency numbers every programmer should know
View latency.markdown

Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns             
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs

Read 1 MB sequentially from memory ..... 250,000 ns = 250 µs

View etcd_vs_kv_stores_notes.md

etcd Versus Other Key-value Stores

  • What do Etcd, Consul, and Zookeeper do?
    • Service Registration:
      • Host, port number, and sometimes authentication credentials, protocols, versions numbers, and/or environment details.
    • Service Discovery:
      • Ability for client application to query the central registry to learn of service location.
    • Consistent and durable general-purpose K/V store across distributed system.
  • Some solutions support this better than others.
@cedrickchee
cedrickchee / System Design.md
Created May 20, 2021 14:38 — forked from vasanthk/System Design.md
System Design Cheatsheet
View System Design.md

System Design Cheatsheet

Picking the right architecture = Picking the right battles + Managing trade-offs

Basic Steps

  1. Clarify and agree on the scope of the system
  • User cases (description of sequences of events that, taken together, lead to a system doing something useful)
    • Who is going to use it?
    • How are they going to use it?