Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.
![Screenshot 2023-12-18 at 10 40 27 PM](https://private-user-images.githubusercontent.com/3837836/291468646-4c30ad72-76ee-4939-a5fb-16b570d38cf2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIwNzczMDMsIm5iZiI6MTcyMjA3NzAwMywicGF0aCI6Ii8zODM3ODM2LzI5MTQ2ODY0Ni00YzMwYWQ3Mi03NmVlLTQ5MzktYTVmYi0xNmI1NzBkMzhjZjIucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcyNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MjdUMTA0MzIzWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NDhhNDIzZGEzM2I0MzVjYjcyODBlYTMxYjIyZDc5ZTM5NmJkM2ZhOGFhZTEwMTMwMjIwZTE2YTA1NTU3ZTJiYSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.Z5jO3Vr71lobPiYTpD60jR_qBhEUFkH9O7YiIKM6jnw)
# Clone llama.cpp | |
git clone https://github.com/ggerganov/llama.cpp.git | |
cd llama.cpp | |
# Build it | |
make clean | |
LLAMA_METAL=1 make | |
# Download model | |
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin |
FROM hayd/alpine-deno:1.10.1 | |
WORKDIR /src/app | |
ADD deps.ts ./ | |
RUN ["deno", "cache", "deps.ts"] | |
ADD *.ts ./ | |
RUN ["deno", "cache", "mod.ts"] | |
ENTRYPOINT ["deno", "run", "--unstable", "--allow-net", "--allow-hrtime", "--allow-env", "--cached-only", "--no-check", "mod.ts"] |
I bought M1 MacBook Air. It is the fastest computer I have, and I have been a GNOME/GNU/Linux user for long time. It is obvious conclusion that I need practical Linux desktop environment on Apple Silicon.
Fortunately, Linux already works on Apple Silicon/M1. But how practical is it?
package main | |
import ( | |
"context" | |
"flag" | |
"fmt" | |
"log" | |
"net/http" | |
"os" | |
"os/signal" |
Just documenting docs, articles, and discussion related to gRPC and load balancing.
https://github.com/grpc/grpc/blob/master/doc/load-balancing.md
Seems gRPC prefers thin client-side load balancing where a client gets a list of connected clients and a load balancing policy from a "load balancer" and then performs client-side load balancing based on the information. However, this could be useful for traditional load banaling approaches in clound deployments.
https://groups.google.com/forum/#!topic/grpc-io/8s7UHY_Q1po
gRPC "works" in AWS. That is, you can run gRPC services on EC2 nodes and have them connect to other nodes, and everything is fine. If you are using AWS for easy access to hardware then all is fine. What doesn't work is ELB (aka CLB), and ALBs. Neither of these support HTTP/2 (h2c) in a way that gRPC needs.