Steffen Röcker (sroecker)

@tvst
tvst / streamlit_app.py
Last active May 23, 2024 15:47
Simple way to run heavy computations without slowing down other Streamlit users
import streamlit as st
import concurrent.futures  # We'll do computations in separate processes!
import mymodule  # This is where you'll do the computation

# Your st calls must go inside this IF block.
if __name__ == '__main__':
    st.write("Starting a long computation on another process")

    # Pick max number of concurrent processes. Depends on how heavy your
    # computation is, and how powerful your machine is.
    MAX_WORKERS = 4

    # `mymodule.some_heavy_computation` is a placeholder for your own function.
    with concurrent.futures.ProcessPoolExecutor(max_workers=MAX_WORKERS) as executor:
        future = executor.submit(mymodule.some_heavy_computation)
        st.write("Result:", future.result())
@Artefact2
Artefact2 / README.md
Last active June 25, 2024 19:00
GGUF quantizations overview

Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggerganov/llama.cpp#5962

In the meantime, use the largest quantization that fully fits in your GPU. If you can comfortably fit Q4_K_S, try a model with more parameters instead. A rough fit check is sketched below.
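
As a rough feasibility check (a sketch of mine, not part of the gist): the quantized file size is a decent proxy for the VRAM the weights need, plus some overhead for the KV cache and compute buffers. The overhead value below is an assumption you should tune for your context length.

import os

def fits_in_vram(gguf_path, vram_gib, overhead_gib=1.5):
    # Heuristic: weights (roughly the file size) plus assumed overhead must fit in VRAM.
    file_gib = os.path.getsize(gguf_path) / 2**30
    return file_gib + overhead_gib <= vram_gib

# Example: could an 8 GiB card comfortably hold this (hypothetical) Q4_K_S file?
# fits_in_vram("mistral-7b.Q4_K_S.gguf", vram_gib=8.0)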

llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix

@kalomaze
kalomaze / llm_samplers_explained.md
Last active June 16, 2024 18:19
LLM Samplers Explained

Every time a large language model makes a prediction, each of the thousands of tokens in its vocabulary is assigned some degree of probability, from almost 0% to almost 100%. There are different ways you can decide to choose from those predictions. This process is known as "sampling", and there are various strategies you can use, which I will cover here.

OpenAI Samplers

Temperature

  • Temperature is a way to control the overall confidence of the model's scores (the logits). With a value below 1.0, the gaps between token scores widen, sharpening the distribution (more deterministic); with a value above 1.0, the gaps shrink, flattening the distribution (less deterministic). A minimal sketch follows this list.
  • A temperature of 1.0 leaves the scores unchanged, so it reproduces the original distribution the model was trained to optimize for.
  • Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4
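
To make the scaling concrete, here is a minimal sketch (mine, not the gist's) of temperature sampling over raw logits; numpy and the toy logit values are assumptions for illustration.

import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    # Divide logits by the temperature, softmax, then draw one token id.
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Toy example: a lower temperature concentrates mass on the top token.
logits = [2.0, 1.0, 0.1]
print(sample_with_temperature(logits, temperature=0.7))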
@jrknox1977
jrknox1977 / ollama_dspy.py
Created February 9, 2024 18:06
ollama+DSPy using OpenAI APIs.
# install DSPy: pip install dspy
import dspy

# Ollama is now compatible with OpenAI APIs.
#
# To get this to work you must include `model_type='chat'` in the `dspy.OpenAI` call.
# If you do not include this you will get an error.
#
# I have also found that `stop='\n\n'` is required to stop generation after the
# answer is complete (at least with mistral).

# Assumes a local Ollama server on its default port with a mistral model pulled.
mistral_ollama = dspy.OpenAI(
    api_base='http://localhost:11434/v1/',
    api_key='ollama',  # Ollama ignores the key, but the client expects one
    model='mistral',
    model_type='chat',
    stop='\n\n',
)
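
A quick usage sketch (assumed, not shown in the gist preview): make this LM the default and run a one-off prediction.

dspy.settings.configure(lm=mistral_ollama)

qa = dspy.Predict('question -> answer')
print(qa(question="What is the capital of France?").answer)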
@bjsi
bjsi / main.py
Created February 4, 2024 10:38
Deploying RAGatouille on Modal Labs
from typing import List, Optional, TypedDict

import modal
from modal import gpu, build, enter, exit, method


class Document(TypedDict):
    content: str
    metadata: dict
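
For orientation, a minimal sketch (an assumed shape, not the gist's actual serving code; the class and method names are placeholders) of how these Modal imports are typically wired together:

stub = modal.Stub("ragatouille-demo")

@stub.cls(gpu=gpu.T4())
class Searcher:
    @build()  # runs once at image build time, e.g. to download model weights
    def download_model(self):
        pass

    @enter()  # runs at container startup, e.g. to load an index into memory
    def setup(self):
        self.docs: List[Document] = []

    @method()  # callable remotely, e.g. Searcher().search.remote("query")
    def search(self, query: str, k: Optional[int] = 5) -> List[Document]:
        return self.docs[:k]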
@sayakpaul
sayakpaul / coco_30k_hf_datasets.py
Created January 31, 2024 10:34
Randomly samples 30k images from the COCO 2014 validation set.
from datasets import Dataset, Features
from datasets import Image as ImageFeature
from datasets import Value
import pandas as pd
import os
# CSV comes from the notebook above.
df = pd.read_csv("coco_30k_randomly_sampled_2014_val.csv")
root_path = "val2014"
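
The preview stops here; below is a sketch (mine) of how these pieces typically come together with datasets, where the CSV column names "file_name" and "caption" and the Hub repo id are assumptions:

features = Features({
    "image": ImageFeature(),
    "caption": Value("string"),
})

ds = Dataset.from_dict(
    {
        # With the Image feature, string file paths are decoded into images.
        "image": [os.path.join(root_path, name) for name in df["file_name"]],
        "caption": df["caption"].tolist(),
    },
    features=features,
)
ds.push_to_hub("coco-30k-2014-val")  # hypothetical repo id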
@virattt
virattt / rag-reranking-gpt-colbert-mistral.ipynb
Last active April 3, 2024 04:23
rag-reranking-gpt-colbert-mistral.ipynb
@rauchg
rauchg / p.sh
Last active May 18, 2024 13:05
Perplexity CLI in pure shell
#!/usr/bin/env bash
function p() {
  jq -n \
    --arg content "$*" \
    '{
      "model": "pplx-7b-online",
      "messages": [
        { "role": "system", "content": "Be precise and concise." },
        { "role": "user", "content": $content }
      ]
    }' |
    # Non-streaming variant; assumes PERPLEXITY_API_KEY is exported.
    curl --silent https://api.perplexity.ai/chat/completions \
      --header "Authorization: Bearer $PERPLEXITY_API_KEY" \
      --header "Content-Type: application/json" \
      --data @- |
    jq -r '.choices[0].message.content'
}
# Usage: p how tall is the eiffel tower
@veekaybee
veekaybee / normcore-llm.md
Last active June 29, 2024 03:29
Normcore LLM Reads

Anti-hype LLM reading list

Goals: add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical, first-hand accounts of models in prod are eagerly sought.

Foundational Concepts

Pre-Transformer Models