Ivan Leo ivanleomk

ivanleomk / server.py
Created May 13, 2025 08:48
vLLM Deployment
import modal
from pathlib import Path
# Variables
MODEL_DIR = Path("/models")
MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"
MODEL_REVISION = "a09a35458c702b33eeacc393d103063234e8bc28"
N_GPU = 1
MINUTES = 60
VLLM_PORT = 8000
ivanleomk / hackernews.py
Created March 28, 2025 13:37
This shows an instance where we get a context overflow error
import asyncio
import os
import shutil
from agents import Agent, Runner, gen_trace_id, trace
from agents.mcp import MCPServer, MCPServerStdio
async def run(mcp_server: MCPServer):
    agent = Agent(
ivanleomk / inference_instructor.py
Created March 10, 2025 10:46
This shows how to parse the raw chat completion with instructor. I used the 12B Nemo model here instead and found that it works much better for function calling (with the chat completion).
from openai import OpenAI
from pydantic import BaseModel
import os
import instructor
client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)
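The idea in this gist's description, turning a raw chat completion into a typed object, can be sketched without any network call. This is a minimal, stdlib-only sketch and not instructor's actual implementation: the `Weather` dataclass, the `parse_tool_call` helper, and the sample payload are all hypothetical, and the dict layout only assumes the OpenAI-style `choices[0].message.tool_calls[0].function.arguments` shape.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Weather:
    # Hypothetical response schema, used only for illustration.
    city: str
    temperature_c: float


def parse_tool_call(completion: dict, schema: type):
    """Build `schema` from the first tool call in an OpenAI-style completion dict."""
    message = completion["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    if not tool_calls:
        raise ValueError("completion contains no tool calls")
    # Tool-call arguments arrive as a JSON-encoded string, not a dict.
    arguments = json.loads(tool_calls[0]["function"]["arguments"])
    expected = {f.name for f in fields(schema)}
    missing = expected - set(arguments)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Ignore any extra keys the model produced beyond the schema.
    return schema(**{k: v for k, v in arguments.items() if k in expected})


# Hand-written sample payload standing in for a real API response.
sample_completion = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "type": "function",
                        "function": {
                            "name": "Weather",
                            "arguments": json.dumps(
                                {"city": "Paris", "temperature_c": 18.5}
                            ),
                        },
                    }
                ],
            }
        }
    ]
}

weather = parse_tool_call(sample_completion, Weather)
print(weather)
```

A library such as instructor adds schema validation and retries on top of this kind of parsing; the sketch only shows the decoding step.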
ivanleomk / inference.py
Created March 10, 2025 10:44
This is the code that I'm using to test out the inference models
from openai import OpenAI
import os
parameters = {
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant that can answer questions and uses tools to get information.",
        },
        {"role": "user", "content": "What's the weather like in Paris today?"},
from pydantic import BaseModel
import instructor
from google.ai.generativelanguage_v1beta.types.file import File
import time
import google.generativeai as genai
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-2.0-flash-exp",
ivanleomk / queries.yml
Created November 23, 2024 15:01
Generated Queries
Payments:
  Fees:
    - I'm really frustrated here. I've been trying to figure out Klarna charges for
      over a week now, and no one told me whether there's an annual fee or any hidden
      costs. I already have issues with late payment fees. Can someone help?
  Gift Cards:
    - I tried buying a gift card with Klarna, but I can't figure out how to send it
      to my friend.
  Payment issues:
    - Hmm I can't find my payment plan for some reason even though it's active?
ivanleomk / queries.yml
Created November 20, 2024 13:12
Sample Questions for Klarna
Account & settings:
  - question: "What should I do if the email verification code for login isn't working?"
    subcategory: "Login"
  - question: "How do I enable two-factor authentication for my account for added security?"
    subcategory: "Manage account"
  - question: "What can I do if my Face ID isn't working for logging in?"
    subcategory: "Login"
  - question: "How can I update my phone number in my Klarna account?"
    subcategory: "Manage account"
  - question: "How do I update my account password?"
ivanleomk / benchmark.py
Created November 12, 2024 06:36
This is a gist that you can use to benchmark Cerebras
from typing import List, Optional
from pydantic import BaseModel, Field
from rich.console import Console
from tqdm.asyncio import tqdm_asyncio
import asyncio
import instructor
from datetime import date
from cerebras.cloud.sdk import AsyncCerebras
console = Console()
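As a rough illustration of what a benchmark like this measures, here is a minimal, stdlib-only sketch that times concurrent calls under a concurrency limit. It does not call Cerebras: `fake_request`, `timed_call`, `benchmark`, and the `max_concurrency` value are all hypothetical names and settings chosen for the example, and the sleep stands in for a real model request.

```python
import asyncio
import time


async def fake_request(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; sleeps instead of using the network.
    await asyncio.sleep(0.01)
    return prompt.upper()


async def timed_call(prompt: str, limiter: asyncio.Semaphore) -> tuple[str, float]:
    """Run one request under the concurrency limit and return (result, seconds)."""
    async with limiter:
        start = time.perf_counter()
        result = await fake_request(prompt)
        return result, time.perf_counter() - start


async def benchmark(prompts: list[str], max_concurrency: int = 4) -> dict:
    """Time every prompt and report per-call and overall wall-clock figures."""
    limiter = asyncio.Semaphore(max_concurrency)
    wall_start = time.perf_counter()
    # gather preserves input order, so results line up with prompts.
    outcomes = await asyncio.gather(*(timed_call(p, limiter) for p in prompts))
    wall = time.perf_counter() - wall_start
    latencies = [seconds for _, seconds in outcomes]
    return {
        "results": [result for result, _ in outcomes],
        "mean_latency_s": sum(latencies) / len(latencies),
        "wall_time_s": wall,
    }


report = asyncio.run(benchmark([f"prompt {i}" for i in range(8)]))
print(report["mean_latency_s"], report["wall_time_s"])
```

Swapping `fake_request` for a real client call would keep the same timing structure; the semaphore keeps the number of in-flight requests bounded so rate limits are less likely to skew the numbers.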
import os
import json
from typing import Any, TypeVar, Literal
import asyncpg
from dotenv import load_dotenv
from braintrust import init
from pgvector.asyncpg import register_vector
from asyncio import Semaphore, timeout, run
from tqdm.asyncio import tqdm_asyncio as asyncio
ivanleomk / main.py
Created October 3, 2024 05:18
Tool Calling Failing
from cerebras.cloud.sdk import Cerebras
import instructor
import os
client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)
chat_completion = client.chat.completions.create(
    model="llama3.1-8b",