Skip to content

Instantly share code, notes, and snippets.

@lbgitjp
lbgitjp / grpo_demo.py
Created February 5, 2025 01:35 — forked from willccbb/grpo_demo.py
GRPO Llama-1B
# train_grpo.py
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
# Load and prep dataset
@lbgitjp
lbgitjp / r1.py
Created February 8, 2025 03:37 — forked from vgel/r1.py
script to run deepseek-r1 with a min-thinking-tokens parameter, replacing </think> with a random continuation string to extend the model's chain of thought
import argparse
import random
import sys
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache
import torch
parser = argparse.ArgumentParser()
parser.add_argument("question", type=str)
parser.add_argument(
@lbgitjp
lbgitjp / grpo_qwen-0-5b_single_t4.ipynb
Created February 8, 2025 04:09 — forked from qunash/grpo_qwen-0-5b_single_t4.ipynb
grpo_qwen-0-5b_single_t4.ipynb
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@lbgitjp
lbgitjp / capabilities.txt
Created March 9, 2025 22:57 — forked from jlia0/agent loop
Manus tools and prompts
# Manus AI Assistant Capabilities
## Overview
I am an AI assistant designed to help users with a wide range of tasks using various tools and capabilities. This document provides a more detailed overview of what I can do while respecting proprietary information boundaries.
## General Capabilities
### Information Processing
- Answering questions on diverse topics using available information
- Conducting research through web searches and data analysis