Deprecated-WSL2 + Solo/Team LLM Local Serve bootstrapping

Local LLM Ritual — Four-Scroll Doctrine

⚠️ Deprecated Notice: As of 2025-10-11, this gist has evolved and now lives at a new home. Please visit the updated repository, jmeyer1980/vLLM-Bootstrap, for the latest version and improvements.

doctrine-version: 2025.10.11

License: MIT | Status: Production Ready | Testing: Pending


📖 Foreword

This bootstrap was created out of my (@jmeyer1980) desire to quickly set up and distribute LLMs internally for use with Rider and other agentic IDE interfaces.

It was developed for myself and a friend who recently picked up a 16GB-VRAM card and plans to self-host as well. I cannot guarantee this script will work on every system.

Feel free to comment with suggested updates. Please stick to HuggingFace-hosted models that can be easily pulled and served with this workflow. Collaborators should keep the overall mental model in mind when recommending updates.


✨ Features

  • 🚀 One-Command Setup - Single script installs everything
  • 🔐 HuggingFace Integration - Interactive authentication setup
  • 🎯 Role-Based Models - Fast (1B), Edit (4B), QA (7B), Plan (15B)
  • 💬 Chat Template Support - Automatic template detection for 12+ models
  • 🔌 OpenAI API Compatible - Works with Rider AI Assistant and other tools
  • 🧪 Built-in Testing - Connection validation and system checks
  • 📝 Persistent Logging - All output saved for debugging
  • 🛡️ Backup System - Preserves your customizations
  • 📚 Comprehensive Docs - Zero to Rider in 30 minutes

🎯 Quick Start

```shell
# 1. Install WSL (Windows PowerShell as Admin)
wsl --install -d Ubuntu

# 2. In WSL, create directory
mkdir -p ~/.config/llm-doctrine
cd ~/.config/llm-doctrine

# 3. Download and extract scripts here

# 4. Run initial setup
chmod +x *.sh
./initial-bootstrap.sh

# 5. Launch a model
source ~/torch-env/bin/activate
./daily-bootstrap.sh qa

# 6. Test connection
./test-connection.sh 8500

# 7. Configure Rider
# Settings → Tools → AI Assistant → Models
# Add: OpenAI Compatible
# URL: http://localhost:8500/v1
```

See COMPLETE-GUIDE.md for detailed instructions.
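Before touching Rider, a quick manual probe from WSL can confirm the server is up. This sketch assumes the qa tier's default port 8500; vLLM's OpenAI-compatible server exposes a /health endpoint on the serving port.

```shell
# Probe the (assumed) qa-tier port; fall back to a message if nothing answers.
curl -fsS http://localhost:8500/health && echo "server up" \
  || echo "no server responding on :8500"
```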


📋 Prerequisites

System Requirements

  • OS: Windows 10/11 with WSL2, or native Linux
  • GPU: NVIDIA GPU with 8GB+ VRAM (recommended)
    • CPU fallback supported but slower
  • RAM: 16GB+ recommended
  • Storage: 50GB+ free space for models

WSL Installation

  1. Open PowerShell as Administrator
  2. Install Ubuntu:
    wsl --install -d Ubuntu
  3. If WSL is already installed:
    wsl --update
  4. Restart if prompted
  5. Launch WSL:
    • Start menu → type "Ubuntu" → Enter
    • Windows Terminal → select "Ubuntu" from dropdown
    • Run box (Win+R) → type "wsl" → Enter
  6. Create Linux username and password
    • ⚠️ Important: Password field shows no feedback (no dots/asterisks)

📦 Installation

Method 1: Git Clone (Recommended)

```shell
cd ~/.config
git clone <repository-url> llm-doctrine
cd llm-doctrine
chmod +x *.sh
./initial-bootstrap.sh
```

Method 2: Manual Download

  1. Download the repository as ZIP
  2. Extract to ~/.config/llm-doctrine in WSL
    • Windows UNC path: \\wsl.localhost\Ubuntu\home\<username>\.config\llm-doctrine
  3. Make scripts executable:
    cd ~/.config/llm-doctrine
    chmod +x *.sh
    ./initial-bootstrap.sh

🚀 Usage

Launching Models

```shell
# Activate virtual environment
source ~/torch-env/bin/activate

# Launch by role
./daily-bootstrap.sh {fast|edit|qa|plan}
```

Model Roles

| Role | Tier | Default Model  | Use Case                    | Port Range |
|------|------|----------------|-----------------------------|------------|
| fast | 1B   | Llama-3.2-1B   | Autocomplete, boilerplate   | 8100-8299  |
| edit | 4B   | Phi-3.5-mini   | Light editing, refactoring  | 8300-8499  |
| qa   | 7B   | Mistral-7B     | General assistant, Q&A      | 8500-8699  |
| plan | 15B  | StarCoder2-15B | Deep planning, architecture | 8700-8899  |
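The role-to-port mapping above can be sketched as a small helper. The base ports mirror the table; the function itself is a hypothetical illustration, not the generated launcher's actual code.

```shell
# Hypothetical helper: echo the base port of a role's tier range.
role_base_port() {
  case "$1" in
    fast) echo 8100 ;;   # 1B tier
    edit) echo 8300 ;;   # 4B tier
    qa)   echo 8500 ;;   # 7B tier
    plan) echo 8700 ;;   # 15B tier
    *)    echo "unknown role: $1" >&2; return 1 ;;
  esac
}
role_base_port qa   # prints 8500
```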

Testing & Validation

```shell
# Validate system configuration
./validate-config.sh

# Test model connection
./test-connection.sh <port>

# Preload all models (for offline use)
./preload-models.sh
```

βš™οΈ Configuration

models.conf

Defines available models for each tier. Each tier has 3 models (default + 2 alts).

```ini
[7B]
default = mistralai/Mistral-7B-Instruct-v0.3
alt1 = teknium/OpenHermes-2.5-Mistral-7B
alt2 = WizardLM/WizardLM-2-7B
```

To switch models, edit models.conf and change which line is labeled default.
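A script could resolve the active default with a small INI lookup like the following sketch. The parsing logic is an assumption about how such a file might be consumed, not the bootstrap's actual implementation.

```shell
# Hypothetical sketch: print the "default" model of a tier section in models.conf.
get_default_model() {
  local conf="$1" tier="$2"
  awk -v sec="[$tier]" '
    $0 == sec                  { in_sec = 1; next }  # enter the wanted section
    /^\[/                      { in_sec = 0 }        # any other header ends it
    in_sec && $1 == "default"  { print $3; exit }
  ' "$conf"
}

# Demo against a temporary copy of the fragment above:
cat > /tmp/models.conf <<'EOF'
[7B]
default = mistralai/Mistral-7B-Instruct-v0.3
alt1 = teknium/OpenHermes-2.5-Mistral-7B
EOF
get_default_model /tmp/models.conf 7B   # prints mistralai/Mistral-7B-Instruct-v0.3
```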

ports.conf

Defines port ranges for each tier.

```ini
[ranges]
1B = 8100-8299
4B = 8300-8499
7B = 8500-8699
15B = 8700-8899
```
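Ranges, rather than single ports, let several instances of a tier coexist. One plausible way to pick a free port inside a range, using only bash's built-in /dev/tcp (an assumption for illustration; the generated launcher may do this differently):

```shell
# Hypothetical sketch: return the first port in [start, end] nothing listens on.
first_free_port() {
  local start="$1" end="$2" p
  for ((p = start; p <= end; p++)); do
    # A failed connect means the port is free.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/$p") 2>/dev/null; then
      echo "$p"
      return 0
    fi
  done
  return 1   # range exhausted
}
first_free_port 8500 8699
```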

chat-templates.conf

Maps models to their appropriate chat templates for OpenAI API compatibility.

```ini
mistralai/Mistral-7B-Instruct-v0.3 = mistral
microsoft/phi-3.5-mini-3.8b-instruct = phi3
meta-llama/Llama-3.2-1B = llama3
```

Note: These files are automatically generated by initial-bootstrap.sh and updated when doctrine-version changes.
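A mapping like this can be resolved with a one-line awk lookup. This sketch and its parsing are assumptions about how a launcher might consume the file, not the generated script's actual code.

```shell
# Hypothetical sketch: look up the chat-template name for a model id.
template_for() {
  awk -v m="$2" '$1 == m { print $3; exit }' "$1"
}

# Demo against a temporary copy of the mapping above:
cat > /tmp/chat-templates.conf <<'EOF'
mistralai/Mistral-7B-Instruct-v0.3 = mistral
meta-llama/Llama-3.2-1B = llama3
EOF
template_for /tmp/chat-templates.conf meta-llama/Llama-3.2-1B   # prints llama3
```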


🔌 Rider Integration

Setup

  1. Launch a model: ./daily-bootstrap.sh qa
  2. Open Rider
  3. Go to: Settings → Tools → AI Assistant → Models
  4. Click Add → OpenAI Compatible
  5. Configure:
    • Name: vLLM Local (Mistral 7B)
    • URL: http://localhost:8500/v1
    • API Key: (leave empty or use "dummy")
  6. Click Test Connection
  7. Expected: ✅ "Connection successful"
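The same endpoint Rider talks to can be exercised by hand. Model name and port are assumptions based on the qa tier defaults above; the payload follows the standard OpenAI chat-completions shape.

```shell
# Build a minimal OpenAI-style chat request and send it to the local server.
payload='{"model": "mistralai/Mistral-7B-Instruct-v0.3",
          "messages": [{"role": "user", "content": "Say hello in one word."}],
          "max_tokens": 16}'
curl -s http://localhost:8500/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d "$payload" \
  || echo "no server on :8500 - launch one with ./daily-bootstrap.sh qa"
```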

Usage

  • Open AI Assistant panel in Rider
  • Select your local model from dropdown
  • Start chatting or use code completion features

See COMPLETE-GUIDE.md for detailed, step-by-step Rider configuration instructions.


🎨 Architecture

Design Philosophy

  • Ritual-Framed: Temple/scroll metaphor for mental model
  • Self-Documenting: Scripts explain themselves
  • Fail-Safe: Backups, validation, clear errors
  • Progressive: Works out-of-box, advanced features optional
  • Portable: Pure bash, no compilation needed
  • Universal: OpenAI API compatibility

File Structure

```
~/.config/llm-doctrine/
├── initial-bootstrap.sh       # Main setup script
├── daily-bootstrap.sh         # Model launcher (generated)
├── test-connection.sh         # Connection tester (generated)
├── validate-config.sh         # System validator (generated)
├── preload-models.sh          # Model preloader (generated)
├── models.conf                # Model definitions (generated)
├── ports.conf                 # Port ranges (generated)
├── chat-templates.conf        # Template mappings (generated)
├── README.txt                 # Quick reference (generated)
├── logs/                      # Model server logs
│   ├── qa_8500.log
│   ├── fast_8100.log
│   └── ...
└── models/                    # Model cache (HuggingFace)
```

Note: Files marked "(generated)" are created/updated by initial-bootstrap.sh and should not be manually edited unless you know what you're doing. Your changes will be backed up before updates.


πŸ› Troubleshooting

Model won't load

```shell
# Check logs
tail -f ./logs/*_*.log

# Common causes:
# 1. Insufficient VRAM → try a smaller model
# 2. Missing HF auth → run: huggingface-cli login
# 3. Model not downloaded → check: ls ~/.cache/huggingface/hub/
```

Connection refused

```shell
# Verify model is running
./test-connection.sh <port>

# Check if port is in use
nc -z localhost <port>

# Check logs for errors
tail -f ./logs/*_*.log
```

Rider can't connect

```shell
# Test from Windows PowerShell
curl http://localhost:8500/health

# If that fails, check WSL networking
wsl hostname -I

# Check Windows Firewall settings
```

See COMPLETE-GUIDE.md for comprehensive troubleshooting guide.


📚 Documentation

  • COMPLETE-GUIDE.md - Comprehensive setup guide from zero to Rider
  • CHANGELOG.md - Version history and development notes
  • README.txt - Quick reference (generated by bootstrap)

🧪 Testing Status

| Component          | Implementation | Testing     | Status              |
|--------------------|----------------|-------------|---------------------|
| Core Bootstrap     | ✅ Complete    | ⚠️ Pending  | Production Ready    |
| HF Authentication  | ✅ Complete    | ⚠️ Pending  | Needs Validation    |
| Chat Templates     | ✅ Complete    | ⚠️ Pending  | Needs Model Testing |
| Connection Testing | ✅ Complete    | ✅ Tested   | Production Ready    |
| Config Validation  | ✅ Complete    | ✅ Tested   | Production Ready    |
| Documentation      | ✅ Complete    | ✅ Reviewed | Production Ready    |

Overall Status: Production Ready - Testing Phase


🤝 Contributing

Contributions are welcome! Please:

  1. Keep the ritual-framed mental model
  2. Test thoroughly before submitting
  3. Update documentation
  4. Follow existing code style
  5. Use the artifact writer pattern for generated files

Areas for Contribution

  • Testing chat templates with various models
  • Performance benchmarking on different GPUs
  • Additional model recommendations
  • Documentation improvements
  • Bug fixes and error handling

📄 License

MIT License

Copyright (c) 2025 Jerimiah Michael Meyer (@jmeyer1980)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


πŸ™ Acknowledgments

  • vLLM Team - For the excellent inference engine
  • HuggingFace - For model hosting and tooling
  • Model Creators - For amazing open-source models
  • Community - For feedback and testing

📞 Support

  • Issues: Report bugs with ./validate-config.sh output and log excerpts
  • Questions: Check COMPLETE-GUIDE.md first
  • Contributions: Pull requests welcome!

May your tokens flow freely and your context windows never overflow. 🏛️


Maintainer: @jmeyer1980
Version: 2025.10.10
Status: Production Ready (Testing Phase)
