Chuang Wang chuangw6

# Text-to-SQL SFT + RL Recipe
## 1. Overview
* **Elevator Pitch:** This recipe is a guided pipeline for training the `google/gemma-4-e2b` model to translate natural-language questions into executable SQL queries.
* **What the script does:** The orchestration script (`texttosql_sft_grpo.py`) leverages local Open-RL services to pull data, execute lightweight LoRA training steps, and query an isolated SQL runtime to score accuracy.
* **Methods:** A two-phase pipeline:
    * **SFT Warmup:** Quick alignment to basic SQL syntax patterns.
    * **GRPO / PPO RL:** Optimizes against sparse database execution feedback (such as compile rates and matching rows).
* **Presets:** Configured under `gemma4_e2b_rl_recipe`.
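
To make the execution-feedback idea concrete, here is a minimal sketch of how a reward of this kind could be computed and then turned into group-relative advantages in the GRPO style. It is not taken from `texttosql_sft_grpo.py`: the function names, the 0.0 / 0.5 / 1.0 reward tiers, and the use of an in-memory `sqlite3` database in place of the recipe's isolated SQL runtime are all assumptions made for illustration.

```python
import sqlite3
import statistics


def execution_reward(schema_sql: str, predicted_sql: str, gold_sql: str) -> float:
    """Hypothetical reward: 0.0 if the query fails, 0.5 if it runs, 1.0 if rows match gold.

    The tier values are illustrative assumptions, not the recipe's actual scoring.
    """
    conn = sqlite3.connect(":memory:")  # fresh throwaway database for every call
    try:
        conn.executescript(schema_sql)  # build tables and load sample rows
        gold_rows = conn.execute(gold_sql).fetchall()
        try:
            predicted_rows = conn.execute(predicted_sql).fetchall()
        except (sqlite3.Error, sqlite3.Warning):
            return 0.0  # the predicted query did not execute
        # Compare as sorted lists of row representations so row order is ignored.
        same = sorted(map(repr, predicted_rows)) == sorted(map(repr, gold_rows))
        return 1.0 if same else 0.5
    finally:
        conn.close()


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward against the mean and std of its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # Every completion scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

In use, several candidate queries sampled for the same question would each be scored with `execution_reward`, and the resulting list passed to `group_relative_advantages`, so that completions are rewarded relative to their siblings rather than on an absolute scale. A real setup would also need timeouts and read-only access for untrusted queries, which this sketch omits.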