# Text-to-SQL SFT + RL Recipe

## 1. Overview

* **Elevator Pitch:** This recipe is a guided pipeline that trains the `google/gemma-4-e2b` model to robustly translate natural language questions into executable SQL queries.
* **What the script does:** The orchestration script (`texttosql_sft_grpo.py`) uses local Open-RL services to pull data, run lightweight LoRA training steps, and query an isolated SQL runtime to score accuracy.
* **Methods:** A two-phase pipeline:
  * **SFT Warmup:** Quick alignment to basic SQL syntax patterns.
  * **GRPO / PPO RL:** Optimizes against sparse database execution feedback (such as whether a query compiles and whether its result rows match the expected rows).
* **Presets:** Configured under `gemma4_e2b_rl_recipe`.
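
To make the RL phase more concrete, here is a minimal sketch of what an execution-based reward and a GRPO-style group-relative advantage could look like. It is not taken from `texttosql_sft_grpo.py`: the function names, the reward values (0.0 / 0.1 / 1.0), and the use of an in-memory SQLite database as the "isolated SQL runtime" are all assumptions chosen for illustration.

```python
import sqlite3
from statistics import mean, pstdev


def run_query(conn, sql):
    """Return result rows in a canonical order, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return None
    # Sort by repr so row order does not matter and mixed types never raise.
    return sorted(rows, key=repr)


def execution_reward(conn, predicted_sql, reference_sql):
    """Hypothetical sparse reward (values are illustrative assumptions):
    0.0 if the prediction fails to execute, 0.1 if it runs but returns
    different rows, 1.0 if its rows match the reference query's rows."""
    predicted = run_query(conn, predicted_sql)
    if predicted is None:
        return 0.0
    reference = run_query(conn, reference_sql)
    return 1.0 if predicted == reference else 0.1


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt,
    as in the commonly described GRPO formulation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy database standing in for the isolated SQL runtime (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])

reference = "SELECT name FROM users WHERE id = 1"
candidates = [
    "SELECT name FROM users WHERE id = 1",  # executes, rows match
    "SELECT name FROM users",               # executes, rows differ
    "SELEC name FROM users",                # syntax error
]

rewards = [execution_reward(conn, sql, reference) for sql in candidates]
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.0, 0.1, 0.0]
```

The small non-zero reward for queries that merely execute is one way to soften the sparsity of exact row matching; whether the recipe shapes its reward this way is not stated in this overview.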