You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
GRPO Reinforcement Learning - 7b GSM8k on 8xH100 / 8xA100
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# the "verifiers" repository is a clean implementation of templated GRPO reinforcement learning training environments
# this is a generic set of "install from scratch" commands complete with a deepspeed z3 config that i have been using when i spin up nodes
# it will run on the gsm8k example w/ default batch size & generation size (8), and the 8th GPU is used for vllm generations
# qwen 14b full finetuning will run on this configuration too without LoRA or CUDA OOM, at least for the gsm8k task's context sizes + generation lengths
# hyperparameters are controlled by `verifiers/utils/config_utils.py`; i have been preferring extreme grad clipping (between 0.001 and 0.01) and low beta (under 0.01)
# NOTE FEB 27: examples have moved into `verifiers/examples` not `/examples`
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original Text ("Reading a Sign 43 Times Heals Your Axe Durability" by Hunter R.)
We all know that the axe in Animal Crossing will usually break after using it too much. Of course, the axe is intentionally designed to break like this in order to make the unbreakable Golden Axe an appealing item to unlock. And yet what if I told you that by simply reading a sign over and over you can actually prevent your standard axe from ever breaking? And no, I'm not joking—you can actually sit here and read this sign over and over to heal the durability on your axe, making it theoretically invincible. I'm sure a lot of you are wondering how or why this even works, so let's take a closer look.
Creating an unbreakable axe is a really funny glitch that was recently discovered by Animal Crossing spreadsheet owner Phil. To understand how interacting with a sign heals your axe, let's discuss how axe durability works.
Normally an axe can withstand 72 hits on normal trees before breaking. Since trees take three hits to cut
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
text: `The darkness grew absolute, not that the hyperstitioner could see in the first place. His ears pricked up, however; he could hear the skittering, the mechanical hum as the machine followed him invisibly...`,
continuations: [
{
id: 'a1',
text: " The mechanical tendrils wrapped tighter around his shoulder, its grip a cold reminder of their symbiosis...",
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters