@migtissera · Last active March 30, 2024
Axolotl config to train a Mistral-7B base using QLoRA on 24GB GPU
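The settings below keep the run within a single 24 GB card by loading the base weights in 4-bit (QLoRA), training only low-rank adapters, and leaning on sample packing, gradient checkpointing, flash attention, and a paged 32-bit AdamW optimizer, all of which appear in the config that follows.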
base_model: /path/to/Mistral-7B-v0.1
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_mistral_derived_model: true
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: /path/to/Sensei/created/dataset.jsonl
    type:
      system_prompt: "SYSTEM: {system}"
      field_system: system
      format: "USER: {instruction}\nASSISTANT: "
      no_input_format: "USER: {instruction}\nASSISTANT: "
      field_output: response
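# Given the field mappings above, each line of dataset.jsonl is expected to be a JSON
# object carrying the referenced fields, e.g. (illustrative record, not from the source):
#   {"system": "You are a helpful assistant.", "instruction": "Summarize QLoRA in one sentence.", "response": "..."}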
dataset_prepared_path: /path/to/a/folder/to/save/prepped_datasets
val_set_size: 0
output_dir: /your/trained/model/path
adapter: qlora
lora_model_dir:
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
lora_r: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
  - embed_tokens
  - lm_head
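# Listing embed_tokens and lm_head here trains these layers in full and saves them
# alongside the LoRA adapter, which is usually done when the prompt format touches
# the tokenizer's special tokens.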
wandb_project: axolotl-training-runs
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:
gradient_accumulation_steps: 3
micro_batch_size: 2
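# effective per-GPU batch size = micro_batch_size x gradient_accumulation_steps = 2 x 3 = 6 sequences per optimizer step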
num_epochs: 2
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0003
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
eval_sample_packing: true
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 100
evals_per_epoch: 2
saves_per_epoch: 2
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
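Assuming a standard axolotl install, fill in the placeholder paths above, save the config as e.g. config.yml, then optionally pre-tokenize with `python -m axolotl.cli.preprocess config.yml` and launch training with `accelerate launch -m axolotl.cli.train config.yml`.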