How to launch a multi-node job on Slurm (DeepSpeed-specific)
Submit the job with ``` sbatch main.sh ```.
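The contents of main.sh are not shown here. The following is a rough sketch of what such a batch script might do. It expands Slurm's node list into a DeepSpeed hostfile, picks the first node as MASTER_ADDR, and starts the launcher once. It assumes 8 GPUs per node and that the hostfile path is exported as DLTS_HOSTFILE. The #SBATCH values, GPUS_PER_NODE, and the make_hostfile helper are all hypothetical.

```shell
#!/bin/bash
# Hypothetical resource requests; adjust for your cluster.
#SBATCH --job-name=ds-multinode
#SBATCH --nodes=2
#SBATCH --gres=gpu:8

# Assumption: every node exposes the same number of GPUs.
GPUS_PER_NODE=${GPUS_PER_NODE:-8}

# Read hostnames (one per line) on stdin and print DeepSpeed hostfile
# entries of the form "<hostname> slots=<gpus>", skipping blank lines.
make_hostfile() {
    slots=$1
    while IFS= read -r host || [ -n "$host" ]; do
        if [ -n "$host" ]; then
            echo "$host slots=$slots"
        fi
    done
}

# Only launch when running inside a Slurm allocation.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    export DLTS_HOSTFILE=${DLTS_HOSTFILE:-$PWD/hostfile}
    # Expand the compressed node list (e.g. node[01-02]) into one host per line.
    scontrol show hostnames "$SLURM_JOB_NODELIST" | make_hostfile "$GPUS_PER_NODE" > "$DLTS_HOSTFILE"
    # Use the first allocated node as the rendezvous address.
    MASTER_ADDR=$(head -n 1 "$DLTS_HOSTFILE" | cut -d' ' -f1)
    export MASTER_ADDR
    # The batch script runs on one node; the deepspeed launcher reaches the
    # other hosts in the hostfile itself (via pdsh by default).
    deepspeed --hostfile="$DLTS_HOSTFILE" --master_addr="$MASTER_ADDR" check_ds.py
fi
```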
You can execute any Python script instead of check_ds.py.
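One way to avoid editing main.sh for each script is to pass the script name through sbatch. sbatch forwards any arguments that follow the batch script name to that script, for example `sbatch main.sh train.py --some-arg`. The sketch below assumes DLTS_HOSTFILE and MASTER_ADDR are already set earlier in main.sh. The run_entry_point function and the LAUNCHER override (handy for a dry run with LAUNCHER=echo) are hypothetical.

```shell
# Launch the Python script named in $1 (default: check_ds.py) on the hosts in
# the hostfile, forwarding any remaining arguments to that script.
run_entry_point() {
    script=${1:-check_ds.py}
    if [ $# -gt 0 ]; then
        shift
    fi
    # Launcher flags go before the script; everything after it is passed to the script.
    ${LAUNCHER:-deepspeed} --hostfile="$DLTS_HOSTFILE" --master_addr="$MASTER_ADDR" "$script" "$@"
}

# Inside the batch job, use the arguments given to sbatch after main.sh.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_entry_point "$@"
fi
```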
You can execute DeepSpeed like this. Launcher options such as `--hostfile` and `--master_addr` must come before the training script, because everything after the script name is passed to the script itself:
```shell
deepspeed --hostfile="$DLTS_HOSTFILE" --master_addr="$MASTER_ADDR" run_clm.py \
  --model_name_or_path=Salesforce/codegen-6B-mono --tokenizer_name customCodeGen \
  --dataset_name alignedWithTitle2048Py --block_size 2048 --cache_dir="hf_cache" \
  --do_train --do_eval --num_train_epochs 1 \
  --per_device_train_batch_size=2 --per_device_eval_batch_size=1 --gradient_accumulation_steps 2 \
  --learning_rate=1e-05 --warmup_steps=895 --weight_decay=2e-02 --adam_beta1=0.9 --adam_beta2=0.95 \
  --fp16 --gradient_checkpointing=True --deepspeed ds_config3.json \
  --evaluation_strategy=epoch --logging_strategy=epoch --save_strategy=epoch \
  --output_dir=finetune_codegen_aligned_title_py --overwrite_output_dir --report_to "wandb"
```
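The contents of ds_config3.json are not shown. Judging only by its name, it is probably a ZeRO stage 3 config. The sketch below makes that assumption and prints a minimal config. It uses the Hugging Face Trainer's "auto" values, so batch size, gradient accumulation, and fp16 are taken from the command-line flags above rather than duplicated. The write_ds_config helper is hypothetical. It prints to stdout so it cannot overwrite an existing config by accident.

```shell
# Print a minimal ZeRO stage 3 DeepSpeed config. The "auto" values are filled in
# by the Hugging Face Trainer from its own arguments (--fp16,
# --per_device_train_batch_size, --gradient_accumulation_steps, ...).
write_ds_config() {
    cat <<'EOF'
{
  "fp16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
EOF
}
```

For example, `write_ds_config > ds_config_zero3.json` (a hypothetical file name), then pass `--deepspeed ds_config_zero3.json` to the training script.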
If the ``` --nnodes=2 ``` flag is given, the job runs on only two nodes rather than on every node listed in the hostfile.
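A launcher-independent way to get a similar effect is to give the launcher a hostfile that lists only the nodes you want. The sketch below assumes the `hostname slots=N` hostfile format. The first_n_hosts helper is hypothetical.

```shell
# Print the first N host entries of a DeepSpeed hostfile, ignoring blank lines
# and lines that start with '#'.
first_n_hosts() {
    n=$1
    hostfile=$2
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$hostfile" | head -n "$n"
}
```

For example, `first_n_hosts 2 "$DLTS_HOSTFILE" > hostfile.2nodes` followed by `--hostfile=hostfile.2nodes` in the deepspeed command (hostfile.2nodes is a hypothetical name).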