@toanbku
Created July 21, 2023 03:54
(base) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 OMP_NUM_THREADS=1 accelerate launch --main_process_port 29501 --config_file configs/accelerate_config.yaml --num_processes 6 trainer_rl.py --configs defaults defaults_rlhf pythia_rlhf oasst_df_x1000
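The launch line above starts one worker process per GPU listed in CUDA_VISIBLE_DEVICES (six in total), rendezvousing on port 29501. A minimal sketch of what each launched worker sees, assuming the usual torch.distributed environment variables that `accelerate launch` exports:

import os

local_rank = int(os.environ.get("LOCAL_RANK", 0))     # 0..5, one per visible GPU
world_size = int(os.environ.get("WORLD_SIZE", 1))     # 6 for this run
master_port = os.environ.get("MASTER_PORT", "29501")  # from --main_process_port
print(f"worker {local_rank}/{world_size}, rendezvous port {master_port}")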
[2023-07-21 03:19:15,948] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-21 03:19:19,845] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-21 03:19:19,845] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-21 03:19:19,864] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-21 03:19:19,865] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-21 03:19:19,873] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-21 03:19:19,979] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
RNG seed: 2703368087
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330
len self.tokenizer 50282
[2023-07-21 03:19:25,472] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-21 03:19:25,472] [INFO] [comm.py:616:init_distributed] cdb=None
len self.tokenizer 50282
[2023-07-21 03:19:25,477] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-21 03:19:25,477] [INFO] [comm.py:616:init_distributed] cdb=None
[2023-07-21 03:19:25,477] [INFO] [comm.py:643:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
len self.tokenizer 50282
[2023-07-21 03:19:25,539] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-21 03:19:25,539] [INFO] [comm.py:616:init_distributed] cdb=None
len self.tokenizer 50282
[2023-07-21 03:19:25,551] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-21 03:19:25,552] [INFO] [comm.py:616:init_distributed] cdb=None
len self.tokenizer 50282
[2023-07-21 03:19:25,567] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-21 03:19:25,567] [INFO] [comm.py:616:init_distributed] cdb=None
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330
len self.tokenizer 50282
[2023-07-21 03:19:26,154] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-21 03:19:26,154] [INFO] [comm.py:616:init_distributed] cdb=None
[RANK 0] Initializing model: toanbku/oa-pythia-12b-sft-df
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.87s/it]
Resizing embeddings to 50282
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.94s/it]
Resizing embeddings to 50282
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:24<00:00, 8.06s/it]
Resizing embeddings to 50282
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:28<00:00, 9.44s/it]
Resizing embeddings to 50282
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:26<00:00, 8.74s/it]
Resizing embeddings to 50282
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:27<00:00, 9.26s/it]
Resizing embeddings to 50282
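The "Resizing embeddings to 50282" lines pair with "len self.tokenizer 50282" above: each rank grows the model's embedding matrix to match the extended tokenizer vocabulary. A sketch of the standard Hugging Face call this corresponds to (assumed from the log, not copied from the repo):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("toanbku/oa-pythia-12b-sft-df")
model = AutoModelForCausalLM.from_pretrained(
    "toanbku/oa-pythia-12b-sft-df", torch_dtype=torch.float16
)
model.resize_token_embeddings(len(tokenizer))  # len(tokenizer) == 50282 per the log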
Number of trainable parameters: 11841M
Number of trainable parameters: 11841M
Downloading (…)model.bin.index.json: 100%|███████████████████████████████████| 47.3k/47.3k [00:00<00:00, 128MB/s]
[2023-07-21 03:24:00,596] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs
[2023-07-21 03:24:00,626] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Creating extension directory /home/ubuntu/.cache/torch_extensions/py310_cu118/cpu_adam...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Number of trainable parameters: 11841M
Number of trainable parameters: 11841M
[2023-07-21 03:24:10,252] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs
Number of trainable parameters: 11841M
[2023-07-21 03:24:11,981] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs
Number of trainable parameters: 11841M
[1/3] /home/ubuntu/mambaforge/bin/nvcc -ccbin /home/ubuntu/mambaforge/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/include -isystem /home/ubuntu/mambaforge/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -c /home/ubuntu/mambaforge/lib/python3.10/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
[2023-07-21 03:24:12,718] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
wandb: Currently logged in as: toanbku. Use `wandb login --relogin` to force relogin
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/ubuntu/OA/model/model_training/wandb/run-20230721_032413-7g1yavb6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run trainer_rl/oa-pythia-12b-sft-df/6gpus:main
wandb: ⭐️ View project at https://wandb.ai/toanbku/rlhf
wandb: 🚀 View run at https://wandb.ai/toanbku/rlhf/runs/7g1yavb6
[2023-07-21 03:24:21,849] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
[2/3] /home/ubuntu/mambaforge/bin/x86_64-conda-linux-gnu-c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/include -isystem /home/ubuntu/mambaforge/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/home/ubuntu/mambaforge/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/ubuntu/mambaforge/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
[3/3] /home/ubuntu/mambaforge/bin/x86_64-conda-linux-gnu-c++ cpu_adam.o custom_cuda_kernel.cuda.o -shared -lcurand -L/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/ubuntu/mambaforge/lib64 -lcudart -o cpu_adam.so
Loading extension module cpu_adam...
Time to load cpu_adam op: 33.74172616004944 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 22.379701375961304 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 24.125978469848633 seconds
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Time to load cpu_adam op: 33.770209074020386 seconds
Time to load cpu_adam op: 12.540481567382812 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 21.731924295425415 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1
[2023-07-21 03:24:35,325] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.0, git-hash=unknown, git-branch=unknown
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1
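The repeated "Adam Optimizer #0" banners come from DeepSpeed's just-compiled CPU Adam extension, one per rank. How the printed config maps onto the optimizer object, as an illustrative sketch (the real construction happens inside the training stack; `net` here is a stand-in model):

import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

net = torch.nn.Linear(8, 8)   # stand-in for the 12B policy model
optimizer = DeepSpeedCPUAdam(
    net.parameters(),
    lr=1e-6,                  # printed as alpha=0.000001
    betas=(0.9, 0.999),
    weight_decay=1e-6,
    adamw_mode=True,          # printed as adam_w=1
)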
[2023-07-21 03:24:51,078] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-07-21 03:24:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-07-21 03:24:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-07-21 03:24:51,121] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
[2023-07-21 03:24:51,121] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2023-07-21 03:24:51,122] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-07-21 03:24:51,122] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 200000000
[2023-07-21 03:24:51,122] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 200000000
[2023-07-21 03:24:51,122] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: True
[2023-07-21 03:24:51,122] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Rank: 5 partition count [6] and sizes[(1982394028, False)]
Rank: 3 partition count [6] and sizes[(1982394028, False)]
Rank: 1 partition count [6] and sizes[(1982394028, False)]
Rank: 0 partition count [6] and sizes[(1982394028, False)]
Rank: 4 partition count [6] and sizes[(1982394028, False)]
Rank: 2 partition count [6] and sizes[(1982394028, False)]
[2023-07-21 03:25:29,511] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-07-21 03:25:29,514] [INFO] [utils.py:786:see_memory_usage] MA 22.78 GB Max_MA 22.78 GB CA 22.8 GB Max_CA 23 GB
[2023-07-21 03:25:29,514] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 83.13 GB, percent = 11.0%
[2023-07-21 03:25:46,648] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-07-21 03:25:46,650] [INFO] [utils.py:786:see_memory_usage] MA 22.78 GB Max_MA 22.78 GB CA 22.8 GB Max_CA 23 GB
[2023-07-21 03:25:46,650] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 208.3 GB, percent = 27.6%
[2023-07-21 03:25:46,650] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized
[2023-07-21 03:25:46,767] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-07-21 03:25:46,769] [INFO] [utils.py:786:see_memory_usage] MA 22.78 GB Max_MA 22.78 GB CA 22.8 GB Max_CA 23 GB
[2023-07-21 03:25:46,769] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 208.34 GB, percent = 27.6%
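The jump in CPU virtual memory across optimizer-state initialization (83.13 GB to 208.3 GB) is the ZeRO-2 CPU offload at work: each rank allocates fp32 Adam states for its parameter partition in host RAM. A back-of-envelope check, assuming the usual 12 bytes of fp32 state per parameter:

params = 11_841_000_000          # "Number of trainable parameters: 11841M"
bytes_per_param = 4 + 4 + 4      # fp32 master weights + Adam momentum + variance
print(params * bytes_per_param / 2**30)  # ~132 GiB, near the ~125 GB jump observed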
[2023-07-21 03:25:46,772] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedCPUAdam
[2023-07-21 03:25:46,773] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-07-21 03:25:46,773] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2023-07-21 03:25:46,773] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-06], mom=[[0.9, 0.95]]
[2023-07-21 03:25:46,774] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
[2023-07-21 03:25:46,774] [INFO] [config.py:964:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2023-07-21 03:25:46,774] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] amp_enabled .................. False
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] amp_params ................... False
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] bfloat16_enabled ............. False
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f9d7868f010>
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] communication_data_type ...... None
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] curriculum_params_legacy ..... False
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] data_efficiency_enabled ...... False
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] dataloader_drop_last ......... False
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] disable_allgather ............ False
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] dump_state ................... False
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 1000, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] eigenvalue_enabled ........... False
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_verbose ........... False
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] elasticity_enabled ........... False
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] fp16_auto_cast ............... False
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] fp16_enabled ................. true
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] global_rank .................. 0
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] grad_accum_dtype ............. None
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] gradient_clipping ............ 1.0
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] initial_dynamic_scale ........ 4096
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] load_universal_checkpoint .... False
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] loss_scale ................... 0
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] memory_breakdown ............. False
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] mics_hierarchial_params_gather False
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] mics_shard_size .............. -1
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] optimizer_name ............... None
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] optimizer_params ............. None
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] pld_enabled .................. False
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] pld_params ................... False
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] prescale_gradients ........... False
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] scheduler_name ............... None
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] scheduler_params ............. None
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] sparse_attention ............. None
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] steps_per_print .............. inf
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] train_batch_size ............. 6
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 1
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] use_node_local_storage ....... False
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] wall_clock_breakdown ......... False
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] world_size ................... 6
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_allow_untested_optimizer True
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_enabled ................. True
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_optimization_stage ...... 2
[2023-07-21 03:25:46,780] [INFO] [config.py:950:print_user_config] json = {
"fp16": {
"enabled": "true",
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 12,
"hysteresis": 2,
"min_loss_scale": 1,
"auto_cast": false
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 2.000000e+08,
"overlap_comm": false,
"reduce_scatter": true,
"reduce_bucket_size": 2.000000e+08,
"contiguous_gradients": true,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
}
},
"gradient_accumulation_steps": 1,
"train_micro_batch_size_per_gpu": 1,
"train_batch_size": 6,
"gradient_clipping": 1.0,
"steps_per_print": inf,
"wall_clock_breakdown": false,
"bf16": {
"enabled": false
},
"zero_allow_untested_optimizer": true
}
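The batch-size fields in the JSON above must satisfy DeepSpeed's invariant train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size, which is where the printed value of 6 comes from:

micro_batch_per_gpu = 1
grad_accum_steps = 1
world_size = 6
assert micro_batch_per_gpu * grad_accum_steps * world_size == 6  # "train_batch_size": 6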
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:18<00:00, 6.02s/it]
Resizing embeddings to 50282
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.37s/it]
Resizing embeddings to 50282
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:21<00:00, 7.24s/it]
Resizing embeddings to 50282
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:21<00:00, 7.16s/it]
Resizing embeddings to 50282
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.46s/it]
Resizing embeddings to 50282
Number of trainable parameters: 11841M
Loading checkpoint shards:  33%|█████████████████                                  | 1/3 [00:09<00:19, 9.60s/it]
Number of trainable parameters: 11841M
Number of trainable parameters: 11841M
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.66s/it]
Resizing embeddings to 50282
Number of trainable parameters: 11841M
Number of trainable parameters: 11841M
[RANK 0] Starting training
[rollout 0 / 16]:   0%|                                                                  | 0/16 [00:00<?, ?it/s]
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Number of trainable parameters: 11841M
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
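The tokenizer warning repeated above is advisory only: with a fast (Rust-backed) tokenizer, a single __call__ tokenizes and pads a whole batch in one pass, instead of encoding texts one by one and padding afterwards. A sketch of the two patterns, using this run's SFT checkpoint as the assumed tokenizer source:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("toanbku/oa-pythia-12b-sft-df")
tok.pad_token = tok.pad_token or tok.eos_token  # make sure padding is defined

texts = ["first prompt", "a somewhat longer second prompt"]

# Pattern the warning discourages: per-text encode, then a separate pad step.
slow = tok.pad({"input_ids": [tok.encode(t) for t in texts]}, return_tensors="pt")

# Preferred: one __call__ that tokenizes and pads the batch together.
fast = tok(texts, padding=True, return_tensors="pt")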
[rollout 16 / 16]: 100%|█████████████████████████████████████████████████████████| 16/16 [05:07<00:00, 19.22s/it]
[RANK 0] Evaluating model
[generation sweep 1/1 | eval batch 11/11]: 100%|██████████████████████████████████████████████| 11/11 [00:59<00:00, 5.40s/it]
[RANK 0] Computing rewards
Traceback (most recent call last):
File "/home/ubuntu/OA/model/model_training/trainer_rl.py", line 199, in <module>
main()
File "/home/ubuntu/OA/model/model_training/trainer_rl.py", line 184, in main
trainer = trlx.train(
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trlx.py", line 126, in train
trainer.learn()
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trainer/accelerate_base_trainer.py", line 539, in learn
results = self.evaluate()
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trainer/accelerate_base_trainer.py", line 430, in evaluate
rewards = self.reward_fn(
TypeError: create_reward_fn.<locals>.reward_fn() got an unexpected keyword argument 'tokenizer'
Traceback (most recent call last):
File "/home/ubuntu/OA/model/model_training/trainer_rl.py", line 199, in <module>
main()
File "/home/ubuntu/OA/model/model_training/trainer_rl.py", line 184, in main
trainer = trlx.train(
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trlx.py", line 126, in train
trainer.learn()
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trainer/accelerate_base_trainer.py", line 539, in learn
results = self.evaluate()
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trainer/accelerate_base_trainer.py", line 430, in evaluate
rewards = self.reward_fn(
TypeError: create_reward_fn.<locals>.reward_fn() got an unexpected keyword argument 'tokenizer'
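The TypeError is the actual failure here: trlx's evaluate() invokes self.reward_fn with a tokenizer= keyword argument that the reward_fn closure built by create_reward_fn in trainer_rl.py does not accept. A likely shape of the fix, as an illustrative patch rather than the repository's actual code: have the closure swallow extra keywords so it stays compatible across trlx versions.

def create_reward_fn(rank_model, rank_tokenizer):  # names assumed for illustration
    def reward_fn(samples, prompts=None, outputs=None, **kwargs):
        # kwargs absorbs tokenizer=... and whatever else this trlx version
        # forwards; score `samples` with the rank model as before.
        ...
    return reward_fn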
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: | 0.023 MB of 0.023 MB uploaded (0.000 MB deduped)
wandb: Run history:
wandb: exp_scores/mean ▁
wandb: exp_scores/running_mean ▁
wandb: exp_scores/running_std ▁
wandb: exp_scores/std ▁
wandb: kl_ctl_value ▁
wandb: policy/sqrt_kl ▁
wandb: time/exp ▁
wandb: time/exp_generate ▁
wandb: time/exp_score ▁
wandb:
wandb: Run summary:
wandb: exp_scores/mean 4.85738
wandb: exp_scores/running_mean 3.56623
wandb: exp_scores/running_std 4.1059
wandb: exp_scores/std 5.84514
wandb: kl_ctl_value 0.1
wandb: policy/sqrt_kl 0.01649
wandb: time/exp 25.99541
wandb: time/exp_generate 23.5927
wandb: time/exp_score 0.55094
wandb:
wandb: 🚀 View run trainer_rl/oa-pythia-12b-sft-df/6gpus:main at: https://wandb.ai/toanbku/rlhf/runs/7g1yavb6
wandb: Synced 6 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230721_032413-7g1yavb6/logs
Exception in thread NetStatThr:
Traceback (most recent call last):
File "/home/ubuntu/mambaforge/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/ubuntu/mambaforge/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 255, in check_network_status
self._loop_check_status(
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 211, in _loop_check_status
local_handle = request()
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/interface/interface.py", line 795, in deliver_network_status
return self._deliver_network_status(status)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/interface/interface_shared.py", line 601, in _deliver_network_status
return self._deliver_record(record)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/interface/interface_shared.py", line 560, in _deliver_record
handle = mailbox._deliver_record(record, interface=self)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
interface._publish(record)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10083 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10084 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10085 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10086 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10087 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10083 via 15, forcefully exiting via 9
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10084 via 15, forcefully exiting via 9
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10085 via 15, forcefully exiting via 9
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10086 via 15, forcefully exiting via 9
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10087 via 15, forcefully exiting via 9
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 10082) of binary: /home/ubuntu/mambaforge/bin/python3.10
Traceback (most recent call last):
File "/home/ubuntu/mambaforge/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/accelerate/commands/launch.py", line 964, in launch_command
deepspeed_launcher(args)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/accelerate/commands/launch.py", line 687, in deepspeed_launcher
distrib_run.run(args)
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
trainer_rl.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-07-21_03:37:13
host : oa-server-8
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 10082)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html