(base) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 OMP_NUM_THREADS=1 accelerate launch --main_process_port 29501 --config_file configs/accelerate_config.yaml --num_processes 6 trainer_rl.py --configs defaults defaults_rlhf pythia_rlhf oasst_df_x1000 | |
[2023-07-21 03:19:15,948] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) | |
[2023-07-21 03:19:19,845] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) | |
[2023-07-21 03:19:19,845] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) | |
[2023-07-21 03:19:19,864] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) | |
[2023-07-21 03:19:19,865] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) | |
[2023-07-21 03:19:19,873] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) | |
[2023-07-21 03:19:19,979] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) | |
RNG seed: 2703368087 | |
RNG seed: 2703368087 | |
RNG seed: 2703368087 | |
RNG seed: 2703368087 | |
RNG seed: 2703368087 | |
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96) | |
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96) | |
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96) | |
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96) | |
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96) | |
RNG seed: 2703368087 | |
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-x1000-330ee3ae28b11a32/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96) | |
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330 | |
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330 | |
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330 | |
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330 | |
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330 | |
len self.tokenizer 50282 | |
[2023-07-21 03:19:25,472] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented | |
[2023-07-21 03:19:25,472] [INFO] [comm.py:616:init_distributed] cdb=None | |
len self.tokenizer 50282 | |
[2023-07-21 03:19:25,477] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented | |
[2023-07-21 03:19:25,477] [INFO] [comm.py:616:init_distributed] cdb=None | |
[2023-07-21 03:19:25,477] [INFO] [comm.py:643:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl | |
len self.tokenizer 50282 | |
[2023-07-21 03:19:25,539] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented | |
[2023-07-21 03:19:25,539] [INFO] [comm.py:616:init_distributed] cdb=None | |
len self.tokenizer 50282 | |
[2023-07-21 03:19:25,551] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented | |
[2023-07-21 03:19:25,552] [INFO] [comm.py:616:init_distributed] cdb=None | |
len self.tokenizer 50282 | |
[2023-07-21 03:19:25,567] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented | |
[2023-07-21 03:19:25,567] [INFO] [comm.py:616:init_distributed] cdb=None | |
OASST HF dataset toanbku/oa-df-x1000: len(train)=1970, len(val)=330 | |
len self.tokenizer 50282 | |
[2023-07-21 03:19:26,154] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented | |
[2023-07-21 03:19:26,154] [INFO] [comm.py:616:init_distributed] cdb=None | |
[RANK 0] Initializing model: toanbku/oa-pythia-12b-sft-df | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.87s/it] | |
Resizing embeddings to 50282 | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.94s/it] | |
Resizing embeddings to 50282 | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:24<00:00, 8.06s/it] | |
Resizing embeddings to 50282 | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:28<00:00, 9.44s/it] | |
Resizing embeddings to 50282 | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:26<00:00, 8.74s/it] | |
Resizing embeddings to 50282 | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:27<00:00, 9.26s/it] | |
Resizing embeddings to 50282 | |
Number of trainable parameters: 11841M | |
Number of trainable parameters: 11841M | |
Downloading (…)model.bin.index.json: 100%|███████████████████████████████████| 47.3k/47.3k [00:00<00:00, 128MB/s] | |
[2023-07-21 03:24:00,596] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs | |
[2023-07-21 03:24:00,626] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs | |
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root... | |
Creating extension directory /home/ubuntu/.cache/torch_extensions/py310_cu118/cpu_adam... | |
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root... | |
Detected CUDA files, patching ldflags | |
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja... | |
Building extension module cpu_adam... | |
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) | |
Number of trainable parameters: 11841M | |
Number of trainable parameters: 11841M | |
[2023-07-21 03:24:10,252] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs | |
Number of trainable parameters: 11841M | |
[2023-07-21 03:24:11,981] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs | |
Number of trainable parameters: 11841M | |
[1/3] /home/ubuntu/mambaforge/bin/nvcc -ccbin /home/ubuntu/mambaforge/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/include -isystem /home/ubuntu/mambaforge/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -c /home/ubuntu/mambaforge/lib/python3.10/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o | |
[2023-07-21 03:24:12,718] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs | |
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root... | |
wandb: Currently logged in as: toanbku. Use `wandb login --relogin` to force relogin | |
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root... | |
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root... | |
wandb: Tracking run with wandb version 0.15.5 | |
wandb: Run data is saved locally in /home/ubuntu/OA/model/model_training/wandb/run-20230721_032413-7g1yavb6 | |
wandb: Run `wandb offline` to turn off syncing. | |
wandb: Syncing run trainer_rl/oa-pythia-12b-sft-df/6gpus:main | |
wandb: ⭐️ View project at https://wandb.ai/toanbku/rlhf | |
wandb: 🚀 View run at https://wandb.ai/toanbku/rlhf/runs/7g1yavb6 | |
[2023-07-21 03:24:21,849] [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD CPUs | |
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root... | |
[2/3] /home/ubuntu/mambaforge/bin/x86_64-conda-linux-gnu-c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/include -isystem /home/ubuntu/mambaforge/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/home/ubuntu/mambaforge/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/ubuntu/mambaforge/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o | |
[3/3] /home/ubuntu/mambaforge/bin/x86_64-conda-linux-gnu-c++ cpu_adam.o custom_cuda_kernel.cuda.o -shared -lcurand -L/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/ubuntu/mambaforge/lib64 -lcudart -o cpu_adam.so | |
Loading extension module cpu_adam... | |
Time to load cpu_adam op: 33.74172616004944 seconds | |
Loading extension module cpu_adam... | |
Time to load cpu_adam op: 22.379701375961304 seconds | |
Loading extension module cpu_adam... | |
Time to load cpu_adam op: 24.125978469848633 seconds | |
Loading extension module cpu_adam... | |
Loading extension module cpu_adam... | |
Time to load cpu_adam op: 33.770209074020386 seconds | |
Time to load cpu_adam op: 12.540481567382812 seconds | |
Loading extension module cpu_adam... | |
Time to load cpu_adam op: 21.731924295425415 seconds | |
Adam Optimizer #0 is created with AVX2 arithmetic capability. | |
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1 | |
Adam Optimizer #0 is created with AVX2 arithmetic capability. | |
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1 | |
[2023-07-21 03:24:35,325] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.0, git-hash=unknown, git-branch=unknown | |
Adam Optimizer #0 is created with AVX2 arithmetic capability. | |
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1 | |
Adam Optimizer #0 is created with AVX2 arithmetic capability. | |
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1 | |
Adam Optimizer #0 is created with AVX2 arithmetic capability. | |
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1 | |
Adam Optimizer #0 is created with AVX2 arithmetic capability. | |
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1 | |
[2023-07-21 03:24:51,078] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False | |
[2023-07-21 03:24:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer | |
[2023-07-21 03:24:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer | |
[2023-07-21 03:24:51,121] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam | |
[2023-07-21 03:24:51,121] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'> | |
[2023-07-21 03:24:51,122] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer | |
[2023-07-21 03:24:51,122] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 200000000 | |
[2023-07-21 03:24:51,122] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 200000000 | |
[2023-07-21 03:24:51,122] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: True | |
[2023-07-21 03:24:51,122] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False | |
Rank: 5 partition count [6] and sizes[(1982394028, False)] | |
Rank: 3 partition count [6] and sizes[(1982394028, False)] | |
Rank: 1 partition count [6] and sizes[(1982394028, False)] | |
Rank: 0 partition count [6] and sizes[(1982394028, False)] | |
Rank: 4 partition count [6] and sizes[(1982394028, False)] | |
Rank: 2 partition count [6] and sizes[(1982394028, False)] | |
[2023-07-21 03:25:29,511] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states | |
[2023-07-21 03:25:29,514] [INFO] [utils.py:786:see_memory_usage] MA 22.78 GB Max_MA 22.78 GB CA 22.8 GB Max_CA 23 GB | |
[2023-07-21 03:25:29,514] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 83.13 GB, percent = 11.0% | |
[2023-07-21 03:25:46,648] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states | |
[2023-07-21 03:25:46,650] [INFO] [utils.py:786:see_memory_usage] MA 22.78 GB Max_MA 22.78 GB CA 22.8 GB Max_CA 23 GB | |
[2023-07-21 03:25:46,650] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 208.3 GB, percent = 27.6% | |
[2023-07-21 03:25:46,650] [INFO] [stage_1_and_2.py:493:__init__] optimizer state initialized | |
[2023-07-21 03:25:46,767] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer | |
[2023-07-21 03:25:46,769] [INFO] [utils.py:786:see_memory_usage] MA 22.78 GB Max_MA 22.78 GB CA 22.8 GB Max_CA 23 GB | |
[2023-07-21 03:25:46,769] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 208.34 GB, percent = 27.6% | |
[2023-07-21 03:25:46,772] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedCPUAdam | |
[2023-07-21 03:25:46,773] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler | |
[2023-07-21 03:25:46,773] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None | |
[2023-07-21 03:25:46,773] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-06], mom=[[0.9, 0.95]] | |
[2023-07-21 03:25:46,774] [INFO] [config.py:960:print] DeepSpeedEngine configuration: | |
[2023-07-21 03:25:46,774] [INFO] [config.py:964:print] activation_checkpointing_config { | |
"partition_activations": false, | |
"contiguous_memory_optimization": false, | |
"cpu_checkpointing": false, | |
"number_checkpoints": null, | |
"synchronize_checkpoint_boundary": false, | |
"profile": false | |
} | |
[2023-07-21 03:25:46,774] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} | |
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] amp_enabled .................. False | |
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] amp_params ................... False | |
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] autotuning_config ............ { | |
"enabled": false, | |
"start_step": null, | |
"end_step": null, | |
"metric_path": null, | |
"arg_mappings": null, | |
"metric": "throughput", | |
"model_info": null, | |
"results_dir": "autotuning_results", | |
"exps_dir": "autotuning_exps", | |
"overwrite": true, | |
"fast": true, | |
"start_profile_step": 3, | |
"end_profile_step": 5, | |
"tuner_type": "gridsearch", | |
"tuner_early_stopping": 5, | |
"tuner_num_trials": 50, | |
"model_info_path": null, | |
"mp_size": 1, | |
"max_train_batch_size": null, | |
"min_train_batch_size": 1, | |
"max_train_micro_batch_size_per_gpu": 1.024000e+03, | |
"min_train_micro_batch_size_per_gpu": 1, | |
"num_tuning_micro_batch_sizes": 3 | |
} | |
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] bfloat16_enabled ............. False | |
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False | |
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True | |
[2023-07-21 03:25:46,775] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f9d7868f010> | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] communication_data_type ...... None | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] curriculum_params_legacy ..... False | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] data_efficiency_enabled ...... False | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] dataloader_drop_last ......... False | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] disable_allgather ............ False | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] dump_state ................... False | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 1000, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1} | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] eigenvalue_enabled ........... False | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 | |
[2023-07-21 03:25:46,776] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] eigenvalue_verbose ........... False | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] elasticity_enabled ........... False | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] flops_profiler_config ........ { | |
"enabled": false, | |
"recompute_fwd_factor": 0.0, | |
"profile_step": 1, | |
"module_depth": -1, | |
"top_modules": 1, | |
"detailed": true, | |
"output_file": null | |
} | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] fp16_auto_cast ............... False | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] fp16_enabled ................. true | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] global_rank .................. 0 | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] grad_accum_dtype ............. None | |
[2023-07-21 03:25:46,777] [INFO] [config.py:964:print] gradient_accumulation_steps .. 1 | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] gradient_clipping ............ 1.0 | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] initial_dynamic_scale ........ 4096 | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] load_universal_checkpoint .... False | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] loss_scale ................... 0 | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] memory_breakdown ............. False | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] mics_hierarchial_params_gather False | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] mics_shard_size .............. -1 | |
[2023-07-21 03:25:46,778] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] nebula_config ................ { | |
"enabled": false, | |
"persistent_storage_path": null, | |
"persistent_time_interval": 100, | |
"num_of_version_in_retention": 2, | |
"enable_nebula_load": true, | |
"load_path": null | |
} | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] optimizer_name ............... None | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] optimizer_params ............. None | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] pld_enabled .................. False | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] pld_params ................... False | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] prescale_gradients ........... False | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] scheduler_name ............... None | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] scheduler_params ............. None | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] sparse_attention ............. None | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] steps_per_print .............. inf | |
[2023-07-21 03:25:46,779] [INFO] [config.py:964:print] train_batch_size ............. 6 | |
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 1 | |
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] use_node_local_storage ....... False | |
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] wall_clock_breakdown ......... False | |
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] world_size ................... 6 | |
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_allow_untested_optimizer True | |
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True | |
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_enabled ................. True | |
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True | |
[2023-07-21 03:25:46,780] [INFO] [config.py:964:print] zero_optimization_stage ...... 2 | |
[2023-07-21 03:25:46,780] [INFO] [config.py:950:print_user_config] json = { | |
"fp16": { | |
"enabled": "true", | |
"loss_scale": 0, | |
"loss_scale_window": 1000, | |
"initial_scale_power": 12, | |
"hysteresis": 2, | |
"min_loss_scale": 1, | |
"auto_cast": false | |
}, | |
"zero_optimization": { | |
"stage": 2, | |
"allgather_partitions": true, | |
"allgather_bucket_size": 2.000000e+08, | |
"overlap_comm": false, | |
"reduce_scatter": true, | |
"reduce_bucket_size": 2.000000e+08, | |
"contiguous_gradients": true, | |
"offload_optimizer": { | |
"device": "cpu", | |
"pin_memory": true | |
} | |
}, | |
"gradient_accumulation_steps": 1, | |
"train_micro_batch_size_per_gpu": 1, | |
"train_batch_size": 6, | |
"gradient_clipping": 1.0, | |
"steps_per_print": inf, | |
"wall_clock_breakdown": false, | |
"bf16": { | |
"enabled": false | |
}, | |
"zero_allow_untested_optimizer": true | |
} | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:18<00:00, 6.02s/it] | |
Resizing embeddings to 50282 | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.37s/it] | |
Resizing embeddings to 50282 | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:21<00:00, 7.24s/it] | |
Resizing embeddings to 50282 | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:21<00:00, 7.16s/it] | |
Resizing embeddings to 50282 | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.46s/it] | |
Resizing embeddings to 50282 | |
Number of trainable parameters: 11841M | |
Loading checkpoint shards: 33%|█████████████████ | 1/3 [00:09<00:19, 9.60s/it]Number of trainable parameters: 11841M | |
Number of trainable parameters: 11841M | |
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.66s/it] | |
Resizing embeddings to 50282 | |
Number of trainable parameters: 11841M | |
Number of trainable parameters: 11841M | |
[RANK 0] Starting training | |
[rollout 0 / 16]: 0%| | 0/16 [00:00<?, ?it/s]You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding. | |
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding. | |
Number of trainable parameters: 11841M | |
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding. | |
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding. | |
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding. | |
You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding. | |
[rollout 16 / 16]: 100%|█████████████████████████████████████████████████████████| 16/16 [05:07<00:00, 19.22s/it] | |
[RANK 0] Evaluating model | |
[generation sweep 1/1 | eval batch 11/11]: 100%|██████████████████████████████████████████████| 11/11 [00:59<00:00, 5.40s/it] | |
[RANK 0] Computing rewards | |
Traceback (most recent call last): | |
File "/home/ubuntu/OA/model/model_training/trainer_rl.py", line 199, in <module> | |
main() | |
File "/home/ubuntu/OA/model/model_training/trainer_rl.py", line 184, in main | |
trainer = trlx.train( | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trlx.py", line 126, in train | |
trainer.learn() | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trainer/accelerate_base_trainer.py", line 539, in learn | |
results = self.evaluate() | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trainer/accelerate_base_trainer.py", line 430, in evaluate | |
rewards = self.reward_fn( | |
TypeError: create_reward_fn.<locals>.reward_fn() got an unexpected keyword argument 'tokenizer' | |
Traceback (most recent call last): | |
File "/home/ubuntu/OA/model/model_training/trainer_rl.py", line 199, in <module> | |
main() | |
File "/home/ubuntu/OA/model/model_training/trainer_rl.py", line 184, in main | |
trainer = trlx.train( | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trlx.py", line 126, in train | |
trainer.learn() | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trainer/accelerate_base_trainer.py", line 539, in learn | |
results = self.evaluate() | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/trlx/trainer/accelerate_base_trainer.py", line 430, in evaluate | |
rewards = self.reward_fn( | |
TypeError: create_reward_fn.<locals>.reward_fn() got an unexpected keyword argument 'tokenizer' | |
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing. | |
wandb: | 0.023 MB of 0.023 MB uploaded (0.000 MB deduped) | |
wandb: Run history: | |
wandb: exp_scores/mean ▁ | |
wandb: exp_scores/running_mean ▁ | |
wandb: exp_scores/running_std ▁ | |
wandb: exp_scores/std ▁ | |
wandb: kl_ctl_value ▁ | |
wandb: policy/sqrt_kl ▁ | |
wandb: time/exp ▁ | |
wandb: time/exp_generate ▁ | |
wandb: time/exp_score ▁ | |
wandb: | |
wandb: Run summary: | |
wandb: exp_scores/mean 4.85738 | |
wandb: exp_scores/running_mean 3.56623 | |
wandb: exp_scores/running_std 4.1059 | |
wandb: exp_scores/std 5.84514 | |
wandb: kl_ctl_value 0.1 | |
wandb: policy/sqrt_kl 0.01649 | |
wandb: time/exp 25.99541 | |
wandb: time/exp_generate 23.5927 | |
wandb: time/exp_score 0.55094 | |
wandb: | |
wandb: 🚀 View run trainer_rl/oa-pythia-12b-sft-df/6gpus:main at: https://wandb.ai/toanbku/rlhf/runs/7g1yavb6 | |
wandb: Synced 6 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s) | |
wandb: Find logs at: ./wandb/run-20230721_032413-7g1yavb6/logs | |
Exception in thread NetStatThr: | |
Traceback (most recent call last): | |
File "/home/ubuntu/mambaforge/lib/python3.10/threading.py", line 1016, in _bootstrap_inner | |
self.run() | |
File "/home/ubuntu/mambaforge/lib/python3.10/threading.py", line 953, in run | |
self._target(*self._args, **self._kwargs) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 255, in check_network_status | |
self._loop_check_status( | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/wandb_run.py", line 211, in _loop_check_status | |
local_handle = request() | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/interface/interface.py", line 795, in deliver_network_status | |
return self._deliver_network_status(status) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/interface/interface_shared.py", line 601, in _deliver_network_status | |
return self._deliver_record(record) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/interface/interface_shared.py", line 560, in _deliver_record | |
handle = mailbox._deliver_record(record, interface=self) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record | |
interface._publish(record) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish | |
self._sock_client.send_record_publish(record) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish | |
self.send_server_request(server_req) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request | |
self._send_message(msg) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message | |
self._sendall_with_error_handle(header + data) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle | |
sent = self._sock.send(data) | |
BrokenPipeError: [Errno 32] Broken pipe | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10083 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10084 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10085 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10086 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 10087 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10083 via 15, forcefully exiting via 9 | |
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10084 via 15, forcefully exiting via 9 | |
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10085 via 15, forcefully exiting via 9 | |
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10086 via 15, forcefully exiting via 9 | |
WARNING:torch.distributed.elastic.multiprocessing.api:Unable to shutdown process 10087 via 15, forcefully exiting via 9 | |
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 10082) of binary: /home/ubuntu/mambaforge/bin/python3.10 | |
Traceback (most recent call last): | |
File "/home/ubuntu/mambaforge/bin/accelerate", line 8, in <module> | |
sys.exit(main()) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main | |
args.func(args) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/accelerate/commands/launch.py", line 964, in launch_command | |
deepspeed_launcher(args) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/accelerate/commands/launch.py", line 687, in deepspeed_launcher | |
distrib_run.run(args) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run | |
elastic_launch( | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__ | |
return launch_agent(self._config, self._entrypoint, list(args)) | |
File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent | |
raise ChildFailedError( | |
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: | |
============================================================ | |
trainer_rl.py FAILED | |
------------------------------------------------------------ | |
Failures: | |
<NO_OTHER_FAILURES> | |
------------------------------------------------------------ | |
Root Cause (first observed failure): | |
[0]: | |
time : 2023-07-21_03:37:13 | |
host : oa-server-8 | |
rank : 0 (local_rank: 0) | |
exitcode : 1 (pid: 10082) | |
error_file: <N/A> | |
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html |
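Note on the failure: the run aborts at the first evaluation because trlx's accelerate_base_trainer.evaluate() calls self.reward_fn(...) with a tokenizer= keyword (visible in the traceback above) that the reward_fn closure created in trainer_rl.py does not accept. Below is a minimal, illustrative sketch of one way the closure could be made tolerant of that keyword. It is an assumption, not the actual OpenAssistant model_training code; the reward model name and helper structure are placeholders.

    # Hypothetical sketch: accept and ignore extra keywords such as `tokenizer`
    # so newer trlx versions can call reward_fn unchanged.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    def create_reward_fn(model_name="OpenAssistant/reward-model-deberta-v3-large-v2"):
        # `model_name` is a placeholder reward model, not necessarily the one this run used.
        rank_tokenizer = AutoTokenizer.from_pretrained(model_name)
        rank_model = AutoModelForSequenceClassification.from_pretrained(model_name)
        rank_model.eval()

        def reward_fn(samples, prompts=None, outputs=None, **kwargs):
            # **kwargs swallows unexpected keywords (e.g. `tokenizer`), which is
            # exactly what raises the TypeError seen in this log when missing.
            inputs = rank_tokenizer(samples, padding=True, truncation=True, return_tensors="pt")
            with torch.no_grad():
                scores = rank_model(**inputs).logits[:, 0]
            return scores.tolist()

        return reward_fn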