Created July 12, 2023 16:10
******* loading model args.model='t5'
--> World Size = 1
--> Device_count = 2
--> running with these defaults train_config(seed=2023, verbose=True, total_steps_to_run=8, warmup_steps=5, use_orig_params=True, limit_all_gathers=True, use_ddp=False, ddp_bucket_size=25, ddp_use_gradient_view=False, hf_t5_checkpointing=False, print_memory_summary=False, print_training_loss_data=False, num_epochs=4, model_weights_bf16=False, use_mixed_precision=True, use_low_precision_gradient_policy=False, use_tf32=True, optimizer='AdamW', ap_use_kahan_summation=False, sharding_strategy=<ShardingStrategy.FULL_SHARD: 1>, print_sharding_plan=False, run_profiler=False, profile_folder='fsdp/profile_tracing', log_every=1, num_workers_dataloader=2, batch_size_training=16, fsdp_activation_checkpointing=True, use_fused_attention=False, use_parallel_attention=False, run_validation=True, memory_report=True, nccl_debug_handler=True, distributed_debug=True, use_non_recursive_wrapping=False, use_synthetic_data=False, use_deferred_init=False, use_torch_compile=True, save_model_checkpoint=False, load_model_checkpoint=False, checkpoint_max_save_count=2, save_optimizer=False, load_optimizer=False, optimizer_checkpoint_file='Adam-t5--1.pt', checkpoint_model_filename='t5--1.pt')
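For readers skimming the dump, the defaults above can be mirrored in a small dataclass. The class below is a hypothetical illustration covering only a handful of the fields printed in the log (field names and values are copied from the dump, but the class itself is not the gist's actual code):

```python
from dataclasses import dataclass, asdict

# Hypothetical subset of the train_config printed in the log above;
# names and defaults match the dump, but this class is an illustration,
# not the training script's real definition.
@dataclass
class TrainConfig:
    seed: int = 2023
    total_steps_to_run: int = 8
    warmup_steps: int = 5
    batch_size_training: int = 16
    use_mixed_precision: bool = True
    use_torch_compile: bool = True
    fsdp_activation_checkpointing: bool = True
    optimizer: str = "AdamW"

cfg = TrainConfig()
print(asdict(cfg))
```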
clearing gpu cache for all ranks
--> running with torch dist debug set to detail
--> total memory per gpu (GB) = 39.564
wrapping policy is functools.partial(<function transformer_auto_wrap_policy at 0x7f862c480ca0>, transformer_layer_cls={<class 'transformers.models.t5.modeling_t5.T5Block'>})
pokemon nor beans set not enabled
Found cached dataset csv (/data/home/anijain/.cache/huggingface/datasets/csv/default-6c28f355c35f3029/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d)
100%|████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 306.56it/s]
Found cached dataset csv (/data/home/anijain/.cache/huggingface/datasets/csv/default-6c28f355c35f3029/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d)
100%|████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 481.55it/s]
--> Prepping t5-small model ...
stats is ready....? _stats=defaultdict(<class 'list'>, {'best_accuracy': 0.0}), local_rank=0, rank=0
***** building the model ******
using deferred? False
vit, GPU peak memory allocation: 0.0GB, GPU peak memory reserved: 0.0GB, GPU peak memory active: 0.0GB
--> t5-small built.
built model with 60.506624M params
bf16 check passed
--> Running with mixed precision MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16, buffer_dtype=torch.bfloat16, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,)) policy
backward prefetch set to BackwardPrefetch.BACKWARD_PRE
sharding set to ShardingStrategy.FULL_SHARD
--> Batch Size = 16
vit, GPU peak memory allocation: 0.0GB, GPU peak memory reserved: 0.0GB, GPU peak memory active: 0.0GB
--> FSDP activation checkpointing in use
--> Torch.compile in use
local rank 0 init time = 2.1330501430202276
memory stats reset, ready to track
Running with AdamW optimizer, with fusion set to True
Epoch: 0 starting...
r0 Training Epoch: 0%|          | 0/814 [00:00<?, ?it/s][rank0]:[2023-07-12 16:06:38,552] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:06:45,098] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:06:50,665] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:06:52,371] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:06:53,787] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:06:55,185] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:07:03,010] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:07:06,595] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:07:09,035] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:07:11,628] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:07:13,970] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[rank0]:[2023-07-12 16:07:16,581] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
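The repeated `using triton random, expect difference from eager` warnings come from TorchInductor compiling random ops (e.g. dropout) with Triton's RNG, whose random stream differs from eager CUDA RNG, so compiled losses need not match eager bit-for-bit. If parity with eager randomness matters, Inductor can be told to fall back to eager RNG; this is a config fragment, and whether it is appropriate for this run is an assumption, since it trades some compiled-kernel performance for reproducibility:

```python
import torch._inductor.config as inductor_config

# Make compiled graphs call back into eager PyTorch RNG instead of
# Triton's generator, so random ops match an eager run and the
# "using triton random" warning no longer applies.
inductor_config.fallback_random = True
```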
r0 Training Epoch: 0%|▎ | 1/814 [01:00<13:33:31, 60.04s/it]step: 1: time taken for the last 1 steps is 0.08296773280017078, loss is 10.0
r0 Training Epoch: 0%|█ | 3/814 [01:00<3:31:17, 15.63s/it]step: 2: time taken for the last 1 steps is 0.09435613313689828, loss is 12.375
step: 3: time taken for the last 1 steps is 0.09597105905413628, loss is 12.4375
r0 Training Epoch: 1%|█▊ | 5/814 [01:00<1:43:04, 7.64s/it]step: 4: time taken for the last 1 steps is 0.09714369312860072, loss is 11.0625
step: 5: time taken for the last 1 steps is 0.09657704387791455, loss is 11.125
r0 Training Epoch: 1%|██▌ | 7/814 [01:00<59:49, 4.45s/it]step: 6: time taken for the last 1 steps is 0.0961470000911504, loss is 9.875
step: 7: time taken for the last 1 steps is 0.096938586095348, loss is 9.0625
r0 Training Epoch: 1%|██▊ | 8/814 [01:00<1:42:02, 7.60s/it]
tracking_duration [60.03929250803776, 0.08296773280017078, 0.09435613313689828, 0.09597105905413628, 0.09714369312860072, 0.09657704387791455, 0.0961470000911504, 0.096938586095348]
** exit loop - rank 0 reporting....
--> cuda max reserved memory = 4.7949
--> max reserved percentage = 12.12 %
--> cuda max memory allocated = 3.7062
--> max allocated percentage = 9.37 %
--> peak active memory = 3.7375
--> peak active memory 9.45 %
cudaMalloc retries = 0
cuda OOM = 0
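The three percentage lines in the memory report can be reproduced from the raw numbers in the log; per-GPU total memory was reported earlier as 39.564 GB. A minimal check, assuming each percentage is simply value / total * 100:

```python
# Raw figures from the log: per-GPU total and the three peak readings.
total_gb = 39.564
reserved_gb = 4.7949
allocated_gb = 3.7062
active_gb = 3.7375

# Each reported percentage is the peak reading over total GPU memory.
reserved_pct = round(100 * reserved_gb / total_gb, 2)
allocated_pct = round(100 * allocated_gb / total_gb, 2)
active_pct = round(100 * active_gb / total_gb, 2)
print(reserved_pct, allocated_pct, active_pct)
```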
Validation loss data
Accuracy validation
--> Highest Val Accuracy = 0
--> Step avg speed (in seconds) based on -5 steps: -0.0
excluding 5 steps as warmup
--> Step avg speed based on 3 steps: 0.0966 seconds
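The 0.0966 s figure follows from the tracking_duration list printed above: entry 0 (~60 s) absorbs the torch.compile warm-up, and dropping the first warmup_steps=5 entries then averaging the remaining 3 reproduces the reported number. The script's exact exclusion logic isn't shown in the gist, so that interpretation is an assumption:

```python
# Per-step durations from the log's tracking_duration line; entry 0
# includes torch.compile's one-time compilation cost.
tracking_duration = [
    60.03929250803776, 0.08296773280017078, 0.09435613313689828,
    0.09597105905413628, 0.09714369312860072, 0.09657704387791455,
    0.0961470000911504, 0.096938586095348,
]
warmup_steps = 5  # from the train_config dump

# Drop the warm-up entries and average what remains.
post_warmup = tracking_duration[warmup_steps:]
avg = sum(post_warmup) / len(post_warmup)
print(f"Step avg speed based on {len(post_warmup)} steps: {avg:.4f} seconds")
```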
Dist Training Framework used = FSDP
This was run with TensorParallel? = False
Run with Parallel Attention? False
Batch size used = 16
FSDP Activation Checkpointing? = True
--> Model Size = 60.506624 M Params