@HamidShojanazeri
Created March 15, 2022 17:53
(PT-nightly-FSDP) ubuntu@ip-172-31-37-85:~/serve/examples/Huggingface_Transformers$ curl -X POST http://127.0.0.1:8080/predictions/Textgeneration -T Seq_classification_artifacts/sample_text_captum_input.txt
Bloomberg has decided to publish a new report on the global economy.From(PT-nightly-FSDP)
(PT-nightly-FSDP) ubuntu@ip-172-31-37-85:~/serve/examples/Huggingface_Transformers$ WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-03-15T17:48:09,043 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2022-03-15T17:48:09,098 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-03-15T17:48:09,233 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.5.3
TS Home: /home/ubuntu/anaconda3/envs/PT-nightly-FSDP/lib/python3.8/site-packages
Current directory: /home/ubuntu/serve/examples/Huggingface_Transformers
Temp directory: /tmp
Number of GPUs: 4
Number of CPUs: 48
Max heap size: 30688 M
Python executable: /home/ubuntu/anaconda3/envs/PT-nightly-FSDP/bin/python3.8
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/ubuntu/serve/examples/Huggingface_Transformers/model_store
Initial Models: N/A
Log dir: /home/ubuntu/serve/examples/Huggingface_Transformers/logs
Metrics dir: /home/ubuntu/serve/examples/Huggingface_Transformers/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/ubuntu/serve/examples/Huggingface_Transformers/model_store
Model config: N/A
2022-03-15T17:48:09,242 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2022-03-15T17:48:09,297 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2022-03-15T17:48:09,298 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2022-03-15T17:48:09,298 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2022-03-15T17:48:09,299 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2022-03-15T17:48:09,299 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2022-03-15T17:48:10,157 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,158 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:247.67612075805664|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,158 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:140.0098114013672|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,158 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:36.1|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,159 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,159 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,159 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,159 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,159 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,160 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,160 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,160 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,160 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,160 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,160 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,160 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,161 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:187895.44921875|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,161 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1493.96875|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:48:10,161 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:1.7|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366490
2022-03-15T17:49:10,126 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:16.7|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,127 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:247.6761131286621|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,127 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:140.00981903076172|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,127 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:36.1|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,127 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,127 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,127 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,127 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,128 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,128 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,128 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,128 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,128 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,128 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,128 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,128 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,129 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:187889.33984375|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,129 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1500.078125|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:49:10,129 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:1.7|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366550
2022-03-15T17:50:10,123 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,124 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:247.6761131286621|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,124 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:140.00981903076172|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,124 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:36.1|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,124 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,124 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,124 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,124 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,125 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:187893.00390625|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,126 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1496.40625|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:50:10,126 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:1.7|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366610
2022-03-15T17:51:10,118 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,118 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:247.67610549926758|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,118 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:140.00982666015625|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,118 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:36.1|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,119 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,119 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,119 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,119 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,119 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,119 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,119 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,119 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,120 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,120 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,120 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,120 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,120 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:187896.70703125|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,120 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1492.7109375|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:10,120 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:1.7|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366670
2022-03-15T17:51:27,951 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model Textgeneration
2022-03-15T17:51:27,952 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model Textgeneration
2022-03-15T17:51:27,952 [INFO ] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - Model Textgeneration loaded.
2022-03-15T17:51:27,953 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - updateModel: Textgeneration, count: 1
2022-03-15T17:51:27,958 [DEBUG] W-9000-Textgeneration_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/ubuntu/anaconda3/envs/PT-nightly-FSDP/bin/python3.8, /home/ubuntu/anaconda3/envs/PT-nightly-FSDP/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000]
2022-03-15T17:51:28,628 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2022-03-15T17:51:28,628 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - [PID]48672
2022-03-15T17:51:28,629 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Torch worker started.
2022-03-15T17:51:28,629 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Python runtime: 3.8.12
2022-03-15T17:51:28,629 [DEBUG] W-9000-Textgeneration_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-Textgeneration_1.0 State change null -> WORKER_STARTED
2022-03-15T17:51:28,632 [INFO ] W-9000-Textgeneration_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2022-03-15T17:51:28,637 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2022-03-15T17:51:28,639 [INFO ] W-9000-Textgeneration_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1647366688639
2022-03-15T17:51:28,659 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - model_name: Textgeneration, batchSize: 1
2022-03-15T17:51:29,363 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Transformers version 4.11.0
2022-03-15T17:51:46,799 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Transformer model from path /tmp/models/950aedf0a0bd4e5293c9bee6640e5199 loaded successfully
2022-03-15T17:51:46,800 [INFO ] W-9000-Textgeneration_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 18141
2022-03-15T17:51:46,801 [DEBUG] W-9000-Textgeneration_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-Textgeneration_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2022-03-15T17:51:46,801 [INFO ] W-9000-Textgeneration_1.0 TS_METRICS - W-9000-Textgeneration_1.0.ms:18845|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366706
2022-03-15T17:51:46,801 [INFO ] W-9000-Textgeneration_1.0 TS_METRICS - WorkerThreadTime.ms:21|#Level:Host|#hostname:ip-172-31-37-85,timestamp:null
2022-03-15T17:51:46,804 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /127.0.0.1:56052 "POST /models?model_name=Textgeneration&url=Textgeneration.mar&batch_size=1&max_batch_delay=5000&initial_workers=1&synchronous=true HTTP/1.1" 200 26065
2022-03-15T17:51:46,805 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-37-85,timestamp:null
2022-03-15T17:52:10,111 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,111 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:247.20067977905273|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,111 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:140.4852523803711|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,111 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:36.2|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,111 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:9.927857568336753|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,111 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:1500|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:8.948308954927526|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:1352|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:8.948308954927526|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:1352|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:8.948308954927526|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:1352|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,112 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:181765.68359375|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,113 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:7575.734375|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:10,113 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:4.9|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366730
2022-03-15T17:52:25,234 [INFO ] epollEventLoopGroup-3-2 ACCESS_LOG - /127.0.0.1:43170 "POST /predictions/my_tc HTTP/1.1" 404 4
2022-03-15T17:52:25,234 [INFO ] epollEventLoopGroup-3-2 TS_METRICS - Requests4XX.Count:1|#Level:Host|#hostname:ip-172-31-37-85,timestamp:null
2022-03-15T17:52:57,290 [INFO ] W-9000-Textgeneration_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1647366777290
2022-03-15T17:52:57,291 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Backend received inference at: 1647366777
2022-03-15T17:52:57,291 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Received text: 'Bloomberg has decided to publish a new report on the global economy.'
2022-03-15T17:52:57,291 [WARN ] W-9000-Textgeneration_1.0-stderr MODEL_LOG - /home/ubuntu/anaconda3/envs/PT-nightly-FSDP/lib/python3.8/site-packages/transformers/configuration_utils.py:336: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
2022-03-15T17:52:57,292 [WARN ] W-9000-Textgeneration_1.0-stderr MODEL_LOG - warnings.warn(
2022-03-15T17:52:57,292 [WARN ] W-9000-Textgeneration_1.0-stderr MODEL_LOG - Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
2022-03-15T17:52:57,295 [WARN ] W-9000-Textgeneration_1.0-stderr MODEL_LOG - /home/ubuntu/anaconda3/envs/PT-nightly-FSDP/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:2211: FutureWarning: The `pad_to_max_length` argument is deprecated and will be removed in a future version, use `padding=True` or `padding='longest'` to pad to the longest sequence in the batch, or use `padding='max_length'` to pad to a max length. In this case, you can give a specific length with `max_length` (e.g. `max_length=45`) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
2022-03-15T17:52:57,295 [WARN ] W-9000-Textgeneration_1.0-stderr MODEL_LOG - warnings.warn(
2022-03-15T17:52:57,296 [WARN ] W-9000-Textgeneration_1.0-stderr MODEL_LOG - Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
2022-03-15T17:52:57,296 [WARN ] W-9000-Textgeneration_1.0-stderr MODEL_LOG - Input length of input_ids is 150, but ``max_length`` is set to 50.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.
2022-03-15T17:52:57,325 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Generated text: '['Bloomberg has decided to publish a new report on the global economy.From']'
2022-03-15T17:52:57,325 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_LOG - Generated text ['Bloomberg has decided to publish a new report on the global economy.From']
2022-03-15T17:52:57,325 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:33.68|#ModelName:Textgeneration,Level:Model|#hostname:ip-172-31-37-85,requestID:3fbd2201-ce14-4063-a12d-5aa076394135,timestamp:1647366777
2022-03-15T17:52:57,325 [INFO ] W-9000-Textgeneration_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:33.74|#ModelName:Textgeneration,Level:Model|#hostname:ip-172-31-37-85,requestID:3fbd2201-ce14-4063-a12d-5aa076394135,timestamp:1647366777
2022-03-15T17:52:57,325 [INFO ] W-9000-Textgeneration_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 34
2022-03-15T17:52:57,326 [INFO ] W-9000-Textgeneration_1.0 ACCESS_LOG - /127.0.0.1:43172 "POST /predictions/Textgeneration HTTP/1.1" 200 37
2022-03-15T17:52:57,326 [INFO ] W-9000-Textgeneration_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-37-85,timestamp:null
2022-03-15T17:52:57,326 [DEBUG] W-9000-Textgeneration_1.0 org.pytorch.serve.job.Job - Waiting time ns: 147579, Backend time ns: 36311344
2022-03-15T17:52:57,326 [INFO ] W-9000-Textgeneration_1.0 TS_METRICS - QueueTime.ms:0|#Level:Host|#hostname:ip-172-31-37-85,timestamp:null
2022-03-15T17:52:57,326 [INFO ] W-9000-Textgeneration_1.0 TS_METRICS - WorkerThreadTime.ms:2|#Level:Host|#hostname:ip-172-31-37-85,timestamp:null
2022-03-15T17:53:10,113 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,113 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:247.20066833496094|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,113 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:140.4852638244629|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,113 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:36.2|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,113 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:10.21907472367463|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,113 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:1544|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,113 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:9.02773181547422|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:1364|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:9.02773181547422|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:1364|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:9.04096895889867|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:1366|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:1|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:2|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0|#Level:Host,device_id:3|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:181728.546875|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:7612.8671875|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366790
2022-03-15T17:53:10,114 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:4.9|#Level:Host|#hostname:ip-172-31-37-85,timestamp:1647366790
Every 0.1s: nvidia-smi ip-172-31-37-85: Tue Mar 15 17:53:32 2022
Tue Mar 15 17:53:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
| N/A   38C    P0    26W /  70W |   1544MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
| N/A   37C    P0    27W /  70W |   1364MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:00:1D.0 Off |                    0 |
| N/A   39C    P0    27W /  70W |   1364MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   37C    P0    27W /  70W |   1366MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
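
The GPUMemoryUsed values in the TS_METRICS lines (1544, 1364, 1364, 1366 MB at 17:53:10) match the nvidia-smi readings above. To compare them programmatically, a minimal sketch along these lines could pull the fields out of a TS_METRICS line. The regular expression is an assumption inferred only from the lines in this log, not an official TorchServe format definition, and `parse_ts_metric` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred from the log lines in this capture:
#   ... TS_METRICS - <Name>.<Unit>:<value>|#<dims>|#hostname:<host>,timestamp:<ts>
TS_METRIC_RE = re.compile(
    r"TS_METRICS - (?P<name>.+?)\.(?P<unit>\w+):(?P<value>-?[\d.]+)"
    r"\|#(?P<dims>[^|]*)\|#hostname:(?P<host>[^,]+),timestamp:(?P<ts>\w+)"
)


def parse_ts_metric(line):
    """Return a dict of fields for a TS_METRICS line, or None if it does not match."""
    match = TS_METRIC_RE.search(line)
    if match is None:
        return None
    # Dimensions look like "Level:Host,device_id:0"; split them into a dict.
    dims = {}
    for pair in match.group("dims").split(","):
        if ":" in pair:
            key, value = pair.split(":", 1)
            dims[key] = value
    ts = match.group("ts")
    return {
        "name": match.group("name"),
        "unit": match.group("unit"),
        "value": float(match.group("value")),
        "dims": dims,
        "host": match.group("host"),
        # Some lines in this log carry "timestamp:null"; map that to None.
        "timestamp": int(ts) if ts.isdigit() else None,
    }


if __name__ == "__main__":
    sample = (
        "2022-03-15T17:53:10,113 [INFO ] pool-3-thread-1 TS_METRICS - "
        "GPUMemoryUsed.Megabytes:1544|#Level:Host,device_id:0|"
        "#hostname:ip-172-31-37-85,timestamp:1647366790"
    )
    print(parse_ts_metric(sample))
```

Filtering the parsed records on `name == "GPUMemoryUsed"` and grouping by `dims["device_id"]` would give per-GPU memory over time for this run.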