@llan-ml
Created December 12, 2018 06:43
ray log files
NodeManager:
LocalResources:
- total: {GPU,2.000000},{CPU,56.000000}
- avail: {GPU,2.000000},{CPU,56.000000}
ClusterResources:
428585e810020aa193d7a54dcaba8c38fb312d97:
- total: {CPU,64.000000},{GPU,0.000000}
- avail: {CPU,22.000000},{GPU,0.000000}
ad0ba17256effc2ae0c91240ea6f2e0952812540:
- total: {CPU,88.000000},{GPU,0.000000}
- avail: {CPU,31.000000},{GPU,0.000000}
a66a585cd3e9bacd7a7ee984f8570f9369e74f06:
- total: {CPU,64.000000},{GPU,0.000000}
- avail: {CPU,19.000000},{GPU,0.000000}
c674842b09b9c470d323ea11448e77e796b28457:
- total: {CPU,56.000000},{GPU,2.000000}
- avail: {CPU,18.000000},{GPU,2.000000}
c76b75da4d3d87d4f336062a1a4d2ceacf95abe0:
- total: {CPU,72.000000},{GPU,0.000000}
- avail: {CPU,25.000000},{GPU,0.000000}
326f13c2ae08ca1d4e9cef74c8bb736247085697:
- total: {CPU,64.000000},{GPU,0.000000}
- avail: {CPU,27.000000},{GPU,0.000000}
e2065e4481336004572b9243a7472ccf342e1643:
- total: {CPU,72.000000},{GPU,0.000000}
- avail: {CPU,21.000000},{GPU,0.000000}
cb0891db23d913a5075309754cc290f677f19f48:
- total: {CPU,72.000000},{GPU,0.000000}
- avail: {CPU,27.000000},{GPU,0.000000}
bd366137c01db78ae1ead82d21bbef14031f1e01:
- total: {CPU,72.000000},{GPU,0.000000}
- avail: {CPU,29.000000},{GPU,0.000000}
b48c30a16eeeca7e711d5d49456a6fd146075ed2:
- total: {GPU,2.000000},{CPU,56.000000}
- avail: {GPU,2.000000},{CPU,21.000000}
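To read the ClusterResources block above, it helps to turn each node's `total`/`avail` pair into the amount of a resource currently in use. The helper below is a hypothetical sketch (not part of Ray). It assumes the exact layout shown in this dump: a 40-character node ID line followed by `- total:` and `- avail:` lines with `{NAME,value}` entries in either order.

```python
import re

# One resource entry as printed in the dump, e.g. "{CPU,56.000000}".
RESOURCE_RE = re.compile(r"\{(\w+),([0-9.]+)\}")
# A node header line: a 40-character hex client ID followed by a colon.
NODE_RE = re.compile(r"^([0-9a-f]{40}):$")


def parse_resources(line):
    """'- total: {CPU,64.000000},{GPU,0.000000}' -> {'CPU': 64.0, 'GPU': 0.0}."""
    return {name: float(value) for name, value in RESOURCE_RE.findall(line)}


def parse_cluster_resources(text):
    """Map each node ID to its 'total' and 'avail' resource dicts."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = NODE_RE.match(line)
        if header:
            current = header.group(1)
            nodes[current] = {}
        elif current and line.startswith("- total:"):
            nodes[current]["total"] = parse_resources(line)
        elif current and line.startswith("- avail:"):
            nodes[current]["avail"] = parse_resources(line)
    return nodes


def used(nodes, resource="CPU"):
    """Per-node amount of `resource` currently held (total minus avail)."""
    return {
        node: info["total"].get(resource, 0.0) - info["avail"].get(resource, 0.0)
        for node, info in nodes.items()
    }


# Two entries copied from the block above (the second lists GPU first).
sample = """\
428585e810020aa193d7a54dcaba8c38fb312d97:
- total: {CPU,64.000000},{GPU,0.000000}
- avail: {CPU,22.000000},{GPU,0.000000}
b48c30a16eeeca7e711d5d49456a6fd146075ed2:
- total: {GPU,2.000000},{CPU,56.000000}
- avail: {GPU,2.000000},{CPU,21.000000}
"""
cpu_used = used(parse_cluster_resources(sample))
print(cpu_used)
```

Applied to all ten entries, the totals add up to 680 CPUs and the available counts to 240, so 440 CPUs are in use. That equals the `num live actors: 440` reported under ActorRegistry, which would be consistent with each live actor holding one CPU, although the log itself does not state the actors' resource requests.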
ObjectManager:
- num local objects: 314345
- num active wait requests: 1
- num unfulfilled push requests: 0
- num pull requests: 24
- num buffered profile events: 0
ObjectDirectory:
- num listeners: 24
- num eviction entries: 8935876
ObjectStoreNotificationManager:
- num adds processed: 8935876
- num removes processed: 8621531
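The object counters above are internally consistent: `num eviction entries` under ObjectDirectory equals `num adds processed`, and adds minus removes equals `num local objects`. A quick arithmetic check, assuming (as the names suggest, though the log does not document it) that these counters are cumulative since the raylet started:

```python
# Counters copied from the ObjectManager and ObjectStoreNotificationManager sections.
num_local_objects = 314345
num_adds_processed = 8935876
num_removes_processed = 8621531

# Objects still resident = everything added minus everything removed.
resident = num_adds_processed - num_removes_processed
print(resident, resident == num_local_objects)  # 314345 True
```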
BufferPool:
- get buffer state map size: 0
- create buffer state map size: 0
ConnectionPool:
- num message send connections: 9
- num transfer send connections: 9
- num avail message send connections: 9
- num avail transfer send connections: 9
- num message receive connections: 9
- num transfer receive connections: 9
AsyncGcsClient:
- TaskTable: num lookups: 0, num adds: 5472120
- ActorTable: num lookups: 96, num appends: 77
- TaskReconstructionLog: num lookups: 0, num appends: 0
- TaskLeaseTable: num lookups: 0, num adds: 5521964
- HeartbeatTable: num lookups: 0, num adds: 199517
- ErrorTable: num lookups: 0, num appends: 0
- ProfileTable: num lookups: 0, num appends: 697998
- ClientTable: num lookups: 0, num appends: 1, cache size: 10, num removed: 0
- DriverTable: num lookups: 0, num appends: 0
WorkerPool:
- num workers: 35
- num drivers: 1
SchedulingQueue:
- num placeable tasks: 0
- num waiting tasks: 0
- num ready tasks: 0
- num running tasks: 29
- num infeasible tasks: 0
- num methods waiting for actor creation: 0
ReconstructionPolicy:
- num reconstructing: 24
TaskDependencyManager:
- task dep map size: 5
- task req map size: 24
- req objects map size: 24
- local objects map size: 3417571
- pending tasks map size: 29
LineageCache:
- committed tasks: 0
- child map size: 3
- num subscribed tasks: 3
- lineage size: 3
ActorRegistry:
- num live actors: 440
- num dead actors: 220
- max num handles: 22
RemoteConnections:
428585e810020aa193d7a54dcaba8c38fb312d97:
- bytes read: 0
- bytes written: 441372808
- num async writes: 704204
- num sync writes: 0
- writing: 0
- pending async bytes: 0
ad0ba17256effc2ae0c91240ea6f2e0952812540:
- bytes read: 0
- bytes written: 183763344
- num async writes: 298086
- num sync writes: 0
- writing: 0
- pending async bytes: 0
a66a585cd3e9bacd7a7ee984f8570f9369e74f06:
- bytes read: 0
- bytes written: 289897632
- num async writes: 462410
- num sync writes: 0
- writing: 0
- pending async bytes: 0
c674842b09b9c470d323ea11448e77e796b28457:
- bytes read: 0
- bytes written: 135063768
- num async writes: 216007
- num sync writes: 0
- writing: 0
- pending async bytes: 0
c76b75da4d3d87d4f336062a1a4d2ceacf95abe0:
- bytes read: 0
- bytes written: 108442056
- num async writes: 173756
- num sync writes: 0
- writing: 0
- pending async bytes: 0
326f13c2ae08ca1d4e9cef74c8bb736247085697:
- bytes read: 0
- bytes written: 124173720
- num async writes: 200190
- num sync writes: 0
- writing: 0
- pending async bytes: 0
e2065e4481336004572b9243a7472ccf342e1643:
- bytes read: 0
- bytes written: 435348528
- num async writes: 694734
- num sync writes: 0
- writing: 0
- pending async bytes: 0
cb0891db23d913a5075309754cc290f677f19f48:
- bytes read: 0
- bytes written: 173563280
- num async writes: 279059
- num sync writes: 0
- writing: 0
- pending async bytes: 0
bd366137c01db78ae1ead82d21bbef14031f1e01:
- bytes read: 0
- bytes written: 104004064
- num async writes: 169042
- num sync writes: 0
- writing: 0
- pending async bytes: 0
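The RemoteConnections section reports only outgoing traffic (every `bytes read` is 0). Dividing bytes written by the number of async writes gives the average message size per peer. A small sketch with figures copied from the section above; the node IDs are shortened to their first eight hex characters for readability.

```python
# (bytes written, num async writes) per remote raylet, from RemoteConnections.
connections = {
    "428585e8": (441372808, 704204),
    "ad0ba172": (183763344, 298086),
    "a66a585c": (289897632, 462410),
    "c674842b": (135063768, 216007),
    "c76b75da": (108442056, 173756),
    "326f13c2": (124173720, 200190),
    "e2065e44": (435348528, 694734),
    "cb0891db": (173563280, 279059),
    "bd366137": (104004064, 169042),
}

total_bytes = sum(nbytes for nbytes, _ in connections.values())
total_writes = sum(nwrites for _, nwrites in connections.values())

for node, (nbytes, nwrites) in sorted(connections.items()):
    # Average size of one async write to this peer, in bytes.
    print(f"{node}: {nbytes / nwrites:.1f} B/write")
print(f"total: {total_bytes} bytes in {total_writes} writes")
```

For this dump every peer averages roughly 615 to 627 bytes per write, and the totals come to 1,995,629,200 bytes (about 2.0 GB) over 3,197,488 writes.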
DebugString() time ms: 1
W1212 04:43:15.261960 24873 monitor.cc:48] Client timed out: c674842b09b9c470d323ea11448e77e796b28457
W1212 04:43:15.263878 24873 monitor.cc:48] Client timed out: e2065e4481336004572b9243a7472ccf342e1643
W1212 04:43:15.364790 24873 monitor.cc:48] Client timed out: b48c30a16eeeca7e711d5d49456a6fd146075ed2
W1212 04:43:15.465040 24873 monitor.cc:48] Client timed out: 326f13c2ae08ca1d4e9cef74c8bb736247085697
W1212 04:43:15.465195 24873 monitor.cc:48] Client timed out: cb0891db23d913a5075309754cc290f677f19f48
W1212 04:43:15.566326 24873 monitor.cc:48] Client timed out: c76b75da4d3d87d4f336062a1a4d2ceacf95abe0
W1212 04:43:15.667418 24873 monitor.cc:48] Client timed out: 428585e810020aa193d7a54dcaba8c38fb312d97
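The monitor reported seven clients as timed out within about 0.4 s. Comparing those IDs with the ten nodes listed under ClusterResources shows which nodes were not reported. A sketch with hypothetical variable names; node IDs are truncated to eight characters.

```python
import re

# Node IDs listed under ClusterResources (first 8 hex characters).
cluster_nodes = {
    "428585e8", "ad0ba172", "a66a585c", "c674842b", "c76b75da",
    "326f13c2", "e2065e44", "cb0891db", "bd366137", "b48c30a1",
}

# The monitor warnings, copied verbatim from the log.
monitor_log = """\
W1212 04:43:15.261960 24873 monitor.cc:48] Client timed out: c674842b09b9c470d323ea11448e77e796b28457
W1212 04:43:15.263878 24873 monitor.cc:48] Client timed out: e2065e4481336004572b9243a7472ccf342e1643
W1212 04:43:15.364790 24873 monitor.cc:48] Client timed out: b48c30a16eeeca7e711d5d49456a6fd146075ed2
W1212 04:43:15.465040 24873 monitor.cc:48] Client timed out: 326f13c2ae08ca1d4e9cef74c8bb736247085697
W1212 04:43:15.465195 24873 monitor.cc:48] Client timed out: cb0891db23d913a5075309754cc290f677f19f48
W1212 04:43:15.566326 24873 monitor.cc:48] Client timed out: c76b75da4d3d87d4f336062a1a4d2ceacf95abe0
W1212 04:43:15.667418 24873 monitor.cc:48] Client timed out: 428585e810020aa193d7a54dcaba8c38fb312d97
"""

TIMEOUT_RE = re.compile(r"Client timed out: ([0-9a-f]{40})")
timed_out = {m.group(1)[:8] for m in TIMEOUT_RE.finditer(monitor_log)}

# Nodes that appear in ClusterResources but were never reported as timed out.
not_reported = sorted(cluster_nodes - timed_out)
print(not_reported)  # ['a66a585c', 'ad0ba172', 'bd366137']
```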
2018-12-11 23:06:39.080396: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:39.788132: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:40.308645: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING: Falling back to serializing objects of type <class 'numpy.dtype'> by using pickle. This may be inefficient.
2018-12-11 23:06:53.613640: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:54.037794: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:54.212768: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:54.318787: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:54.500251: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:54.605038: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:54.710268: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:54.813392: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING: Falling back to serializing objects of type <class 'numpy.dtype'> by using pickle. This may be inefficient.
2018-12-11 23:06:54.901880: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:54.988751: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:55.091566: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING: Falling back to serializing objects of type <class 'numpy.dtype'> by using pickle. This may be inefficient.
2018-12-11 23:06:55.181218: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:55.274470: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:55.381402: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:55.465374: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:55.591814: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:55.659073: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:55.811525: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:55.842495: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:55.930289: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.023187: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.094717: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.196621: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.281870: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.372416: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.469464: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.580417: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.688174: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.784770: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:56.898052: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:57.057113: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:57.153429: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:57.236487: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:57.311229: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:57.394589: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:57.500159: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:57.613973: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:57.776930: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:57.900035: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.045441: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.159181: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.239378: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.346872: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.436956: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.539050: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.693363: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.746742: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.843998: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:06:58.962430: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
W1211 23:17:44.187924 24878 client_connection.cc:247] [worker]ProcessMessage with type 8 took 111 ms.
W1211 23:35:56.098489 24878 client_connection.cc:247] [worker]ProcessMessage with type 8 took 110 ms.
W1211 23:49:21.808418 24878 client_connection.cc:247] [object manager]ProcessMessage with type 3 took 137 ms.
W1212 00:15:59.472271 24878 node_manager.cc:243] Last heartbeat was sent 598 ms ago
W1212 00:36:38.887095 24878 node_manager.cc:243] Last heartbeat was sent 611 ms ago
W1212 01:25:36.771476 24878 node_manager.cc:243] Last heartbeat was sent 1083 ms ago
W1212 02:29:29.607458 24878 client_connection.cc:247] [worker]ProcessMessage with type 19 took 1817 ms.
W1212 02:29:29.646205 24878 node_manager.cc:243] Last heartbeat was sent 1918 ms ago
W1212 02:34:58.544487 24878 client_connection.cc:247] [worker]ProcessMessage with type 7 took 150 ms.
W1212 02:43:47.758571 24878 client_connection.cc:247] [worker]ProcessMessage with type 7 took 1117 ms.
W1212 02:43:47.858768 24878 node_manager.cc:243] Last heartbeat was sent 1262 ms ago
W1212 02:54:27.759555 24878 client_connection.cc:247] [worker]ProcessMessage with type 7 took 1144 ms.
W1212 02:54:27.791155 24878 node_manager.cc:243] Last heartbeat was sent 1221 ms ago
W1212 03:28:20.264976 24878 client_connection.cc:247] [node manager]ProcessMessage with type 15 took 154 ms.
W1212 03:32:40.461467 24878 client_connection.cc:247] [worker]ProcessMessage with type 8 took 138 ms.
W1212 03:38:35.297118 24878 client_connection.cc:247] [node manager]ProcessMessage with type 15 took 123 ms.
W1212 03:45:44.311858 24878 client_connection.cc:247] [node manager]ProcessMessage with type 15 took 124 ms.
W1212 03:50:26.450544 24878 client_connection.cc:247] [worker]ProcessMessage with type 8 took 242 ms.
W1212 03:57:10.389991 24878 client_connection.cc:247] [worker]ProcessMessage with type 8 took 249 ms.
W1212 04:37:49.193456 24878 node_manager.cc:243] Last heartbeat was sent 2706 ms ago
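The raylet warnings above share the glog prefix layout seen throughout this log: `<severity><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>`. A minimal parser that extracts the heartbeat delays is sketched below. The year is not part of this prefix, so it is passed in as an assumption (2018, taken from the TensorFlow timestamps above).

```python
import re
from datetime import datetime

# Prefix layout of the raylet warnings in this log, e.g.
# "W1212 04:37:49.193456 24878 node_manager.cc:243] Last heartbeat was sent 2706 ms ago"
GLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)
HEARTBEAT_RE = re.compile(r"Last heartbeat was sent (\d+) ms ago")


def heartbeat_delays(lines, year=2018):
    """Yield (timestamp, delay_ms) for every 'Last heartbeat was sent' warning."""
    for line in lines:
        prefix = GLOG_RE.match(line)
        if not prefix:
            continue
        beat = HEARTBEAT_RE.search(prefix.group("message"))
        if beat:
            stamp = datetime.strptime(
                f"{year}{prefix.group('mmdd')} {prefix.group('time')}",
                "%Y%m%d %H:%M:%S.%f",
            )
            yield stamp, int(beat.group(1))


log_lines = [
    "W1212 00:15:59.472271 24878 node_manager.cc:243] Last heartbeat was sent 598 ms ago",
    "W1212 02:29:29.607458 24878 client_connection.cc:247] [worker]ProcessMessage with type 19 took 1817 ms.",
    "W1212 02:29:29.646205 24878 node_manager.cc:243] Last heartbeat was sent 1918 ms ago",
    "W1212 04:37:49.193456 24878 node_manager.cc:243] Last heartbeat was sent 2706 ms ago",
]
delays = list(heartbeat_delays(log_lines))
worst = max(delays, key=lambda pair: pair[1])
print(worst)
```

Across the seven heartbeat warnings in this log the delay ranges from 598 ms to 2706 ms. The largest was logged at 04:37:49, about five minutes before the fatal `Failed to update state to DEAD` check at 04:42:45.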
F1212 04:42:45.231518 24878 node_manager.cc:492] Failed to update state to DEAD for actor 6681125417e8c0e2a8a51f474ed0cafb498578bf
*** Check failure stack trace: ***
@ 0x577d46 google::LogMessage::Fail()
@ 0x577c92 google::LogMessage::SendToLog()
@ 0x577616 google::LogMessage::Flush()
@ 0x577425 google::LogMessage::~LogMessage()
@ 0x4ddc18 ray::RayLog::~RayLog()
@ 0x50cfae _ZNSt17_Function_handlerIFvPN3ray3gcs14AsyncGcsClientERKNS0_8UniqueIDERK15ActorTableDataTEZNS0_6raylet11NodeManager23HandleDisconnectedActorES6_bEUlS3_S6_S9_E_E9_M_invokeERKSt9_Any_dataS3_S6_S9_
@ 0x4bbc8f _ZNSt17_Function_handlerIFbRKSsEZN3ray3gcs3LogINS3_8UniqueIDE14ActorTableDataE8AppendAtERKS6_SA_RSt10shared_ptrI15ActorTableDataTERKSt8functionIFvPNS4_14AsyncGcsClientESA_RKSC_EESN_iEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
@ 0x4d9ce9 (anonymous namespace)::ProcessCallback()
@ 0x4d9f2b ray::gcs::GlobalRedisCallback()
@ 0x5281e6 redisProcessCallbacks
@ 0x4dd6a3 RedisAsioClient::handle_read()
@ 0x4ddb31 boost::asio::detail::reactive_null_buffers_op<>::do_complete()
@ 0x4978e1 boost::asio::detail::task_io_service::run()
@ 0x48ff9e main
@ 0x7fc933031830 __libc_start_main
@ 0x4947d1 (unknown)
pure virtual method called
terminate called without an active exception
Fatal Python error: Aborted
Stack (most recent call first):
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/ray/profiling.py", line 124 in flush_profile_data
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/ray/profiling.py", line 98 in _periodically_flush_profile_events
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/threading.py", line 864 in run
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/threading.py", line 884 in _bootstrap
pure virtual method called
terminate called without an active exception
Fatal Python error: Aborted
Stack (most recent call first):
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/ray/profiling.py", line 124 in flush_profile_data
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/ray/profiling.py", line 98 in _periodically_flush_profile_events
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/threading.py", line 864 in run
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/threading.py", line 884 in _bootstrap
pure virtual method called
terminate called without an active exception
Fatal Python error: Aborted
Stack (most recent call first):
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/ray/profiling.py", line 124 in flush_profile_data
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/ray/profiling.py", line 98 in _periodically_flush_profile_events
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/threading.py", line 864 in run
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home1/lanlin/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/threading.py", line 884 in _bootstrap