Last active
December 22, 2015 12:19
-
-
Save n1mmy/6471542 to your computer and use it in GitHub Desktop.
logs for various mongo issues
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Exhibit 1: two different primary handoffs. Only the '[rs*]' logs. | |
Thu Aug 8 14:30:32.395 [rsHealthPoll] DBClientCursor::init call() failed | |
Thu Aug 8 14:30:32.395 [rsHealthPoll] replSet info production-db-a3.meteor.io:27017 is down (or slow to respond): DBClientBase::findN: transport error: production-db-a3.meteor.i | |
o:27017 ns: local.$cmd query: { authenticate: 1, nonce: "dffead39ddc33b3d", user: "__system", key: "68664b9158f9708b8f5531498803cc1b" } | |
Thu Aug 8 14:30:32.395 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state DOWN | |
Thu Aug 8 14:30:32.397 [rsMgr] replSet info electSelf 4 | |
Thu Aug 8 14:30:32.399 [rsBackgroundSync] Socket recv() timeout 54.235.126.126:27017 | |
Thu Aug 8 14:30:32.399 [rsSyncNotifier] Socket recv() timeout 54.235.126.126:27017 | |
Thu Aug 8 14:30:32.399 [rsBackgroundSync] SocketException: remote: 54.235.126.126:27017 error: 9001 socket exception [3] server [54.235.126.126:27017] | |
Thu Aug 8 14:30:32.399 [rsSyncNotifier] SocketException: remote: 54.235.126.126:27017 error: 9001 socket exception [3] server [54.235.126.126:27017] | |
Thu Aug 8 14:30:32.399 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017 | |
Thu Aug 8 14:30:32.399 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017 | |
Thu Aug 8 14:30:37.369 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is up | |
Thu Aug 8 14:30:37.369 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state PRIMARY | |
Thu Aug 8 14:30:42.399 [rsMgr] replSet PRIMARY | |
Thu Aug 8 14:30:42.399 [rsMgr] replSet relinquishing primary state | |
Thu Aug 8 14:30:42.399 [rsMgr] replSet SECONDARY | |
Thu Aug 8 14:30:42.399 [rsMgr] replSet closing client sockets after relinquishing primary | |
Thu Aug 8 14:30:43.374 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 heartbeat failed, retrying | |
Thu Aug 8 14:30:43.399 [rsBackgroundSync] replSet syncing to: production-db-a3.meteor.io:27017 | |
Thu Aug 8 14:30:53.927 [rsHealthPoll] DBClientCursor::init call() failed | |
Thu Aug 8 14:30:53.927 [rsHealthPoll] replSet info production-db-a2.meteor.io:27017 is down (or slow to respond): DBClientBase::findN: transport error: production-db-a2.meteor.i | |
o:27017 ns: local.$cmd query: { authenticate: 1, nonce: "e013a322da1ba96", user: "__system", key: "724f1ec49d0e09f6f221a5d4f1072006" } | |
Thu Aug 8 14:30:53.927 [rsHealthPoll] replSet member production-db-a2.meteor.io:27017 is now in state DOWN | |
Thu Aug 8 14:30:57.402 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 14:30:59.404 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 14:31:01.406 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 14:31:03.409 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 14:31:05.419 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 14:31:05.931 [rsHealthPoll] DBClientCursor::init call() failed | |
Thu Aug 8 14:31:07.421 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 14:31:07.939 [rsHealthPoll] replSet member production-db-a2.meteor.io:27017 is up | |
Thu Aug 8 14:31:07.939 [rsHealthPoll] replSet member production-db-a2.meteor.io:27017 is now in state SECONDARY | |
Thu Aug 8 14:31:09.424 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 14:31:11.426 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 14:31:13.429 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 14:31:14.578 [rsSyncNotifier] replset setting oplog notifier to production-db-a3.meteor.io:27017 | |
Thu Aug 8 14:31:14.579 [rsSyncNotifier] Socket flush send() errno:9 Bad file descriptor 54.235.126.126:27017 | |
Thu Aug 8 14:31:14.579 [rsSyncNotifier] caught exception (socket exception [SEND_ERROR] for 54.235.126.126:27017) in destructor (~PiggyBackData) | |
Thu Aug 8 21:34:23.331 [rsHealthPoll] DBClientCursor::init call() failed | |
Thu Aug 8 21:34:23.331 [rsHealthPoll] replSet info production-db-a3.meteor.io:27017 is down (or slow to respond): DBClientBase::findN: transport error: production-db-a3.meteor.io:27017 ns: local.$cmd query: { authenticate: 1, nonce: "2fd9edd64e1050c", user: "__system", key: "d1bb399c51b3e6b8f174a2bea5c77d17" } | |
Thu Aug 8 21:34:23.331 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state DOWN | |
Thu Aug 8 21:34:23.333 [rsMgr] not electing self, production-db-a2.meteor.io:27017 would veto with 'production-db-a1.meteor.io:27017 is trying to elect itself but production-db-a3.meteor.io:27017 is already primary and more up-to-date' | |
Thu Aug 8 21:34:28.389 [rsMgr] not electing self, production-db-a2.meteor.io:27017 would veto with 'production-db-a1.meteor.io:27017 is trying to elect itself but production-db-a3.meteor.io:27017 is already primary and more up-to-date' | |
Thu Aug 8 21:34:34.395 [rsMgr] not electing self, production-db-a2.meteor.io:27017 would veto with 'production-db-a1.meteor.io:27017 is trying to elect itself but production-db-a3.meteor.io:27017 is already primary and more up-to-date' | |
Thu Aug 8 21:34:35.335 [rsHealthPoll] DBClientCursor::init call() failed | |
Thu Aug 8 21:34:36.923 [rsBackgroundSync] Socket recv() timeout 54.235.126.126:27017 | |
Thu Aug 8 21:34:36.923 [rsSyncNotifier] Socket recv() timeout 54.235.126.126:27017 | |
Thu Aug 8 21:34:36.923 [rsBackgroundSync] SocketException: remote: 54.235.126.126:27017 error: 9001 socket exception [3] server [54.235.126.126:27017] | |
Thu Aug 8 21:34:36.923 [rsSyncNotifier] SocketException: remote: 54.235.126.126:27017 error: 9001 socket exception [3] server [54.235.126.126:27017] | |
Thu Aug 8 21:34:36.923 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017 | |
Thu Aug 8 21:34:36.923 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017 | |
Thu Aug 8 21:34:39.156 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is up | |
Thu Aug 8 21:34:39.156 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state PRIMARY | |
Thu Aug 8 21:34:46.925 [rsBackgroundSync] replSet syncing to: production-db-a3.meteor.io:27017 | |
Thu Aug 8 21:34:59.181 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 21:35:01.183 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 21:35:03.186 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 21:35:05.196 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 21:35:07.199 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 21:35:09.201 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 21:35:11.204 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 21:35:11.677 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017 | |
Thu Aug 8 21:35:13.206 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 21:35:13.206 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state SECONDARY | |
Thu Aug 8 21:35:15.209 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down | |
Thu Aug 8 21:35:15.441 [rsSyncNotifier] replset setting oplog notifier to production-db-a3.meteor.io:27017 | |
Thu Aug 8 21:35:19.218 [rsMgr] not electing self, production-db-a2.meteor.io:27017 would veto with 'I don't think production-db-a1.meteor.io:27017 is electable' | |
Thu Aug 8 21:35:21.834 [rsHealthPoll] replSet member production-db-a2.meteor.io:27017 is now in state PRIMARY | |
Thu Aug 8 21:35:22.677 [rsBackgroundSync] replSet syncing to: production-db-a2.meteor.io:27017 | |
Thu Aug 8 21:35:26.328 [rsSyncNotifier] replset setting oplog notifier to production-db-a2.meteor.io:27017 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Exhibit 2: node enters the FATAL state | |
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet our last op time fetched: Aug 10 04:54:57:10 | |
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replset source's GTE: Aug 10 04:54:59:1 | |
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet rollback 0 | |
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet ROLLBACK | |
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet rollback 1 | |
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet rollback 2 FindCommonPoint | |
Sat Aug 10 04:54:59.418 [rsBackgroundSync] replSet info rollback our last optime: Aug 10 04:54:57:10 | |
Sat Aug 10 04:54:59.418 [rsBackgroundSync] replSet info rollback their last optime: Aug 10 04:54:59:68 | |
Sat Aug 10 04:54:59.418 [rsBackgroundSync] replSet info rollback diff in end of log times: -2 seconds | |
... | |
Sat Aug 10 04:54:59.823 [rsBackgroundSync] replSet error rollback : can't rollback drop database full resync will be required | |
Sat Aug 10 04:54:59.823 [rsBackgroundSync] replSet { dropDatabase: 1 } | |
Sat Aug 10 04:54:59.823 [rsBackgroundSync] replSet error fatal, stopping replication |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Exhibit 3: ulimits for mongod process | |
Limit Soft Limit Hard Limit Units | |
Max cpu time unlimited unlimited seconds | |
Max file size unlimited unlimited bytes | |
Max data size unlimited unlimited bytes | |
Max stack size 8388608 unlimited bytes | |
Max core file size 0 unlimited bytes | |
Max resident set unlimited unlimited bytes | |
Max processes 240154 240154 processes | |
Max open files 64000 64000 files | |
Max locked memory 65536 65536 bytes | |
Max address space unlimited unlimited bytes | |
Max file locks unlimited unlimited locks | |
Max pending signals 240154 240154 signals | |
Max msgqueue size 819200 819200 bytes | |
Max nice priority 0 0 | |
Max realtime priority 0 0 | |
Max realtime timeout unlimited unlimited us |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Exhibit 4: mongod crashes | |
Tue Aug 13 06:40:35.160 [initandlisten] connection accepted from 54.226.6.156:57685 #21354256 (11747 connections now open) | |
Tue Aug 13 06:40:35.160 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable | |
Tue Aug 13 06:40:35.160 [initandlisten] can't create new thread, closing connection | |
Tue Aug 13 06:40:35.162 [initandlisten] connection accepted from 54.226.6.156:57686 #21354257 (11747 connections now open) | |
Tue Aug 13 06:40:35.162 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable | |
Tue Aug 13 06:40:35.162 [initandlisten] can't create new thread, closing connection | |
Tue Aug 13 06:40:35.278 out of memory, printing stack and exiting: | |
0xdcf361 0x6ce53e 0x1201a0d 0xc3e8d3 0xc3933e 0xc39486 0xd93613 0xd940c4 0xe17cb9 0x7f8cd5b1ee9a 0x7f8cd4e2eccd | |
Tue Aug 13 06:40:35.283 [initandlisten] connection accepted from 54.226.6.156:57688 #21354258 (11747 connections now open) | |
Tue Aug 13 06:40:35.283 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable | |
Tue Aug 13 06:40:35.284 [initandlisten] can't create new thread, closing connection | |
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361] | |
/usr/bin/mongod(_ZN5mongo14my_new_handlerEv+0x3e) [0x6ce53e] | |
/usr/bin/mongod(_Znam+0x6d) [0x1201a0d] | |
/usr/bin/mongod(_ZNK5mongo3Top8cloneMapERNS_9StringMapINS0_14CollectionDataEEE+0x83) [0xc3e8d3] | |
/usr/bin/mongod(_ZN5mongo9Snapshots12takeSnapshotEv+0x4e) [0xc3933e] | |
/usr/bin/mongod(_ZN5mongo14SnapshotThread3runEv+0x66) [0xc39486] | |
/usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xc3) [0xd93613] | |
/usr/bin/mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0xd940c4] | |
/usr/bin/mongod() [0xe17cb9] | |
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f8cd5b1ee9a] | |
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8cd4e2eccd] |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment