Skip to content

Instantly share code, notes, and snippets.

@n1mmy
Last active December 22, 2015 12:19
Show Gist options
  • Save n1mmy/6471542 to your computer and use it in GitHub Desktop.
Save n1mmy/6471542 to your computer and use it in GitHub Desktop.
logs for various mongo issues
Exhibit 1: two different primary handoffs. Only the '[rs*]' logs.
Thu Aug 8 14:30:32.395 [rsHealthPoll] DBClientCursor::init call() failed
Thu Aug 8 14:30:32.395 [rsHealthPoll] replSet info production-db-a3.meteor.io:27017 is down (or slow to respond): DBClientBase::findN: transport error: production-db-a3.meteor.i
o:27017 ns: local.$cmd query: { authenticate: 1, nonce: "dffead39ddc33b3d", user: "__system", key: "68664b9158f9708b8f5531498803cc1b" }
Thu Aug 8 14:30:32.395 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state DOWN
Thu Aug 8 14:30:32.397 [rsMgr] replSet info electSelf 4
Thu Aug 8 14:30:32.399 [rsBackgroundSync] Socket recv() timeout 54.235.126.126:27017
Thu Aug 8 14:30:32.399 [rsSyncNotifier] Socket recv() timeout 54.235.126.126:27017
Thu Aug 8 14:30:32.399 [rsBackgroundSync] SocketException: remote: 54.235.126.126:27017 error: 9001 socket exception [3] server [54.235.126.126:27017]
Thu Aug 8 14:30:32.399 [rsSyncNotifier] SocketException: remote: 54.235.126.126:27017 error: 9001 socket exception [3] server [54.235.126.126:27017]
Thu Aug 8 14:30:32.399 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017
Thu Aug 8 14:30:32.399 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017
Thu Aug 8 14:30:37.369 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is up
Thu Aug 8 14:30:37.369 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state PRIMARY
Thu Aug 8 14:30:42.399 [rsMgr] replSet PRIMARY
Thu Aug 8 14:30:42.399 [rsMgr] replSet relinquishing primary state
Thu Aug 8 14:30:42.399 [rsMgr] replSet SECONDARY
Thu Aug 8 14:30:42.399 [rsMgr] replSet closing client sockets after relinquishing primary
Thu Aug 8 14:30:43.374 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 heartbeat failed, retrying
Thu Aug 8 14:30:43.399 [rsBackgroundSync] replSet syncing to: production-db-a3.meteor.io:27017
Thu Aug 8 14:30:53.927 [rsHealthPoll] DBClientCursor::init call() failed
Thu Aug 8 14:30:53.927 [rsHealthPoll] replSet info production-db-a2.meteor.io:27017 is down (or slow to respond): DBClientBase::findN: transport error: production-db-a2.meteor.i
o:27017 ns: local.$cmd query: { authenticate: 1, nonce: "e013a322da1ba96", user: "__system", key: "724f1ec49d0e09f6f221a5d4f1072006" }
Thu Aug 8 14:30:53.927 [rsHealthPoll] replSet member production-db-a2.meteor.io:27017 is now in state DOWN
Thu Aug 8 14:30:57.402 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 14:30:59.404 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 14:31:01.406 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 14:31:03.409 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 14:31:05.419 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 14:31:05.931 [rsHealthPoll] DBClientCursor::init call() failed
Thu Aug 8 14:31:07.421 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 14:31:07.939 [rsHealthPoll] replSet member production-db-a2.meteor.io:27017 is up
Thu Aug 8 14:31:07.939 [rsHealthPoll] replSet member production-db-a2.meteor.io:27017 is now in state SECONDARY
Thu Aug 8 14:31:09.424 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 14:31:11.426 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 14:31:13.429 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 14:31:14.578 [rsSyncNotifier] replset setting oplog notifier to production-db-a3.meteor.io:27017
Thu Aug 8 14:31:14.579 [rsSyncNotifier] Socket flush send() errno:9 Bad file descriptor 54.235.126.126:27017
Thu Aug 8 14:31:14.579 [rsSyncNotifier] caught exception (socket exception [SEND_ERROR] for 54.235.126.126:27017) in destructor (~PiggyBackData)
Thu Aug 8 21:34:23.331 [rsHealthPoll] DBClientCursor::init call() failed
Thu Aug 8 21:34:23.331 [rsHealthPoll] replSet info production-db-a3.meteor.io:27017 is down (or slow to respond): DBClientBase::findN: transport error: production-db-a3.meteor.io:27017 ns: local.$cmd query: { authenticate: 1, nonce: "2fd9edd64e1050c", user: "__system", key: "d1bb399c51b3e6b8f174a2bea5c77d17" }
Thu Aug 8 21:34:23.331 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state DOWN
Thu Aug 8 21:34:23.333 [rsMgr] not electing self, production-db-a2.meteor.io:27017 would veto with 'production-db-a1.meteor.io:27017 is trying to elect itself but production-db-a3.meteor.io:27017 is already primary and more up-to-date'
Thu Aug 8 21:34:28.389 [rsMgr] not electing self, production-db-a2.meteor.io:27017 would veto with 'production-db-a1.meteor.io:27017 is trying to elect itself but production-db-a3.meteor.io:27017 is already primary and more up-to-date'
Thu Aug 8 21:34:34.395 [rsMgr] not electing self, production-db-a2.meteor.io:27017 would veto with 'production-db-a1.meteor.io:27017 is trying to elect itself but production-db-a3.meteor.io:27017 is already primary and more up-to-date'
Thu Aug 8 21:34:35.335 [rsHealthPoll] DBClientCursor::init call() failed
Thu Aug 8 21:34:36.923 [rsBackgroundSync] Socket recv() timeout 54.235.126.126:27017
Thu Aug 8 21:34:36.923 [rsSyncNotifier] Socket recv() timeout 54.235.126.126:27017
Thu Aug 8 21:34:36.923 [rsBackgroundSync] SocketException: remote: 54.235.126.126:27017 error: 9001 socket exception [3] server [54.235.126.126:27017]
Thu Aug 8 21:34:36.923 [rsSyncNotifier] SocketException: remote: 54.235.126.126:27017 error: 9001 socket exception [3] server [54.235.126.126:27017]
Thu Aug 8 21:34:36.923 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017
Thu Aug 8 21:34:36.923 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017
Thu Aug 8 21:34:39.156 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is up
Thu Aug 8 21:34:39.156 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state PRIMARY
Thu Aug 8 21:34:46.925 [rsBackgroundSync] replSet syncing to: production-db-a3.meteor.io:27017
Thu Aug 8 21:34:59.181 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 21:35:01.183 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 21:35:03.186 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 21:35:05.196 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 21:35:07.199 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 21:35:09.201 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 21:35:11.204 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 21:35:11.677 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: production-db-a3.meteor.io:27017
Thu Aug 8 21:35:13.206 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 21:35:13.206 [rsHealthPoll] replSet member production-db-a3.meteor.io:27017 is now in state SECONDARY
Thu Aug 8 21:35:15.209 [rsHealthPoll] replset info production-db-a3.meteor.io:27017 thinks that we are down
Thu Aug 8 21:35:15.441 [rsSyncNotifier] replset setting oplog notifier to production-db-a3.meteor.io:27017
Thu Aug 8 21:35:19.218 [rsMgr] not electing self, production-db-a2.meteor.io:27017 would veto with 'I don't think production-db-a1.meteor.io:27017 is electable'
Thu Aug 8 21:35:21.834 [rsHealthPoll] replSet member production-db-a2.meteor.io:27017 is now in state PRIMARY
Thu Aug 8 21:35:22.677 [rsBackgroundSync] replSet syncing to: production-db-a2.meteor.io:27017
Thu Aug 8 21:35:26.328 [rsSyncNotifier] replset setting oplog notifier to production-db-a2.meteor.io:27017
Exhibit 2: node enters the FATAL state
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet our last op time fetched: Aug 10 04:54:57:10
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replset source's GTE: Aug 10 04:54:59:1
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet rollback 0
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet ROLLBACK
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet rollback 1
Sat Aug 10 04:54:59.413 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Sat Aug 10 04:54:59.418 [rsBackgroundSync] replSet info rollback our last optime: Aug 10 04:54:57:10
Sat Aug 10 04:54:59.418 [rsBackgroundSync] replSet info rollback their last optime: Aug 10 04:54:59:68
Sat Aug 10 04:54:59.418 [rsBackgroundSync] replSet info rollback diff in end of log times: -2 seconds
...
Sat Aug 10 04:54:59.823 [rsBackgroundSync] replSet error rollback : can't rollback drop database full resync will be required
Sat Aug 10 04:54:59.823 [rsBackgroundSync] replSet { dropDatabase: 1 }
Sat Aug 10 04:54:59.823 [rsBackgroundSync] replSet error fatal, stopping replication
Exhibit 3: ulimits for mongod process
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 240154 240154 processes
Max open files 64000 64000 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 240154 240154 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
Exhibit 4: mongod crashes
Tue Aug 13 06:40:35.160 [initandlisten] connection accepted from 54.226.6.156:57685 #21354256 (11747 connections now open)
Tue Aug 13 06:40:35.160 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Tue Aug 13 06:40:35.160 [initandlisten] can't create new thread, closing connection
Tue Aug 13 06:40:35.162 [initandlisten] connection accepted from 54.226.6.156:57686 #21354257 (11747 connections now open)
Tue Aug 13 06:40:35.162 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Tue Aug 13 06:40:35.162 [initandlisten] can't create new thread, closing connection
Tue Aug 13 06:40:35.278 out of memory, printing stack and exiting:
0xdcf361 0x6ce53e 0x1201a0d 0xc3e8d3 0xc3933e 0xc39486 0xd93613 0xd940c4 0xe17cb9 0x7f8cd5b1ee9a 0x7f8cd4e2eccd
Tue Aug 13 06:40:35.283 [initandlisten] connection accepted from 54.226.6.156:57688 #21354258 (11747 connections now open)
Tue Aug 13 06:40:35.283 [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Tue Aug 13 06:40:35.284 [initandlisten] can't create new thread, closing connection
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]
/usr/bin/mongod(_ZN5mongo14my_new_handlerEv+0x3e) [0x6ce53e]
/usr/bin/mongod(_Znam+0x6d) [0x1201a0d]
/usr/bin/mongod(_ZNK5mongo3Top8cloneMapERNS_9StringMapINS0_14CollectionDataEEE+0x83) [0xc3e8d3]
/usr/bin/mongod(_ZN5mongo9Snapshots12takeSnapshotEv+0x4e) [0xc3933e]
/usr/bin/mongod(_ZN5mongo14SnapshotThread3runEv+0x66) [0xc39486]
/usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xc3) [0xd93613]
/usr/bin/mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0xd940c4]
/usr/bin/mongod() [0xe17cb9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f8cd5b1ee9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8cd4e2eccd]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment