Last active
July 25, 2016 07:19
Annoying etcd problem where sometimes fleet can submit a job, other times it fails with a 500 error from etcd
root@elf:~# /usr/local/sbin/etcd --version
etcd Version: 2.3.3
Git SHA: c41345d
Go Version: go1.6.2
Go OS/Arch: linux/amd64
root@tree:~# etcdctl --version
etcdctl version 2.3.3
root@tree:~# fleetctl --version
fleetctl version 0.11.5
root@tree:/# etcdctl member list
Error: dial tcp 127.0.0.1:4001: getsockopt: connection refused
root@tree:/# etcdctl member list
b5fa9b282334905d: name=dwarf peerURLs=http://10.1.1.4:2380 clientURLs=http://10.1.1.4:2379 isLeader=false
bdd359f77713beb4: name=elf peerURLs=http://10.1.1.3:2380 clientURLs=http://10.1.1.3:2379 isLeader=true
d45ce0c430449c65: name=halfling peerURLs=http://10.1.1.5:2380 clientURLs=http://10.1.1.5:2379 isLeader=false
root@tree:/# etcdctl --debug --endpoints=http://10.1.1.3:2379 get /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
start to sync cluster using endpoints(http://10.1.1.3:2379)
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
got endpoints(http://10.1.1.3:2379,http://10.1.1.5:2379,http://10.1.1.4:2379) after sync
Cluster-Endpoints: http://10.1.1.3:2379, http://10.1.1.5:2379, http://10.1.1.4:2379
cURL Command: curl -X GET http://10.1.1.3:2379/v2/keys/_coreos.com/fleet/job/lb_feedcache_1002@.service/object?quorum=false&recursive=false&sorted=false
Error: 100: Key not found (/_coreos.com/fleet/job) [36]
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 16:30:23 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 16:30:23 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 16:30:24 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
root@tree:/# etcdctl --debug --endpoints=http://10.1.1.3:2379 get /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
start to sync cluster using endpoints(http://10.1.1.3:2379)
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
got endpoints(http://10.1.1.4:2379,http://10.1.1.3:2379,http://10.1.1.5:2379) after sync
Cluster-Endpoints: http://10.1.1.4:2379, http://10.1.1.3:2379, http://10.1.1.5:2379
cURL Command: curl -X GET http://10.1.1.4:2379/v2/keys/_coreos.com/fleet/job/lb_feedcache_1002@.service/object?quorum=false&recursive=false&sorted=false
{"Name":"lb_feedcache_1002@.service","UnitHash":[104,66,102,42,116,131,7,19,58,59,79,32,97,29,252,89,238,74,88,141]}
---
Then on all 3 cluster peers:
root@elf:~# rm -fr /var/lib/etcd*
root@elf:~# /usr/local/sbin/etcd -initial-cluster-token token1 -initial-cluster elf=http://10.1.1.3:2380,dwarf=http://10.1.1.4:2380,halfling=http://10.1.1.5:2380 -initial-cluster-state new -initial-advertise-peer-urls http://10.1.1.3:2380 -advertise-client-urls http://10.1.1.3:2379 -listen-client-urls http://127.0.0.1:2379,http://10.1.1.3:2379 -listen-peer-urls http://127.0.0.1:2380,http://10.1.1.3:2380 -data-dir /var/lib/etcd2 -name elf -heartbeat-interval 600 -election-timeout 6000 -debug > /tmp/etcd.log 2>&1
---
root@tree:/# etcdctl member list
b5fa9b282334905d: name=dwarf peerURLs=http://10.1.1.4:2380 clientURLs=http://10.1.1.4:2379 isLeader=false
bdd359f77713beb4: name=elf peerURLs=http://10.1.1.3:2380 clientURLs=http://10.1.1.3:2379 isLeader=true
d45ce0c430449c65: name=halfling peerURLs=http://10.1.1.5:2380 clientURLs=http://10.1.1.5:2379 isLeader=false
root@tree:/# etcdctl --debug --endpoints=http://10.1.1.3:2379 get /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
start to sync cluster using endpoints(http://10.1.1.3:2379)
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
got endpoints(http://10.1.1.4:2379,http://10.1.1.3:2379,http://10.1.1.5:2379) after sync
Cluster-Endpoints: http://10.1.1.4:2379, http://10.1.1.3:2379, http://10.1.1.5:2379
cURL Command: curl -X GET http://10.1.1.4:2379/v2/keys/_coreos.com/fleet/job/lb_feedcache_1002@.service/object?quorum=false&recursive=false&sorted=false
Error: 100: Key not found (/_coreos.com/fleet/job) [18]
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 16:33:49 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 16:33:49 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 16:33:49 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
2016/07/25 16:33:50 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:50 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:50 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:50 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:51 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:51 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:51 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:51 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:52 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:52 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:52 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:52 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:53 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:53 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:53 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:53 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:54 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:54 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:54 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:54 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:55 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:55 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:55 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:55 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:56 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:56 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:56 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:56 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:57 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:57 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
^C
root@tree:/# etcdctl --debug --endpoints=http://10.1.1.3:2379 get /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
start to sync cluster using endpoints(http://10.1.1.3:2379)
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
got endpoints(http://10.1.1.4:2379,http://10.1.1.5:2379,http://10.1.1.3:2379) after sync
Cluster-Endpoints: http://10.1.1.4:2379, http://10.1.1.5:2379, http://10.1.1.3:2379
cURL Command: curl -X GET http://10.1.1.4:2379/v2/keys/_coreos.com/fleet/job/lb_feedcache_1002@.service/object?quorum=false&recursive=false&sorted=false
{"Name":"lb_feedcache_1002@.service","UnitHash":[244,140,73,8,117,105,27,32,199,145,90,221,153,7,57,244,87,250,149,157]}
---
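The two `get` results above return `UnitHash` as a JSON array of 20 byte values, which is awkward to compare by eye. A minimal sketch of a helper that prints such an array as a hex string follows. The function name `unit_hash_hex` is hypothetical, and the idea that the hex form matches a key under `/_coreos.com/fleet/unit/` is an assumption that this transcript does not confirm.

```shell
#!/bin/sh
# unit_hash_hex "[104,66,...]"
# Prints the byte array as one lowercase hex string.
# (Hypothetical helper, not part of etcdctl or fleetctl.)
unit_hash_hex() {
    # Drop the brackets and turn commas into spaces so the shell
    # can iterate over the individual byte values.
    for byte in $(printf '%s' "$1" | tr -d '[]' | tr ',' ' '); do
        printf '%02x' "$byte"
    done
    printf '\n'
}

# Hash returned before the cluster was wiped:
unit_hash_hex "[104,66,102,42,116,131,7,19,58,59,79,32,97,29,252,89,238,74,88,141]"
# prints 6842662a748307133a3b4f20611dfc59ee4a588d

# Hash returned after the cluster was wiped and the submit was interrupted:
unit_hash_hex "[244,140,73,8,117,105,27,32,199,145,90,221,153,7,57,244,87,250,149,157]"
# prints f48c490875691b20c7915add990739f457fa959d
```

The two values differ even though the unit name is the same, which may be worth checking against the unit file that was submitted each time.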
And again, without restarting the cluster, subsequent attempts at executing the exact same command sometimes work and other times fail: | |
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 17:15:10 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 17:15:10 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 17:15:10 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 destroy lb_feedcache_1002@.service
2016/07/25 17:15:17 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
Destroyed lb_feedcache_1002@.service
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 17:15:21 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 17:15:21 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 17:15:21 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 destroy lb_feedcache_1002@.service
2016/07/25 17:15:25 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
Destroyed lb_feedcache_1002@.service
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 17:15:29 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 17:15:29 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 17:15:29 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 destroy lb_feedcache_1002@.service
2016/07/25 17:15:37 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
Destroyed lb_feedcache_1002@.service
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 17:15:42 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 17:15:42 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 17:15:42 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
2016/07/25 17:15:42 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:42 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 17:15:43 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:43 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 17:15:43 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:43 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 17:15:44 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:44 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 17:15:44 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:44 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
^C
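---
Because the failure is intermittent, it may help to measure how often it happens instead of retrying by hand. The following is a rough sketch, not a tested reproduction: `count_failures` is a hypothetical helper that runs a given command a number of times and counts the runs whose output contains the "unable to parse Unit" warning seen above. Whether fleetctl writes that warning to stdout or stderr is not known from this transcript, so both streams are captured.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Runs CMD N times and reports how many runs printed the
# "unable to parse Unit" warning. (Hypothetical helper.)
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Merge stderr into stdout so the warning is caught on either stream.
        if "$@" 2>&1 | grep -q 'unable to parse Unit'; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed/$runs runs failed"
}

# Example use against the cluster above (left commented out here).
# The submit is wrapped in coreutils `timeout` because a failing
# submit keeps retrying until interrupted:
#
# submit_destroy() {
#     timeout 10 fleetctl --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
#     fleetctl --endpoint=http://10.1.1.3:2379 destroy lb_feedcache_1002@.service
# }
# count_failures 20 submit_destroy
```

Passing the command as arguments keeps the counter independent of fleetctl, so the same loop can be pointed at a different endpoint or a different unit.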