@waddles
Last active July 25, 2016 07:19
Annoying etcd problem: sometimes fleet can submit a job, other times the submit fails with a 500 error from etcd.
root@elf:~# /usr/local/sbin/etcd --version
etcd Version: 2.3.3
Git SHA: c41345d
Go Version: go1.6.2
Go OS/Arch: linux/amd64
root@tree:~# etcdctl --version
etcdctl version 2.3.3
root@tree:~# fleetctl --version
fleetctl version 0.11.5
root@tree:/# etcdctl member list
Error: dial tcp 127.0.0.1:4001: getsockopt: connection refused
root@tree:/# etcdctl member list
b5fa9b282334905d: name=dwarf peerURLs=http://10.1.1.4:2380 clientURLs=http://10.1.1.4:2379 isLeader=false
bdd359f77713beb4: name=elf peerURLs=http://10.1.1.3:2380 clientURLs=http://10.1.1.3:2379 isLeader=true
d45ce0c430449c65: name=halfling peerURLs=http://10.1.1.5:2380 clientURLs=http://10.1.1.5:2379 isLeader=false
root@tree:/# etcdctl --debug --endpoints=http://10.1.1.3:2379 get /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
start to sync cluster using endpoints(http://10.1.1.3:2379)
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
got endpoints(http://10.1.1.3:2379,http://10.1.1.5:2379,http://10.1.1.4:2379) after sync
Cluster-Endpoints: http://10.1.1.3:2379, http://10.1.1.5:2379, http://10.1.1.4:2379
cURL Command: curl -X GET http://10.1.1.3:2379/v2/keys/_coreos.com/fleet/job/lb_feedcache_1002@.service/object?quorum=false&recursive=false&sorted=false
Error: 100: Key not found (/_coreos.com/fleet/job) [36]
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 16:30:23 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 16:30:23 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 16:30:24 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
root@tree:/# etcdctl --debug --endpoints=http://10.1.1.3:2379 get /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
start to sync cluster using endpoints(http://10.1.1.3:2379)
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
got endpoints(http://10.1.1.4:2379,http://10.1.1.3:2379,http://10.1.1.5:2379) after sync
Cluster-Endpoints: http://10.1.1.4:2379, http://10.1.1.3:2379, http://10.1.1.5:2379
cURL Command: curl -X GET http://10.1.1.4:2379/v2/keys/_coreos.com/fleet/job/lb_feedcache_1002@.service/object?quorum=false&recursive=false&sorted=false
{"Name":"lb_feedcache_1002@.service","UnitHash":[104,66,102,42,116,131,7,19,58,59,79,32,97,29,252,89,238,74,88,141]}
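The UnitHash value above is a list of 20 decimal bytes, which is the length of a SHA-1 digest. The following is a minimal sketch for turning that list into a hex string so it can be compared by eye between runs, or against `sha1sum` of the unit file if the hash really is a SHA-1 of the file contents (that part is an assumption, not something this output confirms). The helper name `unit_hash_hex` is made up for illustration.

```shell
# unit_hash_hex: hypothetical helper, not part of etcdctl or fleetctl.
# Reads the JSON object printed by `etcdctl get .../object` on stdin and
# prints the UnitHash byte list as a lowercase hex string.
unit_hash_hex() {
  # Pull out the comma-separated decimal bytes between "UnitHash":[ and ].
  # Lines without a UnitHash field (e.g. debug output) are ignored.
  bytes=$(sed -n 's/.*"UnitHash":\[\([0-9,]*\)\].*/\1/p')
  # Emit each byte as two hex digits.
  for b in $(printf '%s' "$bytes" | tr ',' ' '); do
    printf '%02x' "$b"
  done
  printf '\n'
}
```

Piping the JSON line above through `unit_hash_hex` gives `6842662a748307133a3b4f20611dfc59ee4a588d`.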
---
Then on all 3 cluster peers (shown here for elf):
root@elf:~# rm -fr /var/lib/etcd*
root@elf:~# /usr/local/sbin/etcd -initial-cluster-token token1 -initial-cluster elf=http://10.1.1.3:2380,dwarf=http://10.1.1.4:2380,halfling=http://10.1.1.5:2380 -initial-cluster-state new -initial-advertise-peer-urls http://10.1.1.3:2380 -advertise-client-urls http://10.1.1.3:2379 -listen-client-urls http://127.0.0.1:2379,http://10.1.1.3:2379 -listen-peer-urls http://127.0.0.1:2380,http://10.1.1.3:2380 -data-dir /var/lib/etcd2 -name elf -heartbeat-interval 600 -election-timeout 6000 -debug > /tmp/etcd.log 2>&1
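Only the command for elf is shown above. As a rough sketch, assuming the other two peers ran the same command with just their own name and IP substituted (the transcript does not show this), the per-peer command line could be generated like this. The function name `etcd_peer_cmd` is hypothetical, and only flags that appear in the command above are used.

```shell
# etcd_peer_cmd NAME IP: hypothetical helper that prints the etcd command
# line from above for one peer, without the log redirection.
etcd_peer_cmd() {
  name=$1   # peer name, e.g. elf
  ip=$2     # that peer's address, e.g. 10.1.1.3
  cluster='elf=http://10.1.1.3:2380,dwarf=http://10.1.1.4:2380,halfling=http://10.1.1.5:2380'
  # echo joins its arguments with single spaces into one command line.
  echo "/usr/local/sbin/etcd" \
    "-initial-cluster-token token1" \
    "-initial-cluster ${cluster}" \
    "-initial-cluster-state new" \
    "-initial-advertise-peer-urls http://${ip}:2380" \
    "-advertise-client-urls http://${ip}:2379" \
    "-listen-client-urls http://127.0.0.1:2379,http://${ip}:2379" \
    "-listen-peer-urls http://127.0.0.1:2380,http://${ip}:2380" \
    "-data-dir /var/lib/etcd2 -name ${name}" \
    "-heartbeat-interval 600 -election-timeout 6000 -debug"
}

# Example: print the command for each peer listed in -initial-cluster.
# etcd_peer_cmd elf 10.1.1.3
# etcd_peer_cmd dwarf 10.1.1.4
# etcd_peer_cmd halfling 10.1.1.5
```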
---
root@tree:/# etcdctl member list
b5fa9b282334905d: name=dwarf peerURLs=http://10.1.1.4:2380 clientURLs=http://10.1.1.4:2379 isLeader=false
bdd359f77713beb4: name=elf peerURLs=http://10.1.1.3:2380 clientURLs=http://10.1.1.3:2379 isLeader=true
d45ce0c430449c65: name=halfling peerURLs=http://10.1.1.5:2380 clientURLs=http://10.1.1.5:2379 isLeader=false
root@tree:/# etcdctl --debug --endpoints=http://10.1.1.3:2379 get /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
start to sync cluster using endpoints(http://10.1.1.3:2379)
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
got endpoints(http://10.1.1.4:2379,http://10.1.1.3:2379,http://10.1.1.5:2379) after sync
Cluster-Endpoints: http://10.1.1.4:2379, http://10.1.1.3:2379, http://10.1.1.5:2379
cURL Command: curl -X GET http://10.1.1.4:2379/v2/keys/_coreos.com/fleet/job/lb_feedcache_1002@.service/object?quorum=false&recursive=false&sorted=false
Error: 100: Key not found (/_coreos.com/fleet/job) [18]
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 16:33:49 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 16:33:49 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 16:33:49 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
2016/07/25 16:33:50 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:50 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:50 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:50 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:51 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:51 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:51 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:51 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:52 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:52 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:52 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:52 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:53 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:53 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:53 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:53 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:54 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:54 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:54 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:54 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:55 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:55 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:55 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:55 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:56 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:56 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:56 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:56 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 16:33:57 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 16:33:57 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
^C
root@tree:/# etcdctl --debug --endpoints=http://10.1.1.3:2379 get /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
start to sync cluster using endpoints(http://10.1.1.3:2379)
cURL Command: curl -X GET http://10.1.1.3:2379/v2/members
got endpoints(http://10.1.1.4:2379,http://10.1.1.5:2379,http://10.1.1.3:2379) after sync
Cluster-Endpoints: http://10.1.1.4:2379, http://10.1.1.5:2379, http://10.1.1.3:2379
cURL Command: curl -X GET http://10.1.1.4:2379/v2/keys/_coreos.com/fleet/job/lb_feedcache_1002@.service/object?quorum=false&recursive=false&sorted=false
{"Name":"lb_feedcache_1002@.service","UnitHash":[244,140,73,8,117,105,27,32,199,145,90,221,153,7,57,244,87,250,149,157]}
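The debug output shows etcdctl issuing the GET with `quorum=false`, and after the endpoint sync the request goes to 10.1.1.4, which `member list` reports as a follower. The etcd v2 keys API also accepts `quorum=true` on a GET, which routes the read through the leader. Whether follower reads have anything to do with the intermittent failure is not established here; the sketch below only builds the URL so the same key can be read both ways and compared. The function name `quorum_get_url` is made up for illustration.

```shell
# quorum_get_url ENDPOINT KEY: hypothetical helper that prints the v2 keys
# URL for KEY with quorum=true, mirroring the URL shape in the debug output.
quorum_get_url() {
  endpoint=$1   # client URL of any member, e.g. http://10.1.1.3:2379
  key=$2        # full key path, starting with a slash
  printf '%s/v2/keys%s?quorum=true\n' "$endpoint" "$key"
}

# Example (run by hand right after a submit):
# curl -s "$(quorum_get_url http://10.1.1.4:2379 /_coreos.com/fleet/job/lb_feedcache_1002@.service/object)"
```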
---
And again, without restarting the cluster, subsequent attempts at executing the exact same command sometimes work and other times fail:
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 17:15:10 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 17:15:10 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 17:15:10 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 destroy lb_feedcache_1002@.service
2016/07/25 17:15:17 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
Destroyed lb_feedcache_1002@.service
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 17:15:21 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 17:15:21 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 17:15:21 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 destroy lb_feedcache_1002@.service
2016/07/25 17:15:25 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
Destroyed lb_feedcache_1002@.service
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 17:15:29 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 17:15:29 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 17:15:29 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
Unit lb_feedcache_1002@.service inactive
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 destroy lb_feedcache_1002@.service
2016/07/25 17:15:37 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
Destroyed lb_feedcache_1002@.service
root@tree:/# fleetctl --debug --endpoint=http://10.1.1.3:2379 submit lb_feedcache_1002@.service
2016/07/25 17:15:42 DEBUG fleetctl.go:274: Defaulting to --driver=etcd as --endpoint appears to be etcd
2016/07/25 17:15:42 DEBUG fleetctl.go:494: Unit(lb_feedcache_1002@.service) found in local filesystem
2016/07/25 17:15:42 DEBUG fleetctl.go:583: Created Unit(lb_feedcache_1002@.service) in Registry
2016/07/25 17:15:42 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:42 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 17:15:43 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:43 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 17:15:43 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:43 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 17:15:44 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:44 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
2016/07/25 17:15:44 WARN job.go:268: No Unit found in Registry for Job(lb_feedcache_1002@.service)
2016/07/25 17:15:44 WARN fleetctl.go:799: Error retrieving Unit(lb_feedcache_1002@.service) from Registry: unable to parse Unit in Registry at key /_coreos.com/fleet/job/lb_feedcache_1002@.service/object
^C
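To put a number on "sometimes work and other times fail", the submit/destroy cycle above could be looped. This is a rough sketch, not something that was run for this report: the helper names `submit_failed` and `count_failures` are hypothetical, the 10 second limit is an arbitrary assumption, and `timeout` from coreutils is assumed to be available to stop a submit that would otherwise keep retrying until ^C.

```shell
# submit_failed: hypothetical helper. Reads fleetctl output on stdin and
# exits 0 if it contains the registry warning seen in the failing runs.
submit_failed() {
  grep -q 'unable to parse Unit in Registry'
}

# count_failures N: hypothetical loop. Submits and destroys the unit N times
# with the same commands as above and reports how many submits hit the warning.
count_failures() {
  n=$1
  failures=0
  i=0
  while [ "$i" -lt "$n" ]; do
    # The pipeline's exit status is that of submit_failed.
    if timeout 10 fleetctl --debug --endpoint=http://10.1.1.3:2379 \
         submit lb_feedcache_1002@.service 2>&1 | submit_failed; then
      failures=$((failures + 1))
    fi
    fleetctl --endpoint=http://10.1.1.3:2379 \
      destroy lb_feedcache_1002@.service >/dev/null 2>&1
    i=$((i + 1))
  done
  echo "$failures of $n submits failed"
}

# Example: count_failures 20
```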