Version: OCP 3.7, openshift-ansible-3.7.22-1-9-g56970a0
Environment: Amazon AWS cluster
Scenario: 1 master+node and 2 nodes; GlusterFS storage
```
TASK [openshift_storage_glusterfs : Verify heketi service] **************************************************************************************************************************************************************
fatal: [ec2-54-237-234-66.compute-1.amazonaws.com]: FAILED! => {"changed": false, "cmd": ["oc", "rsh", "--namespace=default", "deploy-heketi-storage-1-vkfwm", "heketi-cli", "-s", "http://localhost:8080", "--user", "admin", "--secret", "4Jl50v50d9BtfTbxXI1LWeikIH9dfWGe559rZlq2Gz8=", "cluster", "list"], "delta": "0:02:07.485073", "end": "2018-01-25 05:26:51.447002", "msg": "non-zero return code", "rc": 1, "start": "2018-01-25 05:24:43.961929", "stderr": "Error from server: error dialing backend: dial tcp 172.18.12.254:10250: getsockopt: connection timed out", "stderr_lines": ["Error from server: error dialing backend: dial tcp 172.18.12.254:10250: getsockopt: connection timed out"], "stdout": "", "stdout_lines": []}
	to retry, use: --limit @/home/ec2-user/sapvora/openshift-ansible/playbooks/byo/config.retry
```
It turned out that the master could not reach the kubelets (port 10250) at all:

```
# executed from the master node
$ curl -k https://172.18.12.254:10250/healthz
curl: (7) Failed connect to 172.18.12.254:10250; Connection timed out
```
All the EC2 instances were in the same VPC. The problem was the security group (public-http), which allowed only the following inbound traffic:
| Type | Protocol | Port Range | Source | Description |
|---|---|---|---|---|
| SSH | TCP | 22 | 0.0.0.0/0 | |
| HTTP | TCP | 80 | 0.0.0.0/0 | |
| HTTPS | TCP | 443 | 0.0.0.0/0 | |
| Custom TCP Rule | TCP | 8000 | 0.0.0.0/0 | |
| Custom TCP Rule | TCP | 8443 | 0.0.0.0/0 | |
After additionally assigning the default security group, which allows all inbound traffic between its own members, the nodes could talk to each other again.
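A narrower fix than attaching the wide-open default group would be to allow only the kubelet port (10250/TCP) between cluster nodes. A sketch using the AWS CLI, where `sg-0123456789abcdef0` is a placeholder for the actual ID of the public-http group:

```shell
# Allow kubelet traffic (TCP 10250) only between instances that carry
# this same security group, rather than opening all inbound traffic.
# sg-0123456789abcdef0 is a placeholder for the real group ID.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 10250 \
    --source-group sg-0123456789abcdef0
```

Using the group itself as the source restricts the rule to intra-cluster traffic instead of 0.0.0.0/0.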