Kubernetes and Kops migrate from public to private topology

From public to private topology

Run kops get cluster -o yaml --full > cluster-full.yaml for reference and backup

Bastion and weave networking

kops edit cluster:

  • change networking to
    spec: 
      networking:
        weave: 
          mtu: 8912
  • change topology to match:
spec:
  #[...]
  topology:
    bastion:
      bastionPublicName: bastion.<cluster-name>
    dns:
      type: Public
    masters: private
    nodes: private
  • save (vim :wq)
  • kops update cluster

This will also update a few other fields, like networkPluginName and configureCloudRoutes.
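
For reference, roughly what those fields look like in the full spec after switching from kubenet to weave (from memory of kops defaults, so verify against your own cluster-full output; exact fields and values can differ between kops versions):

spec:
  #[...]
  kubeControllerManager:
    configureCloudRoutes: false  # was true with kubenet
  kubelet:
    networkPluginName: cni       # was kubenet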

Subnets and instancegroups

Run kops get cluster -o yaml --full > cluster-full-v1.yaml for reference.

kops edit cluster again and change each subnet's type from Public to Private. Add a Utility subnet for each zone, with CIDRs that fall within networkCIDR and don't overlap each other or the existing subnets:

spec: 
  #[...]
  subnets:
  - cidr: 172.20.32.0/19
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 172.20.64.0/19
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 172.20.96.0/19
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  - cidr: 172.20.0.0/22
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  - cidr: 172.20.4.0/22
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  - cidr: 172.20.8.0/22
    name: utility-eu-west-1c
    type: Utility
    zone: eu-west-1c    
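
For reference, assuming the cluster's networkCIDR is 172.20.0.0/16 (an assumption; check your own cluster-full.yaml), the ranges above fit inside it without overlapping:

# assuming networkCIDR: 172.20.0.0/16
172.20.0.0/22    utility-eu-west-1a   172.20.0.0  - 172.20.3.255
172.20.4.0/22    utility-eu-west-1b   172.20.4.0  - 172.20.7.255
172.20.8.0/22    utility-eu-west-1c   172.20.8.0  - 172.20.11.255
172.20.32.0/19   eu-west-1a           172.20.32.0 - 172.20.63.255
172.20.64.0/19   eu-west-1b           172.20.64.0 - 172.20.95.255
172.20.96.0/19   eu-west-1c           172.20.96.0 - 172.20.127.255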

Also set the api to use a load balancer instead of DNS:

spec:
  #[...]
  api:
    loadBalancer:
      type: Public
  • kops get ig
  • for each ig: kops edit ig <ig-name> and remove the line with associatePublicIp: true (see the sketch after this list)
  • kops update cluster and check that the output looks sane.
  • Run kops get cluster -o yaml --full > cluster-full-v2.yaml for reference.
  • have lunch, a coffee break, or whatever works for you. If things break in the next step, you need to be able to sit with it for a while.
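
To illustrate the kops edit ig step above, here is roughly what a node instancegroup looks like after the edit; machine type, sizes and names are just example values, the point is that the associatePublicIp line is gone:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  # associatePublicIp: true   <- remove this line
  machineType: t2.medium
  maxSize: 6
  minSize: 1
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c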

Executing:

# kops update cluster --yes
I0614 12:09:48.619426   23371 executor.go:91] Tasks: 0 done / 124 total; 40 can run
I0614 12:09:49.546706   23371 executor.go:91] Tasks: 40 done / 124 total; 20 can run
I0614 12:09:50.282524   23371 executor.go:91] Tasks: 60 done / 124 total; 54 can run
I0614 12:09:53.494105   23371 executor.go:91] Tasks: 114 done / 124 total; 7 can run
I0614 12:09:54.100612   23371 executor.go:91] Tasks: 121 done / 124 total; 3 can run
I0614 12:09:54.166540   23371 natgateway.go:266] Waiting for NAT Gateway "nat-0b97d67ba07694ea1" to be available (this often takes about 5 minutes)
I0614 12:09:54.167893   23371 natgateway.go:266] Waiting for NAT Gateway "nat-03a9ed7ac5518deb3" to be available (this often takes about 5 minutes)
I0614 12:09:54.236704   23371 natgateway.go:266] Waiting for NAT Gateway "nat-09eb41a0c3c06d1aa" to be available (this often takes about 5 minutes)
I0614 12:12:10.804405   23371 executor.go:91] Tasks: 124 done / 124 total; 0 can run
I0614 12:12:10.807009   23371 dns.go:152] Pre-creating DNS records
I0614 12:12:12.482297   23371 update_cluster.go:229] Exporting kubecfg for cluster
Kops has set your kubectl context to <cluster-name> 

When doing a rolling update you'll probably see this:

kops rolling-update cluster
Using cluster from kubectl context: <clustername>

Unable to reach the kubernetes API.
Use --cloudonly to do a rolling-update without confirming progress with the k8s API


error listing nodes in cluster: Get https://api.<clustername>/api/v1/nodes: dial tcp <api ip addr>:443: i/o timeout

The next step is to do a rolling update with --cloudonly, since we can't reach our API now. You get no draining or other safe procedures this way, so any workloads that need to be scaled down gracefully must be handled before this operation starts.

#  kops rolling-update cluster --cloudonly --yes
Using cluster from kubectl context: <clustername>
Using cluster from kubectl context: <clustername>

NAME                    STATUS          NEEDUPDATE      READY   MIN     MAX
master-eu-west-1a       NeedsUpdate     1               0       1       1
master-eu-west-1b       NeedsUpdate     1               0       1       1
master-eu-west-1c       NeedsUpdate     1               0       1       1
nodes                   NeedsUpdate     3               0       1       6
W0614 12:16:24.031505   23427 rollingupdate_cluster.go:372] Not draining cluster nodes as 'cloudonly' flag is set.
I0614 12:16:24.031525   23427 rollingupdate_cluster.go:460] Stopping instance "i-0afdf347d38de97e3", in AWS ASG "master-eu-west-1c.masters.<clustername>".
W0614 12:21:24.268966   23427 rollingupdate_cluster.go:401] Not validating cluster as cloudonly flag is set.
W0614 12:21:24.269025   23427 rollingupdate_cluster.go:372] Not draining cluster nodes as 'cloudonly' flag is set.
I0614 12:21:24.269035   23427 rollingupdate_cluster.go:460] Stopping instance "i-07a4c3d75c3c959bd", in AWS ASG "master-eu-west-1a.masters.<clustername>".
W0614 12:26:24.780248   23427 rollingupdate_cluster.go:401] Not validating cluster as cloudonly flag is set.
W0614 12:26:24.780341   23427 rollingupdate_cluster.go:372] Not draining cluster nodes as 'cloudonly' flag is set.
I0614 12:26:24.780366   23427 rollingupdate_cluster.go:460] Stopping instance "i-0f404c6d5b08e0ae1", in AWS ASG "master-eu-west-1b.masters.<clustername>".
W0614 12:31:25.351674   23427 rollingupdate_cluster.go:401] Not validating cluster as cloudonly flag is set.
W0614 12:31:25.351850   23427 rollingupdate_cluster.go:372] Not draining cluster nodes as 'cloudonly' flag is set.
I0614 12:31:25.351883   23427 rollingupdate_cluster.go:460] Stopping instance "i-04110186afe01a445", in AWS ASG "nodes.<clustername>".
W0614 12:33:25.768280   23427 rollingupdate_cluster.go:401] Not validating cluster as cloudonly flag is set.
W0614 12:33:25.768321   23427 rollingupdate_cluster.go:372] Not draining cluster nodes as 'cloudonly' flag is set.
I0614 12:33:25.768332   23427 rollingupdate_cluster.go:460] Stopping instance "i-045f4589362b37ec0", in AWS ASG "nodes.<clustername>".
W0614 12:35:26.179288   23427 rollingupdate_cluster.go:401] Not validating cluster as cloudonly flag is set.
W0614 12:35:26.179369   23427 rollingupdate_cluster.go:372] Not draining cluster nodes as 'cloudonly' flag is set.
I0614 12:35:26.179398   23427 rollingupdate_cluster.go:460] Stopping instance "i-0752c6bbfaf29dc8f", in AWS ASG "nodes.<clustername>".
W0614 12:37:26.694596   23427 rollingupdate_cluster.go:401] Not validating cluster as cloudonly flag is set.
I0614 12:37:26.697947   23427 rollingupdate_cluster.go:241] Rolling update completed!

Finishing notes

  • There are always things I forget, or that could have been done differently. I forgot to create the Bastion instancegroup:
    kops create ig --name=<cluster-name> bastions --role Bastion --subnet utility-eu-west-1a,utility-eu-west-1b,utility-eu-west-1c
    
    
    Run this command, save the output (if it looks OK) and run kops update cluster --yes. You shouldn't need to do a rolling update at this point.
  • And kube2iam needs its config changed from cbr0 (the interface used by kubenet) to weave (the interface used by Weave), as sketched below
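
A minimal sketch of that kube2iam change, assuming kube2iam runs as a DaemonSet and is configured through its --host-interface flag (check your own manifest for the actual args):

# kube2iam container args (illustrative)
args:
- --host-interface=weave   # was --host-interface=cbr0 under kubenet
- --iptables=true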

Also, your mileage may vary: your cluster might have peculiarities this guide won't cover. It is also important to have DNS/Route53 set up properly so that kops/Kubernetes can update it.
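
Two quick ways to check that kops and the dns-controller can actually update Route53 (the label selector is an assumption and may differ between kops versions):

kops validate cluster
kubectl -n kube-system logs -l k8s-app=dns-controller --tail=20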

olvesh commented Jun 14, 2017

I am having some problems after this - my ELB only sees one active instance, and it does not work. Suspect some network/weave issue, but not sure yet. Internally the cluster looks fine.

olvesh commented Jun 14, 2017

I think I was hit by
weaveworks/weave#3011 / weaveworks/weave#2997

Seeing lots of

WARN: 2017/06/14 12:06:33.360353 TCP connection from 172.20.64.185:36076 to 10.37.192.17:80 blocked by Weave NPC.
WARN: 2017/06/14 12:06:33.360412 TCP connection from 172.20.33.84:12631 to 10.37.192.17:80 blocked by Weave NPC.
WARN: 2017/06/14 12:06:33.360443 TCP connection from 172.20.124.155:25980 to 10.37.192.17:80 blocked by Weave NPC.

olvesh commented Jun 14, 2017

Tried downgrading to weave 1.9.4, but still not working.

olvesh commented Jun 14, 2017

The problem was that the old ELBs were using the old subnets. They just had to be moved over to the new ones.
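
For anyone hitting the same thing, moving a classic ELB onto the new utility subnets can be done with the AWS CLI roughly like this (the load balancer name and subnet IDs are placeholders):

aws elb attach-load-balancer-to-subnets --load-balancer-name <elb-name> --subnets <new-utility-subnet-ids>
aws elb detach-load-balancer-from-subnets --load-balancer-name <elb-name> --subnets <old-subnet-ids>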

Juanchimienti commented:

small typo in:
for each ig: kubectl edit ig and remove line with associatePublicIp: true
should be:
kops edit ig

alexcfpho commented:

Just a note: it's probably a good thing you do the bastion at the end. I ran into issues with the bastion instancegroup trying to find the Utility subnet, since it isn't created until the cluster moves to a Private topology. Sort of a chicken-and-egg situation.

vjwilson1987 commented:

Did the above steps, but kubectl now times out and I cannot even connect to the bastion host over SSH using its ELB.

vjwilson1987 commented:

OK, the problem was the sshAccess restriction.

Changed from:

  sshAccess:
  - 172.16.0.0/12

To

  sshAccess:
  - 0.0.0.0/0
  - ::/0

Or restrict it to the VPC CIDR 172.23.0.0/16 itself.
