Upgrading Kubernetes

Kubeconfig

Editing the control plane requires a working kubectl config. If you don't already have one, fetch it by running:

talosctl --nodes <master node> kubeconfig

Automated Kubernetes Upgrade

To upgrade from Kubernetes v1.19.4 to v1.20.1, run:

$ talosctl --nodes <master node> upgrade-k8s --from 1.19.4 --to 1.20.1
patched kube-apiserver secrets for "service-account.key"
updating pod-checkpointer grace period to "0m"
sleeping 5m0s to let the pod-checkpointer self-checkpoint be updated
temporarily taking "kube-apiserver" out of pod-checkpointer control
updating daemonset "kube-apiserver" to version "1.20.1"
updating daemonset "kube-controller-manager" to version "1.20.1"
updating daemonset "kube-scheduler" to version "1.20.1"
updating daemonset "kube-proxy" to version "1.20.1"
updating pod-checkpointer grace period to "5m0s"

Manual Kubernetes Upgrade

Kubernetes can also be upgraded manually by following the steps below, which are equivalent to those performed by the talosctl upgrade-k8s command.

Patching kube-apiserver Secrets

Copy the value of service-account.key from the kube-controller-manager secret in the kube-system namespace to the kube-apiserver secret in the same namespace.
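
One way to do this copy without an interactive edit is sketched below. It is a sketch, not the method talosctl uses: copy_service_account_key is a hypothetical helper name, and it assumes both secrets live in kube-system and that kubectl is already pointed at the cluster.

```shell
# Sketch: copy service-account.key from the kube-controller-manager secret
# to the kube-apiserver secret. copy_service_account_key is a hypothetical
# helper name, not part of talosctl or kubectl.
copy_service_account_key() {
  local key
  # Secret data is stored base64-encoded; read it as-is so it can be
  # written back unchanged. The dot in the key name is escaped for jsonpath.
  key=$(kubectl -n kube-system get secret kube-controller-manager \
    -o jsonpath='{.data.service-account\.key}') || return 1
  [ -n "$key" ] || { echo "service-account.key not found" >&2; return 1; }
  # A merge patch adds or overwrites only this one entry and leaves the
  # other entries of the kube-apiserver secret untouched.
  kubectl -n kube-system patch secret kube-apiserver --type merge \
    -p "{\"data\":{\"service-account.key\":\"${key}\"}}"
}
```

Run copy_service_account_key once, then inspect the result with kubectl -n kube-system describe secret kube-apiserver.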

After this change, the kube-apiserver secret should contain the following entries:

Data
====
service-account.key:
apiserver.key:
ca.crt:
front-proxy-client.crt:
apiserver-kubelet-client.crt:
encryptionconfig.yaml:
etcd-client.crt:
front-proxy-client.key:
service-account.pub:
apiserver.crt:
auditpolicy.yaml:
etcd-client.key:
apiserver-kubelet-client.key:
front-proxy-ca.crt:
etcd-client-ca.crt:

pod-checkpointer

Talos runs the pod-checkpointer component, which helps recover control plane components (specifically, the API server) if the control plane is not healthy.

However, the way checkpoints interact with an API server upgrade can make the upgrade take much longer, due to a race condition on the API server listen port.

To speed up the upgrade, first lower the pod-checkpointer grace period to zero (kubectl -n kube-system edit daemonset pod-checkpointer). Change:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: pod-checkpointer
        command:
        ...
        - --checkpoint-grace-period=5m0s

to:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: pod-checkpointer
        command:
        ...
        - --checkpoint-grace-period=0s

Wait 5 minutes to let pod-checkpointer update its self-checkpoint with the new grace period.
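
As an alternative to the interactive edit, the flag can be rewritten with the common get, sed, replace pipeline. The following is a minimal sketch under stated assumptions: set_grace_period and apply_grace_period are hypothetical helper names, and the flag is assumed to appear exactly once in the DaemonSet manifest.

```shell
# Sketch: rewrite the --checkpoint-grace-period flag in a manifest read
# from stdin. set_grace_period is a hypothetical helper; $1 is the new
# value, for example 0s or 5m0s.
set_grace_period() {
  sed -E "s/(--checkpoint-grace-period=)[0-9a-z]+/\1$1/"
}

# Fetch the live DaemonSet, rewrite the flag, and write the object back.
# kubectl replace fails if the object changed in between, which is the
# safe outcome for a one-off edit.
apply_grace_period() {
  kubectl -n kube-system get daemonset pod-checkpointer -o yaml \
    | set_grace_period "$1" \
    | kubectl replace -f -
}
```

For example, apply_grace_period 0s followed by sleep 300 covers the change and the 5 minute wait. The same helper accepts 5m0s when the grace period is restored later.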

API Server

In the API server's DaemonSet, change:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-apiserver
          image: k8s.gcr.io/kube-apiserver:v1.19.4
          command:
            - /go-runner
            - /usr/local/bin/kube-apiserver
      tolerations:
        - ...

to:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-apiserver
          image: k8s.gcr.io/kube-apiserver:v1.20.1
          command:
            - /go-runner
            - /usr/local/bin/kube-apiserver
            - ...
            - --api-audiences=<control plane endpoint>
            - --service-account-issuer=<control plane endpoint>
            - --service-account-signing-key-file=/etc/kubernetes/secrets/service-account.key
      tolerations:
        - ...
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

Summary of the changes:

  • update image version
  • add new toleration
  • add three new flags (replace <control plane endpoint> with the actual endpoint of the cluster, e.g. https://10.5.0.1:6443)

To edit the DaemonSet, run:

kubectl edit daemonsets -n kube-system kube-apiserver
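
The three changes can also be applied non-interactively as a single JSON patch, which may be easier to review and repeat than an editor session. This is a sketch only: patch_apiserver is a hypothetical helper, and it assumes that kube-apiserver is the first container in the pod template and that the tolerations list already exists. The "test" operation makes the whole patch fail if the first assumption does not hold.

```shell
# Sketch: apply the image, flag, and toleration changes in one JSON patch.
# patch_apiserver is a hypothetical helper. $1 is the control plane
# endpoint (e.g. https://10.5.0.1:6443), $2 the version (default v1.20.1).
patch_apiserver() {
  local endpoint=$1 version=${2:-v1.20.1}
  local c=/spec/template/spec/containers/0
  local patch
  # "add" with a path ending in "/-" appends to the existing list.
  patch=$(cat <<EOF
[
  {"op": "test", "path": "$c/name", "value": "kube-apiserver"},
  {"op": "replace", "path": "$c/image", "value": "k8s.gcr.io/kube-apiserver:$version"},
  {"op": "add", "path": "$c/command/-", "value": "--api-audiences=$endpoint"},
  {"op": "add", "path": "$c/command/-", "value": "--service-account-issuer=$endpoint"},
  {"op": "add", "path": "$c/command/-", "value": "--service-account-signing-key-file=/etc/kubernetes/secrets/service-account.key"},
  {"op": "add", "path": "/spec/template/spec/tolerations/-",
   "value": {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists", "effect": "NoSchedule"}}
]
EOF
  )
  kubectl -n kube-system patch daemonset kube-apiserver --type json -p "$patch"
}
```

Call it as patch_apiserver https://10.5.0.1:6443, substituting the cluster's own endpoint. Run it only once: a second run would append the flags and the toleration again.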

Controller Manager

In the controller manager's DaemonSet, change:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-controller-manager
          image: k8s.gcr.io/kube-controller-manager:v1.19.4
      tolerations:
        - ...

to:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-controller-manager
          image: k8s.gcr.io/kube-controller-manager:v1.20.1
      tolerations:
        - ...
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

To edit the DaemonSet, run:

kubectl edit daemonsets -n kube-system kube-controller-manager

Scheduler

In the scheduler's DaemonSet, change:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-scheduler
          image: k8s.gcr.io/kube-scheduler:v1.19.4
      tolerations:
        - ...

to:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-scheduler
          image: k8s.gcr.io/kube-scheduler:v1.20.1
      tolerations:
        - ...
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

To edit the DaemonSet, run:

kubectl edit daemonsets -n kube-system kube-scheduler

Proxy

In the proxy's DaemonSet, change:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-proxy
          image: k8s.gcr.io/kube-proxy:v1.19.4
      tolerations:
        - ...

to:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-proxy
          image: k8s.gcr.io/kube-proxy:v1.20.1
      tolerations:
        - ...
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

To edit the DaemonSet, run:

kubectl edit daemonsets -n kube-system kube-proxy
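
The controller manager, scheduler, and proxy edits follow the same pattern (new image tag plus one toleration), so they can be scripted. The sketch below is one possible approach, not what talosctl does internally: upgrade_daemonset is a hypothetical helper, and it assumes the container is named after its DaemonSet and that the tolerations list already exists, as in the manifests above.

```shell
# Sketch: bump the image and append the control-plane toleration for one
# DaemonSet. upgrade_daemonset is a hypothetical helper; $1 is the
# DaemonSet (and container) name, $2 the version tag such as v1.20.1.
upgrade_daemonset() {
  local name=$1 version=$2
  # Update the image of the container that shares the DaemonSet's name.
  kubectl -n kube-system set image "daemonset/${name}" \
    "${name}=k8s.gcr.io/${name}:${version}" || return 1
  # JSON patch "add" with a path ending in "/-" appends to the list.
  kubectl -n kube-system patch daemonset "${name}" --type json -p '[
    {"op": "add", "path": "/spec/template/spec/tolerations/-",
     "value": {"key": "node-role.kubernetes.io/control-plane",
               "operator": "Exists", "effect": "NoSchedule"}}
  ]'
}
```

For example, upgrade_daemonset kube-proxy v1.20.1. Run it once per DaemonSet and check that the pods are healthy before moving on to the next one, since a repeated run would append a duplicate toleration.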

Restoring pod-checkpointer

Restore the grace period to 5 minutes (kubectl -n kube-system edit daemonset pod-checkpointer) and add the new toleration. Change:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: pod-checkpointer
        command:
        ...
        - --checkpoint-grace-period=0s
      tolerations:
        - ...

to:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: pod-checkpointer
        command:
        ...
        - --checkpoint-grace-period=5m0s
      tolerations:
        - ...
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

Kubelet

The Talos team now maintains a kubelet image that should be used starting with Kubernetes 1.20. The image for this release is ghcr.io/talos-systems/kubelet:v1.20.1. To set the image explicitly, use the machine.kubelet.image field of the machine configuration, as described in the official documentation. For example:

machine:
  ...
  kubelet:
    image: ghcr.io/talos-systems/kubelet:v1.20.1
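
Once the nodes are running with the new configuration, the kubelet version each node reports can be checked through the Kubernetes API. This is a small verification sketch; kubelet_versions is a hypothetical helper name.

```shell
# Sketch: print each node name with the kubelet version it reports.
# kubelet_versions is a hypothetical helper name.
kubelet_versions() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'
}
```

Each line of output pairs a node name with its kubelet version, which should read v1.20.1 after the upgrade.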