This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Network

Managing the Kubernetes cluster networking

1 - Deploying Cilium CNI

In this guide you will learn how to set up Cilium CNI on Talos.

Cilium can be installed either via the cilium cli or using helm.

This documentation will outline installing Cilium CNI v1.14.0 on Talos in six different ways. Adhering to Talos principles we’ll deploy Cilium with IPAM mode set to Kubernetes, and using the cgroupv2 and bpffs mount that talos already provides. As Talos does not allow loading kernel modules by Kubernetes workloads, SYS_MODULE capability needs to be dropped from the Cilium default set of values, this override can be seen in the helm/cilium cli install commands. Each method can either install Cilium using kube proxy (default) or without: Kubernetes Without kube-proxy

In this guide we assume that KubePrism is enabled and configured to use the port 7445.

Machine config preparation

When generating the machine config for a node set the CNI to none. For example using a config patch:

Create a patch.yaml file with the following contents:

cluster:
  network:
    cni:
      name: none
talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch @patch.yaml

Or if you want to deploy Cilium without kube-proxy, you also need to disable kube proxy:

Create a patch.yaml file with the following contents:

cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true
talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch @patch.yaml

Installation using Cilium CLI

Note: It is recommended to template the cilium manifest using helm and use it as part of Talos machine config, but if you want to install Cilium using the Cilium CLI, you can follow the steps below.

Install the Cilium CLI following the steps here.

With kube-proxy

cilium install \
    --helm-set=ipam.mode=kubernetes \
    --helm-set=kubeProxyReplacement=disabled \
    --helm-set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --helm-set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --helm-set=cgroup.autoMount.enabled=false \
    --helm-set=cgroup.hostRoot=/sys/fs/cgroup

Without kube-proxy

cilium install \
    --helm-set=ipam.mode=kubernetes \
    --helm-set=kubeProxyReplacement=true \
    --helm-set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --helm-set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --helm-set=cgroup.autoMount.enabled=false \
    --helm-set=cgroup.hostRoot=/sys/fs/cgroup \
    --helm-set=k8sServiceHost=localhost \
    --helm-set=k8sServicePort=7445

Installation using Helm

Refer to Installing with Helm for more information.

First we’ll need to add the helm repo for Cilium.

helm repo add cilium https://helm.cilium.io/
helm repo update

Method 1: Helm install

After applying the machine config and bootstrapping Talos will appear to hang on phase 18/19 with the message: retrying error: node not ready. This happens because nodes in Kubernetes are only marked as ready once the CNI is up. As there is no CNI defined, the boot process is pending and will reboot the node to retry after 10 minutes, this is expected behavior.

During this window you can install Cilium manually by running the following:

helm install \
    cilium \
    cilium/cilium \
    --version 1.14.0 \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set=kubeProxyReplacement=disabled \
    --set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set=cgroup.autoMount.enabled=false \
    --set=cgroup.hostRoot=/sys/fs/cgroup

Or if you want to deploy Cilium without kube-proxy, also set some extra paramaters:

helm install \
    cilium \
    cilium/cilium \
    --version 1.14.0 \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set=kubeProxyReplacement=true \
    --set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set=cgroup.autoMount.enabled=false \
    --set=cgroup.hostRoot=/sys/fs/cgroup \
    --set=k8sServiceHost=localhost \
    --set=k8sServicePort=7445

After Cilium is installed the boot process should continue and complete successfully.

Method 2: Helm manifests install

Instead of directly installing Cilium you can instead first generate the manifest and then apply it:

helm template \
    cilium \
    cilium/cilium \
    --version 1.14.0 \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set=kubeProxyReplacement=disabled \
    --set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set=cgroup.autoMount.enabled=false \
    --set=cgroup.hostRoot=/sys/fs/cgroup > cilium.yaml

kubectl apply -f cilium.yaml

Without kube-proxy:

helm template \
    cilium \
    cilium/cilium \
    --version 1.14.0 \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set=kubeProxyReplacement=true \
    --set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set=cgroup.autoMount.enabled=false \
    --set=cgroup.hostRoot=/sys/fs/cgroup \
    --set=k8sServiceHost=localhost \
    --set=k8sServicePort=7445 > cilium.yaml

kubectl apply -f cilium.yaml

Method 3: Helm manifests hosted install

After generating cilium.yaml using helm template, instead of applying this manifest directly during the Talos boot window (before the reboot timeout). You can also host this file somewhere and patch the machine config to apply this manifest automatically during bootstrap. To do this patch your machine configuration to include this config instead of the above:

Create a patch.yaml file with the following contents:

cluster:
  network:
    cni:
      name: custom
      urls:
        - https://server.yourdomain.tld/some/path/cilium.yaml
talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch @patch.yaml

However, beware of the fact that the helm generated Cilium manifest contains sensitive key material. As such you should definitely not host this somewhere publicly accessible.

Method 4: Helm manifests inline install

A more secure option would be to include the helm template output manifest inside the machine configuration. The machine config should be generated with CNI set to none

Create a patch.yaml file with the following contents:

cluster:
  network:
    cni:
      name: none
talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch @patch.yaml

if deploying Cilium with kube-proxy disabled, you can also include the following:

Create a patch.yaml file with the following contents:

cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true
talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch @patch.yaml

To do so patch this into your machine configuration:

cluster:
  inlineManifests:
    - name: cilium
      contents: |
        --
        # Source: cilium/templates/cilium-agent/serviceaccount.yaml
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: "cilium"
          namespace: kube-system
        ---
        # Source: cilium/templates/cilium-operator/serviceaccount.yaml
        apiVersion: v1
        kind: ServiceAccount
        -> Your cilium.yaml file will be pretty long....        

This will install the Cilium manifests at just the right time during bootstrap.

Beware though:

  • Changing the namespace when templating with Helm does not generate a manifest containing the yaml to create that namespace. As the inline manifest is processed from top to bottom make sure to manually put the namespace yaml at the start of the inline manifest.
  • Only add the Cilium inline manifest to the control plane nodes machine configuration.
  • Make sure all control plane nodes have an identical configuration.
  • If you delete any of the generated resources they will be restored whenever a control plane node reboots.
  • As a safety messure Talos only creates missing resources from inline manifests, it never deletes or updates anything.
  • If you need to update a manifest make sure to first edit all control plane machine configurations and then run talosctl upgrade-k8s as it will take care of updating inline manifests.

Method 5: Using a job

We can utilize a job pattern run arbitrary logic during bootstrap time. We can leverage this to our advantage to install Cilium by using an inline manifest as shown in the example below:

 inlineManifests:
    - name: cilium-install
      contents: |
        ---
        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRoleBinding
        metadata:
          name: cilium-install
        roleRef:
          apiGroup: rbac.authorization.k8s.io
          kind: ClusterRole
          name: cluster-admin
        subjects:
        - kind: ServiceAccount
          name: cilium-install
          namespace: kube-system
        ---
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: cilium-install
          namespace: kube-system
        ---
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: cilium-install
          namespace: kube-system
        spec:
          backoffLimit: 10
          template:
            metadata:
              labels:
                app: cilium-install
            spec:
              restartPolicy: OnFailure
              tolerations:
                - operator: Exists
                - effect: NoSchedule
                  operator: Exists
                - effect: NoExecute
                  operator: Exists
                - effect: PreferNoSchedule
                  operator: Exists
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists
                  effect: NoSchedule
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists
                  effect: NoExecute
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists
                  effect: PreferNoSchedule
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                      - matchExpressions:
                          - key: node-role.kubernetes.io/control-plane
                            operator: Exists
              serviceAccount: cilium-install
              serviceAccountName: cilium-install
              hostNetwork: true
              containers:
              - name: cilium-install
                image: quay.io/cilium/cilium-cli-ci:latest
                env:
                - name: KUBERNETES_SERVICE_HOST
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: status.podIP
                - name: KUBERNETES_SERVICE_PORT
                  value: "6443"
                command:
                  - cilium
                  - install
                  - --helm-set=ipam.mode=kubernetes
                  - --set
                  - kubeProxyReplacement=true
                  - --helm-set=securityContext.capabilities.ciliumAgent={CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}
                  - --helm-set=securityContext.capabilities.cleanCiliumState={NET_ADMIN,SYS_ADMIN,SYS_RESOURCE} - --helm-set=cgroup.autoMount.enabled=false
                  - --helm-set=cgroup.hostRoot=/sys/fs/cgroup
                  - --helm-set=k8sServiceHost=localhost
                  - --helm-set=k8sServicePort=7445        

Because there is no CNI present at installation time the kubernetes.default.svc cannot be used to install Cilium, to overcome this limitation we’ll utilize the host network connection to connect back to itself with ‘hostNetwork: true’ in tandem with the environment variables KUBERNETES_SERVICE_PORT and KUBERNETES_SERVICE_HOST.

The job runs a container to install cilium to your liking, after the job is finished Cilium can be managed/operated like usual.

The above can be combined exchanged with for example Method 3 to host arbitrary configurations externally but render/run them at bootstrap time.

Known issues

Other things to know

  • After installing Cilium, cilium connectivity test might hang and/or fail with errors similar to

    Error creating: pods "client-69748f45d8-9b9jg" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (container "client" must not include "NET_RAW" in securityContext.capabilities.add)

    This is expected, you can workaround it by adding the pod-security.kubernetes.io/enforce=privileged label on the namespace level.

  • Talos has full kernel module support for eBPF, See:

2 - Multus CNI

A brief instruction on howto use Multus on Talos Linux

Multus CNI is a container network interface (CNI) plugin for Kubernetes that enables attaching multiple network interfaces to pods. Typically, in Kubernetes each pod only has one network interface (apart from a loopback) – with Multus you can create a multi-homed pod that has multiple interfaces. This is accomplished by Multus acting as a “meta-plugin”, a CNI plugin that can call multiple other CNI plugins.

Installation

Multus can be deployed by simply applying the thick DaemonSet with kubectl.

kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml

This will create a DaemonSet and a CRD: NetworkAttachmentDefinition. This can be used to specify your network configuration.

Configuration

Patching the DaemonSet

For Multus to properly work with Talos a change need to be made to the DaemonSet. Instead of of mounting the volume called host-run-netns on /run/netns it has to be mounted on /var/run/netns.

Edit the DaemonSet and change the volume host-run-netns from /run/netns to /var/run/netns.

...
        - name: host-run-netns
          hostPath:
            path: /var/run/netns/

Failing to do so will leave your cluster crippled. Running pods will remain running but new pods and deployments will give you the following error in the events:

  Normal   Scheduled               3s    default-scheduler  Successfully assigned virtualmachines/samplepod to virt2
  Warning  FailedCreatePodSandBox  3s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3a6a58386dfbf2471a6f86bd41e4e9a32aac54ccccd1943742cb67d1e9c58b5b": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"3a6a58386dfbf2471a6f86bd41e4e9a32aac54ccccd1943742cb67d1e9c58b5b" Netns:"/var/run/netns/cni-1d80f6e3-fdab-4505-eb83-7deb17431293" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=virtualmachines;K8S_POD_NAME=samplepod;K8S_POD_INFRA_CONTAINER_ID=3a6a58386dfbf2471a6f86bd41e4e9a32aac54ccccd1943742cb67d1e9c58b5b;K8S_POD_UID=8304765e-fd7e-4968-9144-c42c53be04f4" Path:"" ERRORED: error configuring pod [virtualmachines/samplepod] networking: [virtualmachines/samplepod/8304765e-fd7e-4968-9144-c42c53be04f4:cbr0]: error adding container to network "cbr0": DelegateAdd: cannot set "" interface name to "eth0": validateIfName: no net namespace /var/run/netns/cni-1d80f6e3-fdab-4505-eb83-7deb17431293 found: failed to Statfs "/var/run/netns/cni-1d80f6e3-fdab-4505-eb83-7deb17431293": no such file or directory
': StdinData: {"capabilities":{"portMappings":true},"clusterNetwork":"/host/etc/cni/net.d/10-flannel.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","type":"multus-shim"}

Creating your NetworkAttachmentDefinition

The NetworkAttachmentDefinition configuration is used to define your bridge where your second pod interface needs to be attached to.

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.1.0/24",
        "rangeStart": "192.168.1.200",
        "rangeEnd": "192.168.1.216",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ],
        "gateway": "192.168.1.1"
      }
    }'

In this example macvlan is used as a bridge type. There are 3 types of bridges: bridge, macvlan and ipvlan:

  1. bridge is a way to connect two Ethernet segments together in a protocol-independent way. Packets are forwarded based on Ethernet address, rather than IP address (like a router). Since forwarding is done at Layer 2, all protocols can go transparently through a bridge. In terms of containers or virtual machines, a bridge can also be used to connect the virtual interfaces of each container/VM to the host network, allowing them to communicate.

  2. macvlan is a driver that makes it possible to create virtual network interfaces that appear as distinct physical devices each with unique MAC addresses. The underlying interface can route traffic to each of these virtual interfaces separately, as if they were separate physical devices. This means that each macvlan interface can have its own IP subnet and routing. Macvlan interfaces are ideal for situations where containers or virtual machines require the same network access as the host system.

  3. ipvlan is similar to macvlan, with the key difference being that ipvlan shares the parent’s MAC address, which requires less configuration from the networking equipment. This makes deployments simpler in certain situations where MAC address control or limits are in place. It offers two operational modes: L2 mode (the default) where it behaves similarly to a MACVLAN, and L3 mode for routing based traffic isolation (rather than bridged).

When using the bridge interface you must also configure a bridge on your Talos nodes. That can be done by updating Talos Linux machine configuration:

machine:
      interfaces:
      - interface: br0
        addresses:
          - 172.16.1.60/24
        bridge:
          stp:
            enabled: true
          interfaces:
              - eno1 # This must be changed to your matching interface name
        routes:
            - network: 0.0.0.0/0 # The route's network (destination).
              gateway: 172.16.1.254 # The route's gateway (if empty, creates link scope route).
              metric: 1024 # The optional metric for the route.

More information about the configuration of bridges can be found here

Attaching the NetworkAttachmentDefinition to your Pod or Deployment

After the NetworkAttachmentDefinition is configured, you can attach that interface to your your Deployment or Pod. In this example we use a pod:

apiVersion: v1
kind: Pod
metadata:
  name: samplepod
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
  - name: samplepod
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine

Notes on using KubeVirt in combination with Multus

If you would like to use KubeVirt and expose your virtual machine to the outside world with Multus, make sure to configure a bridge instead of macvlan or ipvlan, because that doesn’t work, according to the KubeVirt Documentation.

Invalid CNIs for secondary networks The following list of CNIs is known not to work for bridge interfaces - which are most common for secondary interfaces.

  • macvlan
  • ipvlan

The reason is similar: the bridge interface type moves the pod interface MAC address to the VM, leaving the pod interface with a different address. The aforementioned CNIs require the pod interface to have the original MAC address.