This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Network

Managing the Kubernetes cluster networking

1 - Deploying Cilium CNI

In this guide you will learn how to set up Cilium CNI on Talos.

From v1.9 onwards Cilium does no longer provide a one-liner install manifest that can be used to install Cilium on a node via kubectl apply -f or passing it in as an extra url in the urls part in the Talos machine configuration.

Installing Cilium the new way via the cilium cli is broken, so we’ll be using helm to install Cilium. For more information: Install with CLI fails, works with Helm

Refer to Installing with Helm for more information.

First we’ll need to add the helm repo for Cilium.

helm repo add cilium https://helm.cilium.io/
helm repo update

This documentation will outline installing Cilium CNI v1.11.2 on Talos in four different ways. Adhering to Talos principles we’ll deploy Cilium with IPAM mode set to Kubernetes. Each method can either install Cilium using kube proxy (default) or without: Kubernetes Without kube-proxy

Machine config preparation

When generating the machine config for a node set the CNI to none. For example using a config patch:

talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch '[{"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'

Or if you want to deploy Cilium in strict mode without kube-proxy, you also need to disable kube proxy:

talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'

Method 1: Helm install

After applying the machine config and bootstrapping Talos will appear to hang on phase 18/19 with the message: retrying error: node not ready. This happens because nodes in Kubernetes are only marked as ready once the CNI is up. As there is no CNI defined, the boot process is pending and will reboot the node to retry after 10 minutes, this is expected behavior.

During this window you can install Cilium manually by running the following:

helm install cilium cilium/cilium \
    --version 1.11.2 \
    --namespace kube-system \
    --set ipam.mode=kubernetes

Or if you want to deploy Cilium in strict mode without kube-proxy, also set some extra paramaters:

export KUBERNETES_API_SERVER_ADDRESS=<>
export KUBERNETES_API_SERVER_PORT=6443

helm install cilium cilium/cilium \
    --version 1.11.2 \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set kubeProxyReplacement=strict \
    --set k8sServiceHost="${KUBERNETES_API_SERVER_ADDRESS}" \
    --set k8sServicePort="${KUBERNETES_API_SERVER_PORT}"

After Cilium is installed the boot process should continue and complete successfully.

Method 2: Helm manifests install

Instead of directly installing Cilium you can instead first generate the manifest and then apply it:

helm template cilium cilium/cilium \
    --version 1.11.2 \
    --namespace kube-system \
    --set ipam.mode=kubernetes > cilium.yaml

kubectl apply -f cilium.yaml

Without kube-proxy:

export KUBERNETES_API_SERVER_ADDRESS=<>
export KUBERNETES_API_SERVER_PORT=6443

helm template cilium cilium/cilium \
    --version 1.11.2 \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set kubeProxyReplacement=strict \
    --set k8sServiceHost="${KUBERNETES_API_SERVER_ADDRESS}" \
    --set k8sServicePort="${KUBERNETES_API_SERVER_PORT}" > cilium.yaml

kubectl apply -f cilium.yaml

Method 3: Helm manifests hosted install

After generating cilium.yaml using helm template, instead of applying this manifest directly during the Talos boot window (before the reboot timeout). You can also host this file somewhere and patch the machine config to apply this manifest automatically during bootstrap. To do this patch your machine configuration to include this config instead of the above:

talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch '[{"op":"add", "path": "/cluster/network/cni", "value": {"name": "custom", "urls": ["https://server.yourdomain.tld/some/path/cilium.yaml"]}}]'

Resulting in a config that look like this:

name: custom # Name of CNI to use.
# URLs containing manifests to apply for the CNI.
urls:
    - https://server.yourdomain.tld/some/path/cilium.yaml

However, beware of the fact that the helm generated Cilium manifest contains sensitive key material. As such you should definitely not host this somewhere publicly accessible.

Method 4: Helm manifests inline install

A more secure option would be to include the helm template output manifest inside the machine configuration. The machine config should be generated with CNI set to none

talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch '[{"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'

if deploying Cilium with kube-proxy disabled, you can also include the following:

talosctl gen config \
    my-cluster https://mycluster.local:6443 \
    --config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'

To do so patch this into your machine configuration:

cluster:
  inlineManifests:
    - name: cilium
      contents: |
        --
        # Source: cilium/templates/cilium-agent/serviceaccount.yaml
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: "cilium"
          namespace: kube-system
        ---
        # Source: cilium/templates/cilium-operator/serviceaccount.yaml
        apiVersion: v1
        kind: ServiceAccount
        -> Your cilium.yaml file will be pretty long....        

This will install the Cilium manifests at just the right time during bootstrap.

Beware though:

  • Changing the namespace when templating with Helm does not generate a manifest containing the yaml to create that namespace. As the inline manifest is processed from top to bottom make sure to manually put the namespace yaml at the start of the inline manifest.
  • Only add the Cilium inline manifest to the control plane nodes machine configuration.
  • Make sure all control plane nodes have an identical configuration.
  • If you delete any of the generated resources they will be restored whenever a control plane node reboots.
  • As a safety messure Talos only creates missing resources from inline manifests, it never deletes or updates anything.
  • If you need to update a manifest make sure to first edit all control plane machine configurations and then run talosctl upgrade-k8s as it will take care of updating inline manifests.

Known issues

Other things to know

  • Talos has full kernel module support for eBPF, See:

  • Talos also includes the modules:

    • CONFIG_NETFILTER_XT_TARGET_TPROXY=m
    • CONFIG_NETFILTER_XT_TARGET_CT=m
    • CONFIG_NETFILTER_XT_MATCH_MARK=m
    • CONFIG_NETFILTER_XT_MATCH_SOCKET=m

    This allows you to set --set enableXTSocketFallback=false on the helm install/template command preventing Cilium from disabling the ip_early_demux kernel feature. This will win back some performance.

2 - KubeSpan

Learn to use KubeSpan to connect Talos Linux machines securely across networks.

KubeSpan is a feature of Talos that automates the setup and maintenance of a full mesh WireGuard network for your cluster, giving you the ability to operate hybrid Kubernetes clusters that can span the edge, datacenter, and cloud. Management of keys and discovery of peers can be completely automated for a zero-touch experience that makes it simple and easy to create hybrid clusters.

KubeSpan consists of client code in Talos Linux, as well as a discovery service that enables clients to securely find each other. Sidero Labs operates a free Discovery Service, but the discovery service may be operated by your organization and can be downloaded here.

Video Walkthrough

To learn more about KubeSpan, see the video below:

To see a live demo of KubeSpan, see one the videos below:

Enabling

Creating a New Cluster

To generate configuration files for a new cluster, we can use the --with-kubespan flag in talosctl gen config. This will enable peer discovery and KubeSpan.

machine:
    network:
        kubespan:
            enabled: true # Enable the KubeSpan feature.
cluster:
    discovery:
        enabled: true
        # Configure registries used for cluster member discovery.
        registries:
            kubernetes: # Kubernetes registry is problematic with KubeSpan, if the control plane endpoint is routeable itself via KubeSpan.
              disabled: true
            service: {}

The default discovery service is an external service hosted for free by Sidero Labs. The default value is https://discovery.talos.dev/. Contact Sidero Labs if you need to run this service privately.

Enabling for an Existing Cluster

In order to enable KubeSpan for an existing cluster, enable kubespan and discovery settings in the machine config for each machine in the cluster (discovery is enabled by default):

machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true

Configuration

KubeSpan will automatically discovery all cluster members, exchange Wireguard public keys and establish a full mesh network.

There are a few configuration options available to fine-tune the feature:

machine:
  network:
    kubespan:
      enabled: false
      advertiseKubernetesNetworks: false
      allowDownPeerBypass: false
      mtu: 1420
      filters:
        endpoints:
          - 0.0.0.0/0
          - ::/0

The setting advertiseKubernetesNetworks controls whether the node will advertise Kubernetes service and pod networks to other nodes in the cluster over KubeSpan. It defaults to being disabled, which means KubeSpan only controls the node-to-node traffic, while pod-to-pod traffic is routed and encapsulated by CNI. This setting should not be enabled with Calico and Cilium CNI plugins, as they do their own pod IP allocation which is not visible to KubeSpan.

The setting allowDownPeerBypass controls whether the node will allow traffic to bypass WireGuard if the destination is not connected over KubeSpan. If enabled, there is a risk that traffic will be routed unencrypted if the destination is not connected over KubeSpan, but it allows a workaround for the case where a node is not connected to the KubeSpan network, but still needs to access the cluster.

The mtu setting configures the Wireguard MTU, which defaults to 1420. This default value of 1420 is safe to use when the underlying network MTU is 1500, but if the underlying network MTU is smaller, the KubeSpanMTU should be adjusted accordingly: KubeSpanMTU = UnderlyingMTU - 80.

The filters setting allows to hide some endpoints from being advertised over KubeSpan. This is useful when some endpoints are known to be unreachable between the nodes, so that KubeSpan doesn’t try to establish a connection to them. Another use-case is hiding some endpoints if nodes can connect on multiple networks, and some of the networks are more preferable than others.

Resource Definitions

KubeSpanIdentities

A node’s WireGuard identities can be obtained with:

$ talosctl get kubespanidentities -o yaml
...
spec:
    address: fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94/128
    subnet: fd83:b1f7:fcb5:2802::/64
    privateKey: gNoasoKOJzl+/B+uXhvsBVxv81OcVLrlcmQ5jQwZO08=
    publicKey: NzW8oeIH5rJyY5lefD9WRoHWWRr/Q6DwsDjMX+xKjT4=

Talos automatically configures unique IPv6 address for each node in the cluster-specific IPv6 ULA prefix.

Wireguard private key is generated for the node, private key never leaves the node while public key is published through the cluster discovery.

KubeSpanIdentity is persisted across reboots and upgrades in STATE partition in the file kubespan-identity.yaml.

KubeSpanPeerSpecs

A node’s WireGuard peers can be obtained with:

$ talosctl get kubespanpeerspecs
ID                                             VERSION   LABEL                          ENDPOINTS
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI=   2         talos-default-controlplane-2   ["172.20.0.3:51820"]
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU=   2         talos-default-controlplane-3   ["172.20.0.4:51820"]
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M=   2         talos-default-worker-2         ["172.20.0.6:51820"]
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc=   2         talos-default-worker-1         ["172.20.0.5:51820"]

The peer ID is the Wireguard public key. KubeSpanPeerSpecs are built from the cluster discovery data.

KubeSpanPeerStatuses

The status of a node’s WireGuard peers can be obtained with:

$ talosctl get kubespanpeerstatuses
ID                                             VERSION   LABEL                          ENDPOINT           STATE   RX         TX
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI=   63        talos-default-controlplane-2   172.20.0.3:51820   up      15043220   17869488
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU=   62        talos-default-controlplane-3   172.20.0.4:51820   up      14573208   18157680
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M=   60        talos-default-worker-2         172.20.0.6:51820   up      130072     46888
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc=   60        talos-default-worker-1         172.20.0.5:51820   up      130044     46556

KubeSpan peer status includes following information:

  • the actual endpoint used for peer communication
  • link state:
    • unknown: the endpoint was just changed, link state is not known yet
    • up: there is a recent handshake from the peer
    • down: there is no handshake from the peer
  • number of bytes sent/received over the Wireguard link with the peer

If the connection state goes down, Talos will be cycling through the available endpoints until it finds the one which works.

Peer status information is updated every 30 seconds.

KubeSpanEndpoints

A node’s WireGuard endpoints (peer addresses) can be obtained with:

$ talosctl get kubespanendpoints
ID                                             VERSION   ENDPOINT           AFFILIATE ID
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI=   1         172.20.0.3:51820   2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU=   1         172.20.0.4:51820   b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M=   1         172.20.0.6:51820   NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc=   1         172.20.0.5:51820   6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA

The endpoint ID is the base64 encoded WireGuard public key.

The observed endpoints are submitted back to the discovery service (if enabled) so that other peers can try additional endpoints to establish the connection.