Production Clusters
This guide explains what to consider when creating a production-quality Talos Linux cluster on bare metal. Check out the Reference Architecture documentation for architectural diagrams and guidance on creating production-grade clusters in other environments.
This guide assumes that you’ve already created a development cluster and are familiar with the Getting Started documentation. If not, please refer to the Getting Started guide for more information.
When moving from a learning environment to a production-ready Talos Linux cluster, you have to consider several critical factors:
- High availability for your control plane nodes.
- Secure configuration management.
- Reliability for continuous service and minimal downtime.
- Authentication for access control.
Follow the steps below to build a production-grade Talos cluster that is highly available, reliable, and secure.
Note: Check out Omni for managing large-scale Talos Linux clusters automatically.
Step 1: Prepare Your Infrastructure
To create your production cluster infrastructure:
- Boot your machines using the Talos ISO image.
- Ensure network access on your nodes.
Here is how to do each step:
Boot Your Machines Using the Talos ISO Image
Follow these steps to boot your machines using the Talos ISO image:
Download the latest ISO for your hardware type from the Talos Image Factory.
Note: Network booting and self-built media using the published kernel require a number of kernel parameters. Please see the kernel docs in the Getting Started guide for more information.
Boot three machines as control plane nodes using the ISO image you just downloaded.
Boot additional machines as worker nodes.
Ensure Network Access
If your nodes are behind a firewall, in a private network, or otherwise not directly reachable, you will need to configure a load balancer that forwards TCP port 50000 to the nodes for Talos API access.
Note: Because the Talos Linux API uses gRPC and mutual TLS, it cannot be proxied by an HTTP(S) proxy, but only by a TCP load balancer.
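As a quick sanity check (a minimal sketch, assuming netcat is installed and you know your load balancer's address), you can confirm that the forwarded Talos API port is reachable:
nc -vz <load-balancer-address> 50000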
With your control plane and worker nodes booted, the next steps are to record their IP addresses and configure your Kubernetes endpoint.
Step 2: Store Your IP Addresses in a Variable
To store variables for your machines’ IP addresses:
Copy the IP address displayed on each machine console, including the control plane and any worker nodes you’ve created.
If you don’t have a display connected, retrieve the IP addresses from your DHCP server.
Create a Bash array for your control plane node IP addresses, replacing each <control-plane-ip> placeholder with the IP address of a control plane node. You can include as many IP addresses as needed:
CONTROL_PLANE_IP=("<control-plane-ip-1>" "<control-plane-ip-2>" "<control-plane-ip-3>")
For example, if your control plane node IP addresses are 192.168.0.2, 192.168.0.3, and 192.168.0.4, your command would be:
CONTROL_PLANE_IP=("192.168.0.2" "192.168.0.3" "192.168.0.4")
If you have worker nodes, store their IP addresses in a Bash array. Replace each <worker-ip> placeholder with the actual IP address of a worker node. You can include as many IP addresses as needed:
WORKER_IP=("<worker-ip-1>" "<worker-ip-2>" "<worker-ip-3>")
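For example, assuming two worker nodes at 192.168.0.5 and 192.168.0.6 (substitute your own addresses):
WORKER_IP=("192.168.0.5" "192.168.0.6")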
Step 3: Decide Your Kubernetes Endpoint
You’ve set up multiple control planes for high availability, but they only provide true high availability if the Kubernetes API server endpoint can reach all control plane nodes.
Here are two common ways to configure this:
- Dedicated load balancer: Set up a dedicated load balancer that routes to your control plane nodes.
- DNS records: Create multiple DNS records that point to all your control plane nodes.
With either option, you can pass in a single IP address or DNS name during setup that routes to all your control plane nodes.
Here is how you can configure each option:
Dedicated Load Balancer
If you’re using a cloud provider or have your own load balancer (such as HAProxy, an NGINX reverse proxy, or an F5 load balancer), setting up a dedicated load balancer is a natural choice.
Note that if you created your cluster with Omni, Omni automatically provides a load balancer for your Kubernetes endpoint.
Configure a frontend to listen on TCP port 6443 and direct traffic to the addresses of your Talos control plane nodes.
Your Kubernetes endpoint will be the IP address or DNS name of the load balancer's frontend with the port appended, for example, https://myK8s.mydomain.io:6443.
Note: You cannot use an HTTP load balancer, because the Kubernetes API server handles TLS termination and mutual TLS authentication.
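For illustration, a minimal HAProxy sketch of such a frontend and backend (the backend addresses assume the example control plane IPs used elsewhere in this guide; any TCP load balancer works, and a real configuration also needs global and defaults sections):
frontend kubernetes_api
    bind *:6443
    mode tcp
    default_backend talos_control_plane

backend talos_control_plane
    mode tcp
    server cp1 192.168.0.2:6443 check
    server cp2 192.168.0.3:6443 check
    server cp3 192.168.0.4:6443 check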
DNS Records
Alternatively, you can configure your Kubernetes endpoint using DNS records. Simply add multiple A or AAAA records, one for each control plane node, to a single DNS name.
For example, you can add:
kube.cluster1.mydomain.com IN A 192.168.0.10
kube.cluster1.mydomain.com IN A 192.168.0.11
kube.cluster1.mydomain.com IN A 192.168.0.12
Then your endpoint would be:
https://kube.cluster1.mydomain.com:6443
Step 4: Save Your Endpoint in a Variable
Set a variable to store the endpoint you chose in Step 3.
Replace the <your_endpoint> placeholder with your actual endpoint:
export YOUR_ENDPOINT=<your_endpoint>
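For example, if you chose the DNS name from Step 3, store only the host name; the scheme and port are added later when the machine configuration is generated:
export YOUR_ENDPOINT=kube.cluster1.mydomain.com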
Step 5: Generate Secrets Bundle
The secrets bundle is a file that contains all the cryptographic keys, certificates, and tokens needed to secure your Talos Linux cluster.
To generate the secrets bundle, run:
talosctl gen secrets -o secrets.yaml
Step 6: Generate Machine Configurations
Follow these steps to generate your machine configurations:
Set a variable for your cluster name by running the following command, replacing <your_cluster_name> with the name you want to give your cluster:
export CLUSTER_NAME=<your_cluster_name>
Run this command to generate your machine configuration files using your secrets bundle:
talosctl gen config --with-secrets secrets.yaml $CLUSTER_NAME https://$YOUR_ENDPOINT:6443
This command will generate three files:
- controlplane.yaml: Configuration for your control plane.
- worker.yaml: Configuration for your worker nodes.
- talosconfig: The talosctl configuration file used to connect to and authenticate with your cluster.
Step 7: Unmount the ISO
Unplug your installation USB drive or unmount the ISO from all your control plane and worker nodes. This prevents you from accidentally installing to the USB drive and makes it clearer which disk to select for installation.
Step 8: Understand Your Nodes
The default machine configurations for control plane and worker nodes are typically sufficient to get your cluster running. However, you may need to customize certain settings such as network interfaces and disk configurations depending on your specific environment.
Follow these steps to verify that your machine configurations are set up correctly:
Check network interfaces: Run this command to view all network interfaces on any node, whether control plane or worker. Replace <node-ip-address> with the IP address of the node you want to inspect:
talosctl --nodes <node-ip-address> get links --insecure
Note: Copy the ID of the interface whose operational state (OPER) is up.
Check available disks: Run this command to list all available disks on any node. Replace <node-ip-address> with the IP address of the node you want to inspect:
talosctl get disks --insecure --nodes <node-ip-address>
Verify configuration files: Open your worker.yaml and controlplane.yaml configuration files in your preferred editor. Check that the values match your worker and control plane nodes' network and disk settings. If the values don't match, you'll need to update your machine configurations.
Note: Refer to the Talos CLI reference for additional commands to gather more information about your nodes and cluster.
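For example, a quick way to surface the install disk and network interface values you need to verify in the generated files (a simple grep sketch; depending on your Talos version, the network interface section may be commented out by default):
grep -n 'disk:' controlplane.yaml worker.yaml
grep -n 'interface:' controlplane.yaml worker.yaml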
Step 9: Patch Your Machine Configuration (Optional)
You can patch your worker and control plane machine configurations to reflect the correct network interface and disk for each type of node.
Follow these steps to patch your machine configuration:
Create patch files for the configurations you want to modify:
touch controlplane-patch-1.yaml  # For patching the control plane nodes' configuration
touch worker-patch-1.yaml        # For patching the worker nodes' configuration
Note: You don’t have to create both patch files; only create patches for the configurations you actually need to modify.
You can also create multiple patch files (e.g., controlplane-patch-2.yaml, controlplane-patch-3.yaml) if you want to apply multiple subsequent patches to the same machine configuration.
Copy and paste this YAML block and add the correct hardware values to each patch file.
For example, for controlplane-patch-1.yaml, use the network interface and disk information you gathered from your control plane nodes:
# controlplane-patch-1.yaml file
machine:
  network:
    interfaces:
      - interface: <control-plane-network-interface> # From control plane node
        dhcp: true
  install:
    disk: /dev/<control-plane-disk-name> # From control plane node
For worker-patch-1.yaml, use the network interface and disk information from your worker nodes:
# worker-patch-1.yaml file
machine:
  network:
    interfaces:
      - interface: <worker-network-interface> # From worker node
        dhcp: true
  install:
    disk: /dev/<worker-disk-name> # From worker node
Apply each patch file to its corresponding machine configuration:
- For the control plane:
talosctl machineconfig patch controlplane.yaml --patch @controlplane-patch-1.yaml --output controlplane.yaml
- For the workers:
talosctl machineconfig patch worker.yaml --patch @worker-patch-1.yaml --output worker.yaml
Additionally, you can learn more about patches from the configuration patches documentation.
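As an optional extra check (not part of the steps above), you can validate the patched files for bare metal (metal) mode before applying them:
talosctl validate --config controlplane.yaml --mode metal
talosctl validate --config worker.yaml --mode metal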
Step 10: Apply the Machine Configuration
To apply your machine configuration:
Run this command to apply the controlplane.yaml configuration to your control plane nodes:
for ip in "${CONTROL_PLANE_IP[@]}"; do
  echo "=== Applying configuration to node $ip ==="
  talosctl apply-config --insecure \
    --nodes $ip \
    --file controlplane.yaml
  echo "Configuration applied to $ip"
  echo ""
done
Run this command to apply the worker.yaml configuration to your worker nodes:
for ip in "${WORKER_IP[@]}"; do
  echo "=== Applying configuration to node $ip ==="
  talosctl apply-config --insecure \
    --nodes $ip \
    --file worker.yaml
  echo "Configuration applied to $ip"
  echo ""
done
Step 11: Manage Your Talos Configuration File
The talosconfig file is your key to managing the Talos Linux cluster; without it, you cannot authenticate or communicate with your cluster nodes using talosctl.
You have two options for managing your talosconfig:
Merge your new talosconfig into the default configuration file located at ~/.talos/config:
talosctl config merge ./talosconfig
Copy the configuration file to your ~/.talos directory and set the TALOSCONFIG environment variable:
mkdir -p ~/.talos
cp ./talosconfig ~/.talos/config
export TALOSCONFIG=~/.talos/config
Step 12: Set Endpoints of Your Control Plane Nodes
Configure your endpoints to enable talosctl to automatically load balance requests and fail over between control plane nodes when individual nodes become unavailable.
Run this command to configure your endpoints, replacing the <control_plane_IP_1> <control_plane_IP_2> <control_plane_IP_3> placeholders with the IP addresses of your control plane nodes:
talosctl config endpoint <control_plane_IP_1> <control_plane_IP_2> <control_plane_IP_3>
For example, if your control plane node IP addresses are 192.168.0.2, 192.168.0.3, and 192.168.0.4, your command would be:
talosctl config endpoint 192.168.0.2 192.168.0.3 192.168.0.4
Step 13: Bootstrap Your Kubernetes Cluster
Wait for your control plane nodes to finish booting, then bootstrap your etcd cluster by running the command below.
Replace the <control-plane-IP> placeholder with the IP address of ONE of your three control plane nodes:
talosctl bootstrap --nodes <control-plane-IP>
Note: Run this command ONCE on a SINGLE control plane node. If you have multiple control plane nodes, you can choose any of them.
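Once bootstrapped, you can optionally watch the cluster converge with the built-in health checks (this assumes the talosconfig and endpoints configured in the previous steps):
talosctl health --nodes <control-plane-IP>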
Step 14: Get Kubernetes Access
Download your kubeconfig file to start using kubectl with your cluster.
These commands must be run against a single control plane node.
You have two options for managing your kubeconfig. Replace <control-plane-IP> with the IP address of any one of your control plane nodes:
- Merge into your default kubeconfig:
talosctl kubeconfig --nodes <control-plane-IP>
- Create a separate kubeconfig file:
talosctl kubeconfig alternative-kubeconfig --nodes <control-plane-IP>
export KUBECONFIG=./alternative-kubeconfig
Step 15: Verify Your Nodes Are Running
Run this command to verify that your nodes are running:
kubectl get nodes
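If the nodes are still joining, you can optionally wait until they all report Ready (an extra convenience step, not required by this guide):
kubectl wait --for=condition=Ready nodes --all --timeout=10m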
Next Steps
Congratulations! You now have a working production-grade Talos Linux Kubernetes cluster.