If you’re interested in this project and would like to help in engineering efforts, or have general usage questions, we are happy to have you!
We hold a weekly meeting that all audiences are welcome to attend.
You can subscribe to this meeting by joining the community forum above.
We would appreciate your feedback so that we can make Talos even better!
To do so, you can take our survey.
Enterprise
If you are using Talos in a production setting, and need consulting services to get started or to integrate Talos into your existing environment, we can help.
Sidero Labs, Inc. offers support contracts with SLA (Service Level Agreement)-bound terms for mission-critical environments.
Talos is an open source platform to host and maintain Kubernetes clusters.
It includes a purpose-built operating system and associated management tools.
It can run on all major cloud providers, virtualization platforms, and bare metal hardware.
All system management is done via an API, and there is no shell or interactive console.
Some of the capabilities and benefits provided by Talos include:
Security: Talos reduces your attack surface by practicing the Principle of Least Privilege (PoLP) and by securing the API with mutual TLS (mTLS) authentication.
Predictability: Talos eliminates unneeded variables and reduces unknown factors in your environment by employing immutable infrastructure ideology.
Evolvability: Talos simplifies your architecture and increases your ability to easily accommodate future changes.
Talos is flexible and can be deployed in a variety of ways, but the easiest way to get started and experiment with the system is to run a local cluster on your laptop or workstation.
There are two options:
In this guide we will create a Kubernetes cluster in Docker, using a containerized version of Talos.
Running Talos in Docker is intended for use in CI pipelines and for local testing when you need a quick and easy cluster.
Furthermore, if you are running Talos in production, it provides an excellent way for developers to develop against the same version of Talos.
Requirements
The following are requirements for running Talos in Docker:
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
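For instance, a quick first look at a freshly created cluster might be (replace <container> with a name from the listing):

talosctl containers
talosctl containers -k
talosctl logs -k <container>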
Cleaning Up
To cleanup, run:
talosctl cluster destroy
1.3 - System Requirements
Minimum Requirements
Role                  Memory   Cores
Init/Control Plane    2GB      2
Worker                1GB      1

Recommended

Role                  Memory   Cores
Init/Control Plane    4GB      4
Worker                2GB      2
These requirements are similar to those of Kubernetes.
In this guide we will create a Kubernetes cluster with 1 worker node and 2 control plane nodes.
We assume an existing Digital Rebar deployment, and some familiarity with iPXE.
We leave it up to the user to decide if they would like to use static networking or DHCP.
The setup and configuration of DHCP will not be covered.
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
The load balancer is used to distribute the load across multiple control plane nodes.
This isn’t covered in detail, because we assume some load balancing knowledge beforehand.
If you think this should be added to the docs, please create an issue.
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
$ talosctl validate --config init.yaml --mode metal
init.yaml is valid for metal mode
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config join.yaml --mode metal
join.yaml is valid for metal mode
Publishing the Machine Configuration Files
Digital Rebar has a built-in fileserver, which means we can use this feature to expose the Talos configuration files.
We will place init.yaml, controlplane.yaml, and join.yaml into the Digital Rebar file server by using the drpcli tools.
Copy the generated files from the step above into your Digital Rebar installation.
drpcli file upload <file>.yaml as <file>.yaml
Replacing <file> with init, controlplane, or join.
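If you prefer, the three uploads can also be scripted (a sketch; it assumes drpcli is already configured against your Digital Rebar endpoint):

for f in init controlplane join; do
  drpcli file upload ${f}.yaml as ${f}.yaml
done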
Download the boot files
Download a recent version of boot.tar.gz from GitHub.
Talos is known to work on Equinix Metal; however, it is currently undocumented.
2.3 - Matchbox
Creating a Cluster
In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
We assume an existing load balancer, matchbox deployment, and some familiarity with iPXE.
We leave it up to the user to decide if they would like to use static networking, or DHCP.
The setup and configuration of DHCP will not be covered.
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
$ talosctl validate --config init.yaml --mode metal
init.yaml is valid for metal mode
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config join.yaml --mode metal
join.yaml is valid for metal mode
Publishing the Machine Configuration Files
In bare-metal setups it is up to the user to provide the configuration files over HTTP(S).
A special kernel parameter (talos.config) must be used to inform Talos about where it should retrieve its configuration file.
To keep things simple we will place init.yaml, controlplane.yaml, and join.yaml into Matchbox’s assets directory.
This directory is automatically served by Matchbox.
Create the Matchbox Configuration Files
The profiles we will create will reference vmlinuz and initramfs.xz.
Download these files from the release of your choice, and place them in /var/lib/matchbox/assets.
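As a sketch, the assets can be fetched with curl; the version and download URL below are placeholders, so substitute those of the release you chose:

export TALOS_VERSION="v0.7.0"   # placeholder version
curl -L -o /var/lib/matchbox/assets/vmlinuz \
  "https://github.com/talos-systems/talos/releases/download/${TALOS_VERSION}/vmlinuz"
curl -L -o /var/lib/matchbox/assets/initramfs.xz \
  "https://github.com/talos-systems/talos/releases/download/${TALOS_VERSION}/initramfs.xz"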
Now that we have our configuration files in place, boot all the machines.
Talos will come up on each machine, grab its configuration file, and bootstrap itself.
Retrieve the kubeconfig
At this point we can retrieve the admin kubeconfig by running:
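A minimal invocation, assuming the talosconfig generated earlier already points at your endpoint, is:

talosctl --talosconfig talosconfig kubeconfig .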
Sidero is a project created by the Talos team that has native support for Talos.
The best way to get started with Sidero is to visit the website.
3 - Virtualized Platforms
3.1 - Hyper-V
Talos is known to work on Hyper-V; however, it is currently undocumented.
3.2 - KVM
Talos is known to work on KVM; however, it is currently undocumented.
3.3 - Proxmox
Talos is known to work on Proxmox; however, it is currently undocumented.
3.4 - VMware
Creating a Cluster via the govc CLI
In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
We will use the govc cli which can be downloaded here.
Prerequisites
Prior to starting, it is important to have the following infrastructure in place and available:
DHCP server
Load Balancer or DNS address for cluster endpoint
If using a load balancer, the most common setup is to balance tcp/443 across the control plane nodes' tcp/6443
If using a DNS address, the A record should return back the addresses of the control plane nodes
Create the Machine Configuration Files
Generating Base Configurations
Using the IP or DNS name of the load balancer used in the prerequisite steps, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-vmware-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
$ talosctl gen config talos-k8s-vmware-tutorial https://<DNS name>:6443
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
$ talosctl validate --config init.yaml --mode cloud
init.yaml is valid for cloud mode
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config join.yaml --mode cloud
join.yaml is valid for cloud mode
Set Environment Variables
govc makes use of the following environment variables
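The exact set depends on your vSphere environment; a sketch with placeholder values might look like:

export GOVC_URL="https://vcenter.example.com"        # vCenter endpoint (placeholder)
export GOVC_USERNAME="administrator@vsphere.local"   # placeholder account
export GOVC_PASSWORD="<password>"
export GOVC_INSECURE=true                            # only if vCenter uses a self-signed certificate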
A talos.ova asset is published with each release.
We will refer to the version of the release as $TALOS_VERSION below.
It can be easily exported with export TALOS_VERSION="v0.3.0-alpha.10" or similar.
We’ll need to repeat this step for each Talos node we want to create.
In a typical HA setup, we’ll have 3 control plane nodes and N workers.
In the following example, we’ll set up an HA control plane with two worker nodes.
Talos makes use of the guestinfo facility of VMware to provide the machine/cluster configuration.
This can be set using the govc vm.change command.
To facilitate persistent storage using the vSphere cloud provider integration with Kubernetes, disk.enableUUID=1 is used.
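As a sketch, setting both values on a control plane VM might look like the following (the VM path is a placeholder, and -w0 keeps GNU base64 from wrapping the output):

govc vm.change \
  -e "guestinfo.talos.config=$(base64 -w0 init.yaml)" \
  -e "disk.enableUUID=1" \
  -vm <vm path>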
Talos is known to work on Xen; however, it is currently undocumented.
4 - Cloud Platforms
4.1 - AWS
Creating a Cluster via the AWS CLI
In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
We assume an existing VPC, and some familiarity with AWS.
If you need more information on AWS specifics, please see the official AWS documentation.
Take note of the DNS name and ARN.
We will need these soon.
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
$ talosctl validate --config init.yaml --mode cloud
init.yaml is valid for cloud mode
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config join.yaml --mode cloud
join.yaml is valid for cloud mode
Create the EC2 Instances
Note: There is a known issue that prevents Talos from running on T2 instance types.
Please use T3 if you need burstable instance types.
In this guide we will create an HA Kubernetes cluster with 1 worker node.
We assume existing Blob Storage, and some familiarity with Azure.
If you need more information on Azure specifics, please see the official Azure documentation.
Environment Setup
We’ll make use of the following environment variables throughout the setup.
Edit the variables below with your correct information.
# Storage account to use
export STORAGE_ACCOUNT="StorageAccountName"

# Storage container to upload to
export STORAGE_CONTAINER="StorageContainerName"

# Resource group name
export GROUP="ResourceGroupName"

# Location
export LOCATION="centralus"

# Get storage account connection string based on info above
export CONNECTION=$(az storage account show-connection-string \
                    -n $STORAGE_ACCOUNT \
                    -g $GROUP \
                    -o tsv)
Create the Image
First, download the Azure image from a Talos release.
Once downloaded, untar with tar -xvf /path/to/azure.tar.gz
Upload the VHD
Once you have pulled down the image, you can upload it to blob storage with:
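For example, with the connection string from the environment setup (the local path is a placeholder for the unpacked VHD):

az storage blob upload \
  --connection-string $CONNECTION \
  --container-name $STORAGE_CONTAINER \
  --name talos-azure.vhd \
  --file <path to the unpacked .vhd>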
Now that the image is present in our blob storage, we’ll register it.
az image create \
--name talos \
--source https://$STORAGE_ACCOUNT.blob.core.windows.net/$STORAGE_CONTAINER/talos-azure.vhd \
--os-type linux \
-g $GROUP
Network Infrastructure
Virtual Networks and Security Groups
Once the image is prepared, we’ll want to work through setting up the network.
Issue the following to create a network security group and add rules to it.
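A sketch of the kind of commands involved follows; the rule names, priorities, and ports are illustrative (for example, the Talos API on 50000 and the Kubernetes API on 6443), so open whatever your environment actually needs:

# Virtual network and subnet referenced by the NICs below
az network vnet create \
  --resource-group $GROUP \
  --name talos-vnet \
  --subnet-name talos-subnet

# Network security group
az network nsg create \
  --resource-group $GROUP \
  --name talos-sg

# Example inbound rules
az network nsg rule create \
  --resource-group $GROUP \
  --nsg-name talos-sg \
  --name talos-api \
  --priority 1001 \
  --direction inbound \
  --destination-port-ranges 50000

az network nsg rule create \
  --resource-group $GROUP \
  --nsg-name talos-sg \
  --name kube-api \
  --priority 1002 \
  --direction inbound \
  --destination-port-ranges 6443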
In Azure, we have to pre-create the NICs for our control plane so that they can be associated with our load balancer.
for i in $(seq 0 1 2); do
  # Create public IP for each nic
  az network public-ip create \
    --resource-group $GROUP \
    --name talos-controlplane-public-ip-$i \
    --allocation-method static

  # Create nic
  az network nic create \
    --resource-group $GROUP \
    --name talos-controlplane-nic-$i \
    --vnet-name talos-vnet \
    --subnet talos-subnet \
    --network-security-group talos-sg \
    --public-ip-address talos-controlplane-public-ip-$i \
    --lb-name talos-lb \
    --lb-address-pools talos-be-pool
done
Cluster Configuration
With our networking bits setup, we’ll fetch the IP for our load balancer and create our configuration files.
LB_PUBLIC_IP=$(az network public-ip show \
              --resource-group $GROUP \
              --name talos-public-ip \
              --query [ipAddress] \
              --output tsv)

talosctl gen config talos-k8s-azure-tutorial https://${LB_PUBLIC_IP}:6443
Compute Creation
We are now ready to create our Azure nodes.
# Create availability set
az vm availability-set create \
  --name talos-controlplane-av-set \
  -g $GROUP

# Create controlplane 0
az vm create \
  --name talos-controlplane-0 \
  --image talos \
  --custom-data ./init.yaml \
  -g $GROUP \
  --admin-username talos \
  --generate-ssh-keys \
  --verbose \
  --boot-diagnostics-storage $STORAGE_ACCOUNT \
  --os-disk-size-gb 20 \
  --nics talos-controlplane-nic-0 \
  --availability-set talos-controlplane-av-set \
  --no-wait
# Create 2 more controlplane nodes
for i in $(seq 1 2); do
  az vm create \
    --name talos-controlplane-$i \
    --image talos \
    --custom-data ./controlplane.yaml \
    -g $GROUP \
    --admin-username talos \
    --generate-ssh-keys \
    --verbose \
    --boot-diagnostics-storage $STORAGE_ACCOUNT \
    --os-disk-size-gb 20 \
    --nics talos-controlplane-nic-$i \
    --availability-set talos-controlplane-av-set \
    --no-wait
done

# Create worker node
az vm create \
  --name talos-worker-0 \
  --image talos \
  --vnet-name talos-vnet \
  --subnet talos-subnet \
  --custom-data ./join.yaml \
  -g $GROUP \
  --admin-username talos \
  --generate-ssh-keys \
  --verbose \
  --boot-diagnostics-storage $STORAGE_ACCOUNT \
  --nsg talos-sg \
  --os-disk-size-gb 20 \
  --no-wait

# NOTES:
# `--admin-username` and `--generate-ssh-keys` are required by the az cli,
# but are not actually used by talos
# `--os-disk-size-gb` is the backing disk for Kubernetes and any workload containers
# `--boot-diagnostics-storage` is to enable console output which may be necessary
# for troubleshooting
Retrieve the kubeconfig
You should now be able to interact with your cluster with talosctl.
First, we need to discover the public IP of our first control plane node.
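A sketch of doing that with the public IP created for the first control plane NIC, then pointing talosctl at it:

CONTROL_PLANE_0_IP=$(az network public-ip show \
                    --resource-group $GROUP \
                    --name talos-controlplane-public-ip-0 \
                    --query [ipAddress] \
                    --output tsv)

talosctl --talosconfig ./talosconfig config endpoint $CONTROL_PLANE_0_IP
talosctl --talosconfig ./talosconfig kubeconfig .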
In this guide we will create an HA Kubernetes cluster with 1 worker node.
We assume an existing Space, and some familiarity with DigitalOcean.
If you need more information on DigitalOcean specifics, please see the official DigitalOcean documentation.
Create the Image
First, download the DigitalOcean image from a Talos release.
Using an upload method of your choice (doctl does not have Spaces support), upload the image to a space.
Now, create an image using the URL of the uploaded image:
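A sketch with doctl (the region and URL are placeholders for the Space you uploaded to):

doctl compute image create Talos \
  --region <region> \
  --image-url <URL of the uploaded image>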
We will need the IP of the load balancer.
Using the ID of the load balancer, run:
doctl compute load-balancer get --format IP <load balancer ID>
Save it, as we will need it in the next step.
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-digital-ocean-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
$ talosctl validate --config init.yaml --mode cloud
init.yaml is valid for cloud mode
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config join.yaml --mode cloud
join.yaml is valid for cloud mode
Note: Although SSH is not used by Talos, DigitalOcean still requires that an SSH key be associated with the droplet.
Create a dummy key that can be used to satisfy this requirement.
Create the Remaining Control Plane Nodes
Run the following twice, to give ourselves three total control plane nodes:
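A sketch of one such droplet creation (region, size, and SSH key fingerprint are placeholders; the image ID is the custom image created earlier):

doctl compute droplet create talos-control-plane-<n> \
  --region <region> \
  --image <image ID> \
  --size s-2vcpu-4gb \
  --user-data-file controlplane.yaml \
  --ssh-keys <ssh key fingerprint>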
In this guide, we will create an HA Kubernetes cluster in GCP with 1 worker node.
We will assume an existing Cloud Storage bucket, and some familiarity with Google Cloud.
If you need more information on Google Cloud specifics, please see the official Google documentation.
Environment Setup
We’ll make use of the following environment variables throughout the setup.
Edit the variables below with your correct information.
# Storage bucket to use
export STORAGE_BUCKET="StorageBucketName"

# Region
export REGION="us-central1"
Create the Image
First, download the Google Cloud image from a Talos release.
These images are called gcp.tar.gz.
Upload the Image
Once you have downloaded the image, you can upload it to your storage bucket with:
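For example, with gsutil and the bucket from the environment setup above:

gsutil cp /path/to/gcp.tar.gz gs://$STORAGE_BUCKET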
Once the image is prepared, we’ll want to work through setting up the network.
Issue the following to create a firewall, load balancer, and their required components.
In this guide we will create a Kubernetes cluster in Docker, using a containerized version of Talos.
Running Talos in Docker is intended for use in CI pipelines and for local testing when you need a quick and easy cluster.
Furthermore, if you are running Talos in production, it provides an excellent way for developers to develop against the same version of Talos.
Requirements
The following are requirements for running Talos in Docker:
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
Cleaning Up
To cleanup, run:
talosctl cluster destroy
5.2 - Firecracker
In this guide we will create a Kubernetes cluster using Firecracker.
Requirements
Linux
a kernel with
KVM enabled (/dev/kvm must exist)
CONFIG_NET_SCH_NETEM enabled
CONFIG_NET_SCH_INGRESS enabled
at least CAP_SYS_ADMIN and CAP_NET_ADMIN capabilities
go get -d github.com/awslabs/tc-redirect-tap/cmd/tc-redirect-tap
cd $GOPATH/src/github.com/awslabs/tc-redirect-tap
make all
sudo cp tc-redirect-tap /opt/cni/bin
Note: if $GOPATH is not set, it defaults to ~/go.
Install Talos kernel and initramfs
The Firecracker provisioner depends on the uncompressed Talos kernel (vmlinuz) and initramfs (initramfs.xz).
These files can be downloaded from the Talos release:
Once the above finishes successfully, your talosconfig (~/.talos/config) will be configured to point to the new cluster.
Retrieve and Configure the kubeconfig
talosctl kubeconfig .
Using the Cluster
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
A bridge interface will be created, and assigned the default IP 10.5.0.1.
Each node will be directly accessible on the subnet specified at cluster creation time.
A load balancer runs on 10.5.0.1 by default, which handles load balancing for the Talos and Kubernetes APIs.
You can see a summary of the cluster state by running:
$ talosctl cluster show --provisioner firecracker
PROVISIONER firecracker
NAME talos-default
NETWORK NAME talos-default
NETWORK CIDR 10.5.0.0/24
NETWORK GATEWAY 10.5.0.1
NETWORK MTU 1500

NODES:
NAME TYPE IP CPU RAM DISK
talos-default-master-1 Init 10.5.0.2 1.00 1.6 GB 4.3 GB
talos-default-master-2 ControlPlane 10.5.0.3 1.00 1.6 GB 4.3 GB
talos-default-master-3 ControlPlane 10.5.0.4 1.00 1.6 GB 4.3 GB
talos-default-worker-1 Join 10.5.0.5 1.00 1.6 GB 4.3 GB
Note: In the case that the host machine is rebooted before destroying the cluster, you may need to manually remove ~/.talos/clusters/talos-default.
Manual Clean Up
The talosctl cluster destroy command depends heavily on the cluster's state directory.
It contains all information related to the cluster, including the PIDs and network settings associated with the cluster nodes.
If you have deleted the state folder by mistake, or you would like to clean up the environment, here are the steps for doing it manually:
Stopping VMs
To find the firecracker --api-sock processes, execute:
ps -elf | grep '[f]irecracker --api-sock'
To stop the VMs manually, execute:
sudo kill -s SIGTERM <PID>
Example output, where VMs are running with PIDs 158065 and 158216
This is the trickier part if you have already deleted the state folder.
If you didn’t, the information is recorded in state.yaml in the
/root/.talos/clusters/<cluster-name> directory.
go get -d github.com/awslabs/tc-redirect-tap/cmd/tc-redirect-tap
cd $GOPATH/src/github.com/awslabs/tc-redirect-tap
make all
sudo cp tc-redirect-tap /opt/cni/bin
Note: if $GOPATH is not set, it defaults to ~/go.
Install Talos kernel and initramfs
The QEMU provisioner depends on the Talos kernel (vmlinuz) and initramfs (initramfs.xz).
These files can be downloaded from the Talos release:
Once the above finishes successfully, your talosconfig (~/.talos/config) will be configured to point to the new cluster.
Retrieve and Configure the kubeconfig
talosctl -n 10.5.0.2 kubeconfig .
Using the Cluster
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
A bridge interface will be created, and assigned the default IP 10.5.0.1.
Each node will be directly accessible on the subnet specified at cluster creation time.
A load balancer runs on 10.5.0.1 by default, which handles load balancing for the Talos and Kubernetes APIs.
You can see a summary of the cluster state by running:
$ talosctl cluster show --provisioner qemu
PROVISIONER qemu
NAME talos-default
NETWORK NAME talos-default
NETWORK CIDR 10.5.0.0/24
NETWORK GATEWAY 10.5.0.1
NETWORK MTU 1500

NODES:
NAME TYPE IP CPU RAM DISK
talos-default-master-1 Init 10.5.0.2 1.00 1.6 GB 4.3 GB
talos-default-master-2 ControlPlane 10.5.0.3 1.00 1.6 GB 4.3 GB
talos-default-master-3 ControlPlane 10.5.0.4 1.00 1.6 GB 4.3 GB
talos-default-worker-1 Join 10.5.0.5 1.00 1.6 GB 4.3 GB
Note: In the case that the host machine is rebooted before destroying the cluster, you may need to manually remove ~/.talos/clusters/talos-default.
Manual Clean Up
The talosctl cluster destroy command depends heavily on the cluster's state directory.
It contains all information related to the cluster, including the PIDs and network settings associated with the cluster nodes.
If you have deleted the state folder by mistake, or you would like to clean up the environment, here are the steps for doing it manually:
Remove VM Launchers
To find the talosctl qemu-launch processes, execute:
ps -elf | grep 'talosctl qemu-launch'
To remove the VMs manually, execute:
sudo kill -s SIGTERM <PID>
Example output, where VMs are running with PIDs 157615 and 157617
This is the trickier part if you have already deleted the state folder.
If you didn’t, the information is recorded in state.yaml in the
~/.talos/clusters/<cluster-name> directory.
Static addressing is configured by specifying cidr, routes (remember to add your default gateway), and interface.
Most likely you’ll also want to define the nameservers so you have properly functioning DNS.
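A minimal sketch of such an interface configuration, with placeholder addresses and eth0 as the interface name:

machine:
  network:
    interfaces:
      - interface: eth0
        cidr: 192.168.2.5/24
        routes:
          - network: 0.0.0.0/0      # default route
            gateway: 192.168.2.1
    nameservers:
      - 1.1.1.1
      - 8.8.8.8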
In some environments you may need to set additional addresses on an interface.
In the following example, we set two additional addresses on the loopback interface.
Create the cluster as usual and verify that metrics are now present on this port:
$ curl 127.0.0.1:11234/v1/metrics
# HELP container_blkio_io_service_bytes_recursive_bytes The blkio io service bytes recursive
# TYPE container_blkio_io_service_bytes_recursive_bytes gauge
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Async"} 0
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Discard"} 0
...
6.3 - Configuring Corporate Proxies
Appending the Certificate Authority of MITM Proxies
Put the PEM encoded certificate on each machine:
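One way to do this is via the machine files mechanism described in the configuration reference; the destination path below is an assumption about where the trust store lives, so adjust it to your environment:

machine:
  files:
    - content: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      permissions: 0644
      path: /etc/ssl/certs/ca-certificates
      op: append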
In this guide we will create a set of local caching Docker registry proxies to minimize local cluster startup time.
When running Talos locally, pulling images from Docker registries might take a significant amount of time.
We spin up local caching pass-through registries to cache images and configure a local Talos cluster to use those proxies.
A similar approach might be used to run Talos in production in air-gapped environments.
It can be also used to verify that all the images are available in local registries.
Video Walkthrough
To see a live demo of this writeup, see the video below:
Requirements
The following are requirements for creating the set of caching proxies:
Talos pulls from docker.io, k8s.gcr.io, gcr.io and quay.io by default.
If your configuration is different, you might need to modify the commands below:
Note: Proxies are started as Docker containers, and they’re automatically configured to start with the Docker daemon.
Please note that the quay.io proxy doesn’t support the recent Docker image schema, so we run an older registry image version (2.5).
As a registry container can only handle a single upstream Docker registry, we launch a container per upstream, each on its own host port (5000, 5001, and 5002).
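A sketch of how three of these proxies might be launched with docker run, using the registry image's pull-through cache mode (REGISTRY_PROXY_REMOTEURL) and the host ports listed above:

docker run -d -p 5000:5000 --restart always --name registry-docker.io \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io registry:2

docker run -d -p 5001:5000 --restart always --name registry-k8s.gcr.io \
  -e REGISTRY_PROXY_REMOTEURL=https://k8s.gcr.io registry:2

docker run -d -p 5002:5000 --restart always --name registry-quay.io \
  -e REGISTRY_PROXY_REMOTEURL=https://quay.io registry:2.5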
Using Caching Registries with firecracker Local Cluster
With a firecracker local cluster, a bridge interface is created on the host.
As registry containers expose their ports on the host, we can use bridge IP to direct proxy requests.
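For example, a machine configuration fragment pointing the default registries at the proxies on the bridge IP might look like this (ports follow the mapping above; the registries.mirrors schema is described in the configuration reference):

machine:
  registries:
    mirrors:
      docker.io:
        endpoints:
          - http://10.5.0.1:5000
      k8s.gcr.io:
        endpoints:
          - http://10.5.0.1:5001
      quay.io:
        endpoints:
          - http://10.5.0.1:5002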
The Talos local cluster should now start pulling via caching registries.
This can be verified via registry logs, e.g. docker logs -f registry-docker.io.
The first time cluster boots, images are pulled and cached, so next cluster boot should be much faster.
Note: 10.5.0.1 is the bridge IP with the default network (10.5.0.0/24); if using a custom --cidr, the value should be adjusted accordingly.
Using Caching Registries with docker Local Cluster
With a docker local cluster we can use the Docker bridge IP; the default value for that IP is 172.17.0.1.
On Linux, the docker bridge address can be inspected with ip addr show docker0.
Note: Removing docker registry containers also removes the image cache.
So if you plan to use caching registries, keep the containers running.
6.5 - Configuring the Cluster Endpoint
In this section, we will step through the configuration of a Talos based Kubernetes cluster.
There are three major components we will configure:
apid and talosctl
the master nodes
the worker nodes
Talos enforces a high level of security by using mutual TLS for authentication and authorization.
We recommend that the configuration of Talos be performed by a cluster owner.
A cluster owner should be a person of authority within an organization, perhaps a director, manager, or senior member of a team.
They are responsible for storing the root CA, and distributing the PKI for authorized cluster administrators.
Recommended settings
Talos runs great out of the box, but tweaking a few minor settings will make your life a lot easier in the future.
These are not requirements; this section simply explains some key settings.
Endpoint
To configure the talosctl endpoint, it is recommended you use a resolvable DNS name.
This way, if you decide to upgrade to a multi-control-plane cluster, you only have to add the IP address to the hostname configuration.
The configuration can either be done on a load balancer, or simply through DNS.
For example:
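(The DNS name below is a placeholder; the field is the cluster controlPlane endpoint described in the configuration reference.)

cluster:
  controlPlane:
    endpoint: https://kube.mycluster.example.com:6443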
This is set in the config files for the cluster, e.g. init.yaml, controlplane.yaml, and join.yaml.
For more details, please see: v1alpha1 endpoint configuration.
If you have a DNS name as the endpoint, you can later expand your Talos cluster to multiple control planes (if you don’t have a multi-control-plane setup from the start).
Using a DNS name generates the corresponding certificates (Kubernetes and Talos) for the correct hostname.
6.6 - Customizing the Kernel
FROM scratch AS customization
COPY --from=<custom kernel image> /lib/modules /lib/modules

FROM docker.io/andrewrynhard/installer:latest
COPY --from=<custom kernel image> /boot/vmlinuz /usr/install/vmlinuz
Note: You can use the --squash flag to create smaller images.
Now that we have a custom installer we can build Talos for the specific platform we wish to deploy to.
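A sketch of building the customized installer (the image tag is a placeholder):

docker build --squash -t <organization>/installer:custom .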
6.7 - Customizing the Root Filesystem
The installer image contains ONBUILD instructions that handle the following:
the decompression, and unpacking of the initramfs.xz
the unsquashing of the rootfs
the copying of new rootfs files
the squashing of the new rootfs
and the packing, and compression of the new initramfs.xz
When used as a base image, the installer will perform the above steps automatically with the requirement that a customization stage be defined in the Dockerfile.
For example, say we have an image that contains the contents of a library we wish to add to the Talos rootfs.
We need to define a stage with the name customization:
FROM scratch AS customization
COPY --from=<name|index> <src> <dest>
Using a multi-stage Dockerfile we can define the customization stage and build FROM the installer image:
FROM scratch AS customization
COPY --from=<name|index> <src> <dest>
FROM docker.io/autonomy/installer:latest
When building the image, the customization stage will automatically be copied into the rootfs.
The customization stage is not limited to a single COPY instruction.
In fact, you can do whatever you would like in this stage, but keep in mind that everything in / will be copied into the rootfs.
Note: <dest> is the path relative to the rootfs at which you wish to place the contents of <src>.
Files can also be removed from the rootfs by passing a list of paths via the RM build-time variable; this will perform a rm -rf on the specified paths relative to the rootfs.
Note: RM must be a whitespace delimited list.
The resulting image can be used to:
generate an image for any of the supported providers
perform bare-metal installs
perform upgrades
We will step through common customizations in the remainder of this section.
6.8 - Managing PKI
Generating an Administrator Key Pair
In order to create a key pair, you will need the root CA.
Save the CA public key and CA private key as ca.crt and ca.key, respectively.
Now, run the following commands to generate a certificate:
talosctl gen key --name admin
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin
Now, base64 encode admin.crt, and admin.key:
cat admin.crt | base64
cat admin.key | base64
You can now set the crt and key fields in the talosconfig to the base64 encoded strings.
Renewing an Expired Administrator Certificate
In order to renew the certificate, you will need the root CA, and the admin private key.
The base64 encoded key can be found in any of the control plane nodes’ configuration files.
Where it is exactly will depend on the specific version of the configuration file you are using.
Save the CA public key, CA private key, and admin private key as ca.crt, ca.key, and admin.key, respectively.
Now, run the following commands to generate a certificate:
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin
You should see admin.crt in your current directory.
Now, base64 encode admin.crt:
cat admin.crt | base64
You can now set the certificate in the talosconfig to the base64 encoded string.
6.9 - Resetting a Machine
From time to time, it may be beneficial to reset a Talos machine to its “original” state.
Bear in mind that this is a destructive action for the given machine.
Doing this means removing the machine from Kubernetes and etcd (if applicable), and clearing any data on the machine that would normally persist across a reboot.
The API command for doing this is talosctl reset.
There are a couple of flags as part of this command:
Flags:
      --graceful   if true, attempt to cordon/drain node and leave etcd (if applicable) (default true)
      --reboot     if true, reboot the node after resetting instead of shutting down
The graceful flag is especially important when considering HA vs. non-HA Talos clusters.
If the machine is part of an HA cluster, a normal, graceful reset should work just fine right out of the box as long as the cluster is in a good state.
However, if this is a single node cluster being used for testing purposes, a graceful reset is not an option since Etcd cannot be “left” if there is only a single member.
In this case, reset should be used with --graceful=false to skip performing checks that would normally block the reset.
6.10 - Upgrading Kubernetes
Video Walkthrough
To see a live demo of this writeup, see the video below:
Kubelet Image
In Kubernetes 1.19, the official hyperkube image was removed.
This means that in order to upgrade Kubernetes, Talos users will have to change the command and image fields of each control plane component.
The kubelet image will also have to be updated, if you wish to specify the kubelet image explicitly.
The default used by Talos is sufficient in most cases.
Kubeconfig
In order to edit the control plane, we will need a working kubectl config.
If you don’t already have one, you can get one by running:
talosctl --nodes <master node> kubeconfig
Automated Kubernetes Upgrade
In Talos v0.6.1 we introduced the upgrade-k8s command in talosctl.
This command can be used to automate the Kubernetes upgrade process.
For example, to upgrade from Kubernetes v1.18.6 to v1.19.0 run:
$ talosctl --nodes <master node> upgrade-k8s --from 1.18.6 --to 1.19.0
updating pod-checkpointer grace period to "0m"
sleeping 5m0s to let the pod-checkpointer self-checkpoint be updated
temporarily taking "kube-apiserver" out of pod-checkpointer control
updating daemonset "kube-apiserver" to version "1.19.0"
updating daemonset "kube-controller-manager" to version "1.19.0"
updating daemonset "kube-scheduler" to version "1.19.0"
updating daemonset "kube-proxy" to version "1.19.0"
updating pod-checkpointer grace period to "5m0s"
Manual Kubernetes Upgrade
Kubernetes can be upgraded manually as well by following the steps outlined below.
They are equivalent to the steps performed by the talosctl upgrade-k8s command.
pod-checkpointer
Talos runs the pod-checkpointer component, which helps recover control plane components (specifically, the API server) if the control plane is not healthy.
However, the way checkpoints interact with the API server upgrade may make an upgrade take a lot longer due to a race condition on the API server listen port.
In order to speed up upgrades, first lower the pod-checkpointer grace period to zero (kubectl -n kube-system edit daemonset pod-checkpointer), change:
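(The grace period is passed as a container argument; the flag name below is the one used by the bootkube pod-checkpointer, so verify it against your manifest.)

# before
- --checkpoint-grace-period=5m0s
# after
- --checkpoint-grace-period=0m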
The Talos team now maintains an image for the kubelet that should be used starting with Kubernetes 1.19.
The image for this release is docker.io/autonomy/kubelet:v1.19.0.
To explicitly set the image, we can use the official documentation.
For example:
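(A sketch of pinning it via the machine configuration; verify the field against the v1alpha1 reference for your Talos version.)

machine:
  kubelet:
    image: docker.io/autonomy/kubelet:v1.19.0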
To see a live demo of this writeup, see the video below:
Talos
In an effort to create more production ready clusters, Talos will now taint control plane nodes as unschedulable.
This means that any application you might have deployed must tolerate this taint if you intend on running the application on control plane nodes.
Another feature you will notice is the automatic uncordoning of nodes that have been upgraded.
Talos will now uncordon a node if the cordon was initiated by the upgrade process.
Talosctl
The talosctl CLI now requires an explicit set of nodes.
This can be configured with talosctl config nodes or set on the fly with talosctl --nodes.
7 - Reference
7.1 - Configuration
Package v1alpha1 configuration file contains all the options available for configuring a machine.
We can generate the files using talosctl.
This configuration is enough to get started in most cases, however it can be customized as needed.
Indicates whether to pull the machine config upon every boot.
Type: bool
Valid Values:
true
yes
false
no
machine
Provides machine specific configuration options.
Type: MachineConfig
cluster
Provides cluster specific configuration options.
Type: ClusterConfig
MachineConfig
type
Defines the role of the machine within the cluster.
Init
Init node type designates the first control plane node to come up.
You can think of it like a bootstrap node.
This node will perform the initial steps to bootstrap the cluster – generation of TLS assets, starting of the control plane, etc.
Control Plane
Control Plane node type designates the node as a control plane member.
This means it will host etcd along with the Kubernetes master components such as API Server, Controller Manager, Scheduler.
Worker
Worker node type designates the node as a worker node.
This means it will be an available compute node for scheduling workloads.
Type: string
Valid Values:
init
controlplane
join
token
The token is used by a machine to join the PKI of the cluster.
Using this token, a machine will create a certificate signing request (CSR), and request a certificate that will be used as its identity.
Type: string
Examples:
token: 328hom.uqjzh6jnn2eie9oi
Warning: It is important to ensure that this token is correct since a machine’s certificate has a short TTL by default
ca
The root certificate authority of the PKI.
It is composed of a base64 encoded crt and key.
Extra certificate subject alternative names for the machine’s certificate.
By default, all non-loopback interface IPs are automatically added to the certificate’s SANs.
Used to partition, format and mount additional disks.
Since the rootfs is read only with the exception of /var, mounts are only valid if they are under /var.
Note that the partitioning and formatting is done only once, if and only if no existing partitions are found.
If size: is omitted, the partition is sized to occupy the full disk.
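A sketch of such a configuration, with a hypothetical second disk mounted under /var:

machine:
  disks:
    - device: /dev/sdb                 # hypothetical additional disk
      partitions:
        - mountpoint: /var/lib/extra   # mounts must live under /var
          # size omitted: the partition occupies the full disk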
Allows the addition of user specified files.
The value of op can be create, overwrite, or append.
In the case of create, path must not exist.
In the case of overwrite and append, path must be a valid file.
If an op value of append is used, the content is appended to the existing file.
Note that the file contents are not required to be base64 encoded.
The env field allows for the addition of environment variables to a machine.
All environment variables are set on the machine in addition to every service.
Type: Env
Valid Values:
GRPC_GO_LOG_VERBOSITY_LEVEL
GRPC_GO_LOG_SEVERITY_LEVEL
http_proxy
https_proxy
no_proxy
Examples:
env:
  GRPC_GO_LOG_VERBOSITY_LEVEL: "99"
  GRPC_GO_LOG_SEVERITY_LEVEL: info
  https_proxy: http://SERVER:PORT/
Used to configure the machine’s container image registry mirrors.
Automatically generates matching CRI configuration for registry mirrors.
The mirrors section allows redirecting requests for images to a non-default registry,
which might be a local registry or a caching mirror.
The config section provides a way to authenticate to the registry with a TLS client
identity, provide the registry CA, or supply authentication information.
Authentication information has the same meaning as the corresponding fields in .docker/config.json.
interfaces is used to define the network interface configuration.
By default all network interfaces will attempt a DHCP discovery.
This can be further tuned through this configuration parameter.
machine.network.interfaces.interface
This is the interface name that should be configured.
machine.network.interfaces.cidr
cidr is used to specify a static IP address to the interface.
This should be in proper CIDR notation ( 192.168.2.5/24 ).
Note: This option is mutually exclusive with DHCP.
machine.network.interfaces.dhcp
dhcp is used to specify that this device should be configured via DHCP.
The following DHCP options are supported:
OptionClasslessStaticRoute
OptionDomainNameServer
OptionDNSDomainSearchList
OptionHostName
Note: This option is mutually exclusive with CIDR.
machine.network.interfaces.ignore
ignore is used to exclude a specific interface from configuration.
This parameter is optional.
machine.network.interfaces.dummy
dummy is used to specify that this interface should be a virtual-only, dummy interface.
This parameter is optional.
machine.network.interfaces.routes
routes is used to specify static routes that may be necessary.
This parameter is optional.
Routes can be repeated and includes a Network and Gateway field.
Type: array
nameservers
Used to statically set the nameservers for the host.
Defaults to 1.1.1.1 and 8.8.8.8
Type: array
extraHostEntries
Allows for extra entries to be added to /etc/hosts file
Type: array
Examples:
extraHostEntries:
  - ip: 192.168.1.100
    aliases:
      - test
      - test.domain.tld
InstallConfig
disk
The disk used to install the bootloader, and ephemeral partitions.
Type: string
Examples:
/dev/sda
/dev/nvme0
extraKernelArgs
Allows for supplying extra kernel args to the bootloader config.
Type: array
Examples:
extraKernelArgs:
- a=b
image
Allows for supplying the image used to perform the installation.
Type: string
Examples:
image: docker.io/<org>/installer:latest
bootloader
Indicates if a bootloader should be installed.
Type: bool
Valid Values:
true
yes
false
no
wipe
Indicates if zeroes should be written to the disk before performing an installation.
Defaults to true.
Type: bool
Valid Values:
true
yes
false
no
force
Indicates if filesystems should be forcefully created.
Type: bool
Valid Values:
true
yes
false
no
TimeConfig
servers
Specifies time (ntp) servers to use for setting system time.
Defaults to pool.ntp.org.
Note: This parameter only supports a single time server.
Type: array
RegistriesConfig
mirrors
Specifies mirror configuration for each registry.
This setting allows the use of local pull-through caching registries,
air-gapped installations, etc.
The registry name is the first segment of the image identifier, with ‘docker.io’
being the default one.
The name ‘*’ catches any registry names not specified explicitly.
Type: map
config
Specifies TLS & auth configuration for HTTPS image registries.
Mutual TLS can be enabled with the ‘clientIdentity’ option.
The TLS configuration can be skipped if the registry has a trusted
server certificate.
Type: map
PodCheckpointer
image
The image field is an override to the default pod-checkpointer image.
Type: string
CoreDNS
image
The image field is an override to the default coredns image.
Type: string
Endpoint
ControlPlaneConfig
endpoint
Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
It is single-valued, and may optionally include a port number.
Type: Endpoint
Examples:
https://1.2.3.4:443
localAPIServerPort
The port that the API server listens on internally.
This may be different than the port portion listed in the endpoint field above.
The default is 6443.
Type: int
APIServerConfig
image
The container image used in the API server manifest.
Type: string
extraArgs
Extra arguments to supply to the API server.
Type: map
certSANs
Extra certificate subject alternative names for the API server’s certificate.
Type: array
ControllerManagerConfig
image
The container image used in the controller manager manifest.
Type: string
extraArgs
Extra arguments to supply to the controller manager.
Type: map
ProxyConfig
image
The container image used in the kube-proxy manifest.
Type: string
mode
proxy mode of kube-proxy.
By default, this is ‘iptables’.
Type: string
extraArgs
Extra arguments to supply to kube-proxy.
Type: map
SchedulerConfig
image
The container image used in the scheduler manifest.
Type: string
extraArgs
Extra arguments to supply to the scheduler.
Type: map
EtcdConfig
image
The container image used to create the etcd service.
Type: string
ca
The ca is the root certificate authority of the PKI.
It is composed of a base64 encoded crt and key.
The CNI used.
Composed of “name” and “urls”.
The “name” key only supports the upstream bootkube options of “flannel” or “custom”.
urls is only used if name is equal to “custom”.
urls should point to a single YAML file that will get deployed.
An empty struct or any other name will default to bootkube’s flannel.
The domain used by Kubernetes DNS.
The default is cluster.local
Type: string
Examples:
cluster.local
podSubnets
The pod subnet CIDR.
Type: array
Examples:
podSubnets:
- 10.244.0.0/16
serviceSubnets
The service subnet CIDR.
Type: array
Examples:
serviceSubnets:
- 10.96.0.0/12
CNIConfig
name
Name of CNI to use.
Type: string
urls
URLs containing manifests to apply for CNI.
Type: array
AdminKubeconfigConfig
certLifetime
Admin kubeconfig certificate lifetime (default is 1 year).
Field format accepts any Go time.Duration format (‘1h’ for one hour, ‘10m’ for ten minutes).
Type: Duration
MachineDisk
device
The name of the disk to use.
Type: string
partitions
A list of partitions to create on the disk.
Type: array
DiskPartition
size
The size of the partition in bytes. If size: is omitted, the partition is sized to occupy the full disk.
Type: uint
mountpoint
Where to mount the partition.
Type: string
MachineFile
content
The contents of file.
Type: string
permissions
The file’s permissions in octal.
Type: FileMode
path
The path of the file.
Type: string
op
The operation to use
Type: string
Valid Values:
create
append
ExtraHost
ip
The IP of the host.
Type: string
aliases
The host alias.
Type: array
Device
interface
The interface name.
Type: string
cidr
The CIDR to use.
Type: string
routes
A list of routes associated with the interface.
Type: array
bond
Bond specific options.
Type: Bond
vlans
VLAN specific options.
Type: array
mtu
The interface’s MTU.
Type: int
dhcp
Indicates if DHCP should be used.
Type: bool
ignore
Indicates if the interface should be ignored.
Type: bool
dummy
Indicates if the interface is a dummy interface.
Type: bool
Bond
interfaces
The interfaces that make up the bond.
Type: array
arpIPTarget
A bond option.
Please see the official kernel documentation.
Type: array
mode
A bond option.
Please see the official kernel documentation.
Type: string
xmitHashPolicy
A bond option.
Please see the official kernel documentation.
Type: string
lacpRate
A bond option.
Please see the official kernel documentation.
Type: string
adActorSystem
A bond option.
Please see the official kernel documentation.
Type: string
arpValidate
A bond option.
Please see the official kernel documentation.
Type: string
arpAllTargets
A bond option.
Please see the official kernel documentation.
Type: string
primary
A bond option.
Please see the official kernel documentation.
Type: string
primaryReselect
A bond option.
Please see the official kernel documentation.
Type: string
failOverMac
A bond option.
Please see the official kernel documentation.
Type: string
adSelect
A bond option.
Please see the official kernel documentation.
Type: string
miimon
A bond option.
Please see the official kernel documentation.
Type: uint32
updelay
A bond option.
Please see the official kernel documentation.
Type: uint32
downdelay
A bond option.
Please see the official kernel documentation.
Type: uint32
arpInterval
A bond option.
Please see the official kernel documentation.
Type: uint32
resendIgmp
A bond option.
Please see the official kernel documentation.
Type: uint32
minLinks
A bond option.
Please see the official kernel documentation.
Type: uint32
lpInterval
A bond option.
Please see the official kernel documentation.
Type: uint32
packetsPerSlave
A bond option.
Please see the official kernel documentation.
Type: uint32
numPeerNotif
A bond option.
Please see the official kernel documentation.
Type: uint8
tlbDynamicLb
A bond option.
Please see the official kernel documentation.
Type: uint8
allSlavesActive
A bond option.
Please see the official kernel documentation.
Type: uint8
useCarrier
A bond option.
Please see the official kernel documentation.
Type: bool
adActorSysPrio
A bond option.
Please see the official kernel documentation.
Type: uint16
adUserPortKey
A bond option.
Please see the official kernel documentation.
Type: uint16
peerNotifyDelay
A bond option.
Please see the official kernel documentation.
Type: uint32
Vlan
cidr
The CIDR to use.
Type: string
routes
A list of routes associated with the VLAN.
Type: array
dhcp
Indicates if DHCP should be used.
Type: bool
vlanId
The VLAN’s ID.
Type: uint16
Route
network
The route’s network.
Type: string
gateway
The route’s gateway.
Type: string
RegistryMirrorConfig
endpoints
List of endpoints (URLs) for registry mirrors to use.
Endpoint configures HTTP/HTTPS access mode, host name,
port and path (if path is not set, it defaults to /v2).
Type: array
RegistryConfig
tls
The TLS configuration for this registry.
Type: RegistryTLSConfig
auth
The auth configuration for this registry.
Type: RegistryAuthConfig
RegistryAuthConfig
username
Optional registry authentication.
The meaning of each field is the same with the corresponding field in .docker/config.json.
Type: string
password
Optional registry authentication.
The meaning of each field is the same with the corresponding field in .docker/config.json.
Type: string
auth
Optional registry authentication.
The meaning of each field is the same with the corresponding field in .docker/config.json.
Type: string
identityToken
Optional registry authentication.
The meaning of each field is the same with the corresponding field in .docker/config.json.
Type: string
RegistryTLSConfig
clientIdentity
Enable mutual TLS authentication with the registry.
Client certificate and key should be base64-encoded.
When interacting with Talos, the gRPC API endpoint you interact with directly is provided by apid. apid acts as the gateway for all component interactions and forwards the requests to routerd.
To run and operate a Kubernetes cluster a certain level of trust is required. Based on the concept of a ‘Root of Trust’, trustd is a simple daemon responsible for establishing trust within the system.
Implementation of eudev into machined. eudev is Gentoo’s fork of udev, systemd’s device file manager for the Linux kernel. It manages device nodes in /dev and handles all user space actions when adding or removing devices. To learn more see the Gentoo Wiki.
apid
When interacting with Talos, the gRPC API endpoint you will interact with directly is apid.
Apid acts as the gateway for all component interactions.
Apid provides a mechanism to route requests to the appropriate destination when running on a control plane node.
We’ll use some examples below to illustrate what apid is doing.
When a user wants to interact with a Talos component via talosctl, there are two flags that control the interaction with apid.
The -e | --endpoints flag is used to denote which Talos node ( via apid ) should handle the connection.
Typically this is a public facing server.
The -n | --nodes flag is used to denote which Talos node(s) should respond to the request.
If --nodes is not specified, the first endpoint will be used.
Note: Typically there will be an endpoint already defined in the Talos config file.
Optionally, nodes can be included here as well.
For example, if a user wants to interact with machined, a command like talosctl -e cluster.talos.dev memory may be used.
$ talosctl -e cluster.talos.dev memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
cluster.talos.dev 7938176823901455337246571
In this case, talosctl is interacting with apid running on cluster.talos.dev and forwarding the request to the machined api.
If we wanted to extend our example to retrieve memory from another node in our cluster, we could use the command talosctl -e cluster.talos.dev -n node02 memory.
$ talosctl -e cluster.talos.dev -n node02 memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
node02 7938176823901455337246571
The apid instance on cluster.talos.dev receives the request and forwards it to apid running on node02 which forwards the request to the machined api.
We can further extend our example to retrieve memory for all nodes in our cluster by appending additional -n node flags or using a comma separated list of nodes ( -n node01,node02,node03 ):
$ talosctl -e cluster.talos.dev -n node01 -n node02 -n node03 memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
node01 793887140711374929457042node02 25784414408190796181384952589227492node03 257844183025518612549777254556
The apid instance on cluster.talos.dev receives the request and forwards it to node01, node02, and node03, which then forward the request to their local machined API.
containerd
Containerd provides the container runtime to launch workloads on Talos as well as Kubernetes.
Talos services are namespaced under the system namespace in containerd whereas the Kubernetes services are namespaced under the k8s.io namespace.
machined
A common theme throughout the design of Talos is minimalism.
We believe strongly in the UNIX philosophy that each program should do one job well.
The init included in Talos is one example of this, and we are calling it “machined”.
We wanted to create a focused init that had one job - run Kubernetes.
To that extent, machined is relatively static in that it does not allow for arbitrary user defined services.
Only the services necessary to run Kubernetes and manage the node are available.
This includes:
Networkd handles all of the host level network configuration.
Configuration is defined under the networking key.
By default, we attempt to issue a DHCP request for every interface on the server.
This can be overridden by supplying one of the following kernel arguments:
talos.network.interface.ignore - specify a list of interfaces to skip discovery on
ip - ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip> as documented in the kernel here
The Linux kernel included with Talos is configured according to the recommendations outlined in the Kernel Self Protection Project (KSPP).
trustd
Security is one of the highest priorities within Talos.
To run a Kubernetes cluster, a certain level of trust is required to operate it.
For example, orchestrating the bootstrap of a highly available control plane requires the distribution of sensitive PKI data.
To that end, we created trustd.
Based on the concept of a Root of Trust, trustd is a simple daemon responsible for establishing trust within the system.
Once trust is established, various methods become available to the trustee.
It can, for example, accept a write request from another node to place a file on disk.
Additional methods and capability will be added to the trustd component in support of new functionality in the rest of the Talos environment.
udevd
Udevd handles the kernel device notifications and sets up the necessary links in /dev.
8.2 - FAQs
How is Talos different from other container optimized Linux distros?
Talos shares a lot of attributes with other distros, but there are some important differences.
Talos integrates tightly with Kubernetes, and is not meant to be a general-purpose operating system.
The most important difference is that Talos is fully controlled by an API via a gRPC interface, instead of an ordinary shell.
We don’t ship SSH, and there is no console access.
Removing components such as these has allowed us to dramatically reduce the footprint of Talos, and in turn, improve a number of other areas like security, predictability, reliability, and consistency across platforms.
It’s a big change from how operating systems have been managed in the past, but we believe that API-driven OSes are the future.
Why no shell or SSH?
Since Talos is fully API-driven, all maintenance and debugging operations should be possible via the OS API.
We would like for Talos users to start thinking about what a “machine” is in the context of a Kubernetes cluster.
That is, that a Kubernetes cluster can be thought of as one massive machine, and the nodes are merely additional, undifferentiated resources.
We don’t want humans to focus on the nodes, but rather on the machine that is the Kubernetes cluster.
Should an issue arise at the node level, talosctl should provide the necessary tooling to assist in the identification, debugging, and remediation of the issue.
However, the API is based on the Principle of Least Privilege, and exposes only a limited set of methods.
We envision Talos being a great place for the application of control theory in order to provide a self-healing platform.
Why the name “Talos”?
Talos was an automaton created by the Greek God of the forge to protect the island of Crete.
He would patrol the coast and enforce laws throughout the land.
We felt it was a fitting name for a security focused operating system designed to run Kubernetes.