rpi k8s Cluster Using Talos Linux

Motivation

I am writing this post partially to distill my understanding of Kubernetes as I undertake a journey to learn more about cloud security by standing up and auditing my own cluster, and partially in hopes of providing some help for anyone who is thinking of setting up their own low-cost Kubernetes cluster on bare-metal Raspberry Pi 4s.

Besides being a fun learning experience, I look forward to adding to the collection of self-hosted Kubernetes tutorials on the internet, because we all deserve to use services that are under our own control.

What is Kubernetes?

Kubernetes is a container orchestration platform. We can further understand this concise definition by first discussing what containers are.

When people say container, they are typically referring to virtualization that takes place at the OS level, rather than at the hardware level. Hardware virtualization is probably familiar to you if you’ve found this post: it refers to running a guest operating system on top of a host operating system. In hardware virtualization, the host grants the guest a simulated environment in which the guest can interact with the hardware as though it were the main operating system installed on the device. There are a few caveats, particularly regarding guest access to certain I/O devices, but for our purposes we can think of hardware virtualization as the creation of a virtual environment in which the guest believes its programs are executing on bare metal.

While hardware virtualization has its benefits in terms of semi-isolation between the guest and host OS (exceptions include, but are not limited to, shared directories and security flaws in the virtualization program), the overhead of running two operating systems at once might not be necessary when the purpose of virtualization is simply to isolate a single program from the rest of your host machine.

This is where OS-level virtualization really shines. Instead of simulating an entire operating system for one potentially untrusted application, OS-level virtualization allows the host kernel to create a separate userspace for the application, called a container. This container does not need the overhead of an entire separate OS to support it; instead it relies on the access control the kernel enforces over the program’s userspace. That access control allows the container to see only itself and any resources explicitly allocated to it.

Docker is a popular implementation of OS-level virtualization, and it is really great at delivering the isolation and self-contained packaging we desire.
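If you have Docker handy, a quick sketch makes this isolation tangible (the alpine image is just an illustrative choice):

docker run --rm -it alpine sh
# inside the container, listing processes shows only the container's own --
# the host's processes are invisible, since the kernel gave the container a separate PID namespace
ps aux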

But what about scalability? What if we wanted to guarantee a certain level of uptime by replicating our containers across different locations, or by automatically restarting them when they die? This is where container orchestration comes in.

Container orchestration refers to the process of orchestrating, or administering, individual containers through a single interface. In a system like Kubernetes, rather than having to communicate with each server that is hosting a container, we can join these servers as worker nodes in our Kubernetes cluster and only ever communicate with a subset of machines collectively called the control plane. In our communication with the control plane, rather than using an imperative language, we use a declarative one: we express the state we would like to observe and leave Kubernetes to define the control flow required to achieve that state.
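To make "declarative" concrete, here is a minimal sketch (the name web and the image nginx:alpine are purely illustrative choices) asking Kubernetes for three replicas of a web server. If a replica dies, the control plane notices the divergence from the declared state and starts a replacement:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # illustrative name
spec:
  replicas: 3               # the state we declare; Kubernetes works to maintain it
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:alpine # illustrative image
EOF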

There’s tons more to learn about Kubernetes, as worker nodes and control planes barely scratch the surface. For our purposes, this information should be enough to get everything set up, but there’s lots of room for configuration and hardening, so be sure to check out the docs to learn more about the various components deployed in a Kubernetes cluster.

What is Talos Linux?

Traditionally, Kubernetes is deployed on top of a host operating system. That still provides isolation between environments and is convenient for the system administrator, who can manage the box remotely through SSH, but if a device is being used exclusively for Kubernetes, it makes sense to minimize the attack surface as much as possible by doing away with mutability originating outside of the Kubernetes communication channels. The aim of Talos Linux is to do just that. By running only the Linux kernel and a minimal set of services, and by allowing device management only through local access or remotely through talosctl, Talos effectively eliminates the need to manage the OS and lets system administrators focus on management through the Kubernetes control plane. This reduces the job of the admin, which previously comprised managing both the individual operating systems and the Kubernetes cluster, to managing only the cluster.

You can learn more about Talos here.

Why Talos?

Many people choose to run k3s on Raspberry Pis. k3s is a project by Rancher Labs that aims to minimize the footprint of Kubernetes by packaging core functionality into a single binary deployed on top of an operating system. I think this is a valid option for some users, and it is a project I definitely want to learn more about myself, but I chose Talos because of the potential to reduce the attack surface by eliminating access to the operating system. Not only are things like SSH and Bash absent, but the entire filesystem is mounted by Talos as read-only, which I think sounds awesome. The simplicity and abstraction Talos offers by letting the admin concern themselves only with the Kubernetes layer of computing strikes me as a beautiful evolution of distributed computing that I really wanted to play around with.

Installation Instructions

Great, now that we have my preliminary thoughts and motivation out of the way, we can move on to actually setting up your own Talos-based Raspberry Pi 4 Kubernetes cluster.

I will primarily be regurgitating the instructions from the documentation, while sharing my thoughts and perspective on the justification for certain steps along the way. If you’d like a more concise version of these instructions, I recommend checking out the official documentation.

First we need to download talosctl, a utility for accessing the Talos API and managing the devices remotely. We can do that on the command line with curl, using the -L flag to follow redirects and the -o option to specify the output file. Note that writing into /usr/local/bin typically requires root, so you may need to prefix these commands with sudo.

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
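One assumption baked into that URL: the -amd64 suffix is hardcoded, while uname -s only fills in the operating system. If your administration machine is arm64 (an Apple Silicon Mac, or even another Pi), swap the suffix accordingly:

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-arm64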

and then make it executable with chmod

chmod +x /usr/local/bin/talosctl

We are putting it in the /usr/local/bin directory, a conventional place for executables that belong to root but can be executed by unprivileged users.
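As a quick sanity check that the binary runs, talosctl can print its own version; the --client flag should keep it from trying to contact any node (verify with talosctl version --help if your release differs):

talosctl version --client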

Next, let’s download the Talos image into /tmp/ (an ephemeral directory cleared at reboot), and examine the sha512sum to verify its integrity.

curl -Lo /tmp/metal-rpi_4-arm64.img.xz https://github.com/talos-systems/talos/releases/latest/download/metal-rpi_4-arm64.img.xz

We also want to download the sha512sum.

curl -Lo /tmp/sha512sum.txt https://github.com/talos-systems/talos/releases/latest/download/sha512sum.txt

We can then verify that bit errors didn’t occur during transit by using:

sha512sum /tmp/metal-rpi_4-arm64.img.xz && grep metal-rpi_4-arm64.img.xz /tmp/sha512sum.txt

You’ll want to compare the two hashes to make sure they are identical (this can also be automated, as shown in the sketch below).
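Rather than eyeballing the hashes (or wiring up diff and awk), sha512sum has a built-in check mode that does the comparison for us. A minimal sketch, assuming both files are still sitting in /tmp:

(cd /tmp && grep metal-rpi_4-arm64.img.xz sha512sum.txt | sha512sum -c)
# prints "metal-rpi_4-arm64.img.xz: OK" on success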

It’s important to note that this hash doesn’t guarantee the origin of the data, just its integrity. The Talos maintainers’ GitHub accounts could have been compromised and used to push a malicious release, or the image and checksum could have been intercepted in transit and substituted with a file and checksum of the adversary’s choosing. In a production environment, simply downloading and comparing checksums does not protect our environment from being compromised when the code is executed; its purpose is only to catch transmission errors such as bit flips.

Now that we have dispelled any false senses of security we may have been bothered by after verifying our checksum, we can move on to decompressing the image and flashing it onto our SD cards.

We can decompress the file using the xz tool and the -d (decompress) option.

xz -d /tmp/metal-rpi_4-arm64.img.xz

We can now flash the image onto the SD card using the following command, replacing <device path> with the path of the device (commonly found under /dev, but dependent on your operating system).

The following command uses dd to copy the input file (if=) to the output file (of=). Make sure to repeat this step for each Raspberry Pi’s SD card.

dd if=/tmp/metal-rpi_4-arm64.img of=<device path>
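If you’re unsure which device path belongs to the SD card, lsblk (on Linux) will list block devices so you can identify it by size. A couple of optional GNU dd flags also make the copy observable and flush it to the card before dd exits; here /dev/sdX is a stand-in for your actual device:

lsblk
sudo dd if=/tmp/metal-rpi_4-arm64.img of=/dev/sdX bs=4M status=progress conv=fsync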

After the command has finished, we can remove the SD card and insert it into the target Raspberry Pi. Next, you’ll want to connect the Pi to a monitor over HDMI and power it on. This will let you view the status of the boot process, as well as the command needed to load a configuration file onto the Pi.

Once the boot process has finished you’ll see something similar to the following line on the screen:

[talos] task loadConfig (1/1): talosctl apply-config --insecure --nodes <node ip> --cert-fingerprint '<cert fingerprint>' --interactive

On your administration machine, you can create the configuration files by issuing the following command, substituting your chosen cluster name and endpoint (typically the address of a control plane node).

talosctl gen config "<Cluster Name>" "https://<endpoint>:6443"
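As a concrete example (the cluster name rpi-cluster and the IP are hypothetical stand-ins for your own values), this generates three files in the current directory: controlplane.yaml, worker.yaml, and talosconfig:

talosctl gen config "rpi-cluster" "https://192.168.1.202:6443"
ls
# controlplane.yaml  talosconfig  worker.yaml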

After creating the configuration files, you can load them onto the machines in your cluster by using

talosctl apply-config --insecure --nodes <node ip> --cert-fingerprint '<cert fingerprint>' --file <config.yaml>

Make sure to use the controlplane.yaml file for machines that will be used to orchestrate tasks, and worker.yaml for machines that will execute jobs and host containerized services.
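Concretely, with the placeholder IPs 192.168.1.202 for a control plane node and 192.168.1.203 for a worker (substitute your own addresses and fingerprints), that looks like:

talosctl apply-config --insecure --nodes 192.168.1.202 --cert-fingerprint '<cert fingerprint>' --file controlplane.yaml
talosctl apply-config --insecure --nodes 192.168.1.203 --cert-fingerprint '<cert fingerprint>' --file worker.yaml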

Now that we have configured our cluster, we need to configure talosctl on the client machine we will use to connect to it.

First we need to set the endpoints of our control plane machines. Make sure to substitute your control plane machine’s IP address for <cp ip>:

talosctl --talosconfig=./talosconfig config endpoint <cp ip>

Next we want to set the default control plane node, using the same IP as above (192.168.1.202 here stands in for your own):

talosctl --talosconfig=./talosconfig config node 192.168.1.202

We can verify the addition of our node by using

talosctl --talosconfig=./talosconfig version

Notice that we needed to specify the configuration file to load the configuration. We can merge our newly created config into the default configuration file (typically ~/.talos/config), so we no longer need the --talosconfig flag each time:

talosctl config merge ./talosconfig

Once the configuration has been applied, we can check that it was successful by requesting the kernel logs from our nodes:

talosctl dmesg -f -n <node ip>

Next we will want to bootstrap Kubernetes using the talosctl utility, specifying the control plane IP (this should only be done against a single control plane node):

talosctl bootstrap --nodes 192.168.1.202

This command will set up your etcd cluster, generate core assets, and launch the control plane components, which we can then use to interface with the Kubernetes cluster.
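While everything converges, talosctl also ships a built-in health check you can run against the cluster (assuming the endpoint and node defaults we configured above; check talosctl --help if your version differs):

talosctl health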

Next, we can retrieve the cluster’s kubeconfig and merge it into our local kubectl configuration using:

talosctl kubeconfig

Finally, we can connect to our cluster and view the status of our nodes:

kubectl get nodes
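If everything worked, each Pi should appear in the listing and eventually report Ready. The output below is purely illustrative; your node names, ages, and versions will differ:

NAME             STATUS   ROLES                  AGE   VERSION
talos-cp-1       Ready    control-plane,master   2m    v1.23.1
talos-worker-1   Ready    <none>                 1m    v1.23.1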

Closing Thoughts

It is incredible how much configuration and tinkering is available to the administrator of a Kubernetes cluster. This initial setup is all I have time for right now, but I hope to keep playing with the configuration and researching hardening methods, which I look forward to sharing here.

– happy hacking!
