This change adds Longhorn, an addition to Kubernetes that provides
distributed storage across all nodes of the cluster.
Note that I already tried this in December but rolled _everything_ back
due to very high load on the machines. It turned out, however, that the
high load was not caused by Longhorn but by a misconfiguration of the
server, as described in the see-also commit.
Reference: https://longhorn.io/
Reference: https://longhorn.io/docs/1.10.1/deploy/install/install-with-helm/
See-also: 4b8a3d12c4 Use etcd instead of sqlite for k3s-server
Base Infrastructure
This project sets up a k3s Kubernetes cluster on Hetzner Cloud using OpenTofu and Ansible.
It provisions my base infrastructure for the web: it bootstraps the required machines and networks, installs a Kubernetes cluster, and deploys a set of foundational services.
The system is intentionally split into two stages:
- Infrastructure provisioning using OpenTofu (or Terraform)
- Cluster and software installation using Ansible
The entire setup is idempotent, meaning it can be applied repeatedly and safely.
TL;DR
vim -o .envrc config.auto.tfvars # Add secrets from password manager
direnv allow
tofu init
tofu apply
until ansible -m ping all; do sleep 10; done # Wait until VMs are reachable
ansible-galaxy install -r requirements.yml
ansible-playbook site.yml
Required Software
Secrets & Local Configuration
The setup requires two files:
- `.envrc`
- `config.auto.tfvars`
These contain credentials, environment variables, and configuration values. Both files are stored securely in my password manager.
Templates are available in the repository.
After placing these files, enable them with:
direnv allow
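As an illustration of what `.envrc` typically carries, here is a minimal sketch. The variable names are assumptions (apart from `HCLOUD_TOKEN`, which the Hetzner Cloud provider reads, and the standard `TF_VAR_` prefix); the authoritative template lives in the repository.

```shell
# Hypothetical .envrc sketch -- the real variable names come from the
# repository's template. Values belong in your password manager, not here.
export HCLOUD_TOKEN="<hetzner-api-token>"       # read by the Hetzner Cloud provider
export TF_VAR_ssh_public_key="ssh-ed25519 ..."  # illustrative TF_VAR_* input variable
```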
Infrastructure Provisioning (OpenTofu)
OpenTofu provisions:
- Kubernetes server and agent machines
- Networking (public + private subnets)
- Firewall rules
- Routing between subnets
- DNS records
The infrastructure is fully idempotent. You can re-run `tofu apply` at any time.
tofu init (1)
tofu apply (2)
until ansible -m ping all; do sleep 10; done (3)
(1) Initialize modules
(2) Apply infrastructure and generate `inventory.ini`
(3) Wait until all VMs are reachable (may take up to 5 minutes)
Cluster & Software Installation (Ansible)
Ansible installs and maintains all cluster software, including:
- Routing and SSH setup on servers
- A full k3s Kubernetes cluster
- Distributed block storage via Longhorn
- Foundational cluster services
All playbooks are idempotent and can be safely re-run.
ansible-galaxy install -r requirements.yml (1)
ansible-playbook site.yml (2)
(1) Install required Ansible collections
(2) Install k3s and write kubeconfig to `~/.kube/config`
Running the playbook will overwrite `~/.kube/config`.
Back up your config if you manage multiple clusters.
The Kubernetes setup requires an `inventory.ini` file, which OpenTofu creates automatically.
Make sure to apply the infrastructure at least once before running Ansible.
Longhorn
The setup installs Longhorn, which provides a distributed block-storage system for the Kubernetes cluster.
Longhorn exposes a default storage class named longhorn.
This storage class is backed by replicated volumes distributed across multiple nodes,
reducing dependency on node-local ephemeral storage and allowing workloads to be rescheduled more reliably.
Longhorn also provides a web-based dashboard for inspecting volumes, replicas, and node health.
To access the dashboard, forward the service port:
kubectl port-forward -n longhorn-system --address 0.0.0.0 service/longhorn-frontend 8000:80
Then open http://localhost:8000/ in your browser.
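To consume the storage class from a workload, a PersistentVolumeClaim references it by name. A minimal sketch, with the claim name and requested size as placeholders:

```shell
# Create a PVC backed by the "longhorn" storage class.
# The claim name and requested size are illustrative.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
EOF
```

Longhorn then provisions a replicated volume, and its replicas and health show up in the dashboard.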
Installed Foundational Services
- cert-manager: Enables automatic issuance of TLS certificates. Certificates are issued via Let's Encrypt, with support for both its staging and production environments.
- gitea: My personal favorite Git server.
- concourse-ci: A powerful CI service that I use to automate all kinds of workloads.
- snappass: A secure and reliable tool for sharing passwords.
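For cert-manager, issuance is typically wired up through a ClusterIssuer. A hedged sketch of a Let's Encrypt staging issuer follows; the issuer name, e-mail address, and solver are assumptions, and the actual manifests are managed by the playbook.

```shell
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging          # illustrative name
spec:
  acme:
    # Let's Encrypt staging endpoint; switch to the production directory URL
    # once certificates validate correctly.
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com         # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: traefik           # k3s ships traefik by default; adjust if replaced
EOF
```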
Configured tags
The playbook defines several tags that restrict the execution scope.
You can limit a run to specific areas using `--tags`, for example `ansible-playbook site.yml --tags longhorn`.
| Tag | Description |
| init | Full initial setup |
| add-server | Add a new k3s server node |
| add-agent | Add a new k3s agent node |
| update | Upgrade Kubernetes or system packages |
| longhorn-compatible | Ensure Longhorn compatibility |
| longhorn | Deploy Longhorn |
| config | Update local kubeconfig |
| k8s | Deploy foundational services |
| cert-manager | Apply changes to cert-manager |
| gitea | Apply changes to gitea |
| concourse | Apply changes to concourse |
| snappass | Apply changes to snappass |
Scaling the Cluster
- Increase
-
Adjust the number of servers/agents in
config.auto.tfvars -
Then rerun the Ansible playbook
- Decrease
DO NOT reduce the agent count directly.
-
Open
k9s -
Navigate to
:nodes -
Select the agent with the highest numeric index
-
Drain it with kbd:[r]
-
After draining, delete it with kbd:[Ctrl + d]
-
Now decrease the agent count in
config.auto.tfvarsand runtofu apply
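The k9s steps above correspond to plain kubectl commands; a sketch, assuming the agent to remove is named `agent-2` (actual node names depend on your tfvars):

```shell
# Drain the node (evict workloads), then remove it from the cluster.
kubectl drain agent-2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node agent-2
```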
Responsibilities
- OpenTofu
  - Provision machines for Kubernetes servers (public subnet)
  - Provision machines for Kubernetes agents (private subnet)
  - Create networking (public/private subnets + routing)
  - Manage firewall rules:
    - ICMP
    - Kubernetes API (6443)
    - SSH (nonstandard port, usually 1022)
    - HTTP/HTTPS (80, 443)
    - Git SSH (22)
  - Manage DNS records
- Ansible
  - Configure SSH access
  - Configure routing on all servers
  - Install and maintain k3s
  - Keep system software updated
  - Install Longhorn
  - Deploy foundational services