This change adds Longhorn, an addition to Kubernetes that provides
distributed storage across all nodes of the cluster.
Note that I already tried this in December but rolled _everything_ back
due to very high load on the machines. It turned out, however, that the
high load was not caused by Longhorn but by a misconfiguration of the
server, as described in the see-also commit.
Reference: https://longhorn.io/
Reference: https://longhorn.io/docs/1.10.1/deploy/install/install-with-helm/
See-also: 4b8a3d12c4 Use etcd instead of sqlite for k3s-server
Base Infrastructure
This project sets up a k3s Kubernetes cluster on Hetzner Cloud using OpenTofu and Ansible.
It provisions my base infrastructure for the web: it bootstraps the required machines and networks, installs a Kubernetes cluster, and deploys a set of foundational services.
The system is intentionally split into two stages:
- Infrastructure provisioning using OpenTofu (or Terraform)
- Cluster and software installation using Ansible
The entire setup is idempotent, meaning it can be applied repeatedly and safely.
TL;DR
vim -o .envrc config.auto.tfvars # Add secrets from password manager
direnv allow
tofu init
tofu apply
until ansible -m ping all; do sleep 10; done # Wait until VMs are reachable
ansible-galaxy install -r requirements.yml
ansible-playbook site.yml
Required Software
Secrets & Local Configuration
The setup requires two files:
- `.envrc`
- `config.auto.tfvars`
These contain credentials, environment variables, and configuration values. Both files are stored securely in my password manager.
Templates are available in the repository.
After placing these files, enable them with:
direnv allow
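As an illustration of what `.envrc` typically carries, here is a minimal sketch. The variable names are assumptions (apart from `HCLOUD_TOKEN`, which the Hetzner Cloud provider reads, and the standard `TF_VAR_` prefix); the authoritative template lives in the repository.

```shell
# Hypothetical .envrc sketch -- the real variable names come from the
# repository's template. Values belong in your password manager, not here.
export HCLOUD_TOKEN="<hetzner-api-token>"       # read by the Hetzner Cloud provider
export TF_VAR_ssh_public_key="ssh-ed25519 ..."  # illustrative TF_VAR_* input variable
```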
Infrastructure Provisioning (OpenTofu)
OpenTofu provisions:
- Kubernetes server and agent machines
- Networking (public + private subnets)
- Firewall rules
- Routing between subnets
- DNS records
The infrastructure is fully idempotent. You can re-run `tofu apply` at any time.
tofu init (1)
tofu apply (2)
until ansible -m ping all; do sleep 10; done (3)
(1) Initialize modules
(2) Apply infrastructure and generate `inventory.ini`
(3) Wait until all VMs are reachable (may take up to 5 minutes)
Cluster & Software Installation (Ansible)
Ansible installs and maintains all cluster software, including:
- Routing and SSH setup on servers
- A full k3s Kubernetes cluster
- Distributed block storage via Longhorn
- Foundational cluster services
All playbooks are idempotent and can be safely re-run.
ansible-galaxy install -r requirements.yml (1)
ansible-playbook site.yml (2)
(1) Install required Ansible collections
(2) Install k3s and write kubeconfig to `~/.kube/config`
Running the playbook will overwrite `~/.kube/config`.
Back up your config if you manage multiple clusters.
The Kubernetes setup requires an `inventory.ini` file, which OpenTofu creates automatically.
Make sure to apply the infrastructure at least once before running Ansible.
Longhorn
The setup installs Longhorn, which provides a distributed block-storage system for the Kubernetes cluster.
Longhorn exposes a default storage class named longhorn.
This storage class is backed by replicated volumes distributed across multiple nodes,
reducing dependency on node-local ephemeral storage and allowing workloads to be rescheduled more reliably.
Longhorn also provides a web-based dashboard for inspecting volumes, replicas, and node health.
To access the dashboard, forward the service port:
kubectl port-forward -n longhorn-system --address 0.0.0.0 service/longhorn-frontend 8000:80
Then open http://localhost:8000/ in your browser.
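To consume the storage class from a workload, a PersistentVolumeClaim references it by name. A minimal sketch, with the claim name and requested size as placeholders:

```shell
# Create a PVC backed by the "longhorn" storage class.
# The claim name and requested size are illustrative.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
EOF
```

Longhorn then provisions a replicated volume, and its replicas and health show up in the dashboard.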
Installed Foundational Services
- cert-manager: Enables automatic issuance of TLS certificates. Certificates are issued via Let's Encrypt, with support for both its staging and production environments.
- gitea: My personal favorite Git server.
- concourse-ci: A powerful CI service that I use to automate all kinds of workloads.
- snappass: A secure and reliable tool for sharing passwords.
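For cert-manager, issuance is typically wired up through a ClusterIssuer. A hedged sketch of a Let's Encrypt staging issuer follows; the issuer name, e-mail address, and solver are assumptions, and the actual manifests are managed by the playbook.

```shell
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging          # illustrative name
spec:
  acme:
    # Let's Encrypt staging endpoint; switch to the production directory URL
    # once certificates validate correctly.
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com         # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: traefik           # k3s ships traefik by default; adjust if replaced
EOF
```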
Configured tags
The playbook defines several tags that restrict the execution scope.
You can limit a run to specific areas using `--tags`, for example `ansible-playbook site.yml --tags longhorn`.
| Tag | Description |
| init | Full initial setup |
| add-server | Add a new k3s server node |
| add-agent | Add a new k3s agent node |
| update | Upgrade Kubernetes or system packages |
| longhorn-compatible | Ensure Longhorn compatibility |
| longhorn | Deploy Longhorn |
| config | Update local kubeconfig |
| k8s | Deploy foundational services |
| cert-manager | Apply changes to cert-manager |
| gitea | Apply changes to gitea |
| concourse | Apply changes to concourse |
| snappass | Apply changes to snappass |
Scaling the Cluster
- Increase
-
Adjust the number of servers/agents in
config.auto.tfvars -
Then rerun the Ansible playbook
- Decrease
DO NOT reduce the agent count directly.
-
Open
k9s -
Navigate to
:nodes -
Select the agent with the highest numeric index
-
Drain it with kbd:[r]
-
After draining, delete it with kbd:[Ctrl + d]
-
Now decrease the agent count in
config.auto.tfvarsand runtofu apply
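The k9s steps above correspond to plain kubectl commands; a sketch, assuming the agent to remove is named `agent-2` (actual node names depend on your tfvars):

```shell
# Drain the node (evict workloads), then remove it from the cluster.
kubectl drain agent-2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node agent-2
```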
Responsibilities
- OpenTofu
  - Provision machines for Kubernetes servers (public subnet)
  - Provision machines for Kubernetes agents (private subnet)
  - Create networking (public/private subnets + routing)
  - Manage firewall rules:
    - ICMP
    - Kubernetes API (6443)
    - SSH (nonstandard port, usually 1022)
    - HTTP/HTTPS (80, 443)
    - Git SSH (22)
  - Manage DNS records
- Ansible
  - Configure SSH access
  - Configure routing on all servers
  - Install and maintain k3s
  - Keep system software updated
  - Install Longhorn
  - Deploy foundational services