Felix Nehrke 1f69c1578c Add longhorn distributed storage to the k3s-cluster
This change adds longhorn, an addition to Kubernetes that adds the
ability to use distributed storage over all nodes to the cluster.

Note, that I tried that in December already but due to very high load on
the machines I rolled _everything_ back. Though, it turned out that the
high load was not because of longhorn, but instead because of bad
configuration of the server, as described in the see-also commit.

Reference: https://longhorn.io/
Reference: https://longhorn.io/docs/1.10.1/deploy/install/install-with-helm/
See-also: 4b8a3d12c4 Use etcd instead of sqlite for k3s-server
2026-01-23 00:45:00 +01:00


= Base Infrastructure
:icons: font
:source-highlighter: rouge
:toc: preamble
:experimental:
ifdef::env-github[]
:tip-caption: :bulb:
:note-caption: :point_up:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]
ifdef::env-github[]
NOTE: This is only *a mirror* of my https://gitea.nehrke.info/nemoinho/base-infra[personal git-server].
endif::[]
This project sets up a https://k3s.io/[k3s] Kubernetes cluster on https://www.hetzner.com/cloud[Hetzner Cloud] using OpenTofu and Ansible.
It provides my base infrastructure for the web:
it bootstraps the required machines and networks, installs a Kubernetes cluster, and deploys a set of foundational services.
The system is intentionally split into two stages:
1. Infrastructure provisioning using https://opentofu.org/[OpenTofu] (or Terraform)
2. Cluster and software installation using Ansible
The entire setup is *idempotent*, meaning it can be applied repeatedly and safely.
== TL;DR
[source,bash]
----
vim -o .envrc config.auto.tfvars # Add secrets from password manager
direnv allow
tofu init
tofu apply
until ansible -m ping all; do sleep 10; done # Wait until VMs are reachable
ansible-galaxy install -r requirements.yml
ansible-playbook site.yml
----
== Required Software
The setup works on:
* Debian
* Ubuntu
* macOS
Please install the following:
* `tofu` or `terraform`
* `ansible`
* `direnv`
* https://helm.sh/docs/intro/install/[`helm`]
* https://github.com/databus23/helm-diff?tab=readme-ov-file#install[`helm-diff`]
* `python3-kubernetes` (Debian/Ubuntu only)
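
A quick way to see which of these commands are missing from the `PATH` is a small check like the one below.
This is only a sketch; the function name `missing_tools` is hypothetical and not part of the repository.

[source,bash]
----
# Hypothetical helper: print every given command that is not on the PATH.
# Returns 0 if all commands were found, 1 otherwise.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
----

For example, `missing_tools ansible direnv helm` prints nothing when all three are installed.
`tofu` and `terraform` are alternatives, so check them separately.
`helm-diff` is a Helm plugin and therefore shows up in `helm plugin list` rather than on the `PATH`.
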
=== Optional Tools
These tools improve maintenance and cluster operations:
* `k9s`
== Secrets & Local Configuration
The setup requires two files:
* `.envrc`
* `config.auto.tfvars`
These contain credentials, environment variables, and configuration values.
Both files are stored securely in my password manager.
[TIP]
--
Templates are available in the repository:
* https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/.envrc.tpl[`.envrc`]
* https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/config.auto.tfvars.tpl[`config.auto.tfvars`]
--
After placing these files, enable them with:
[source,bash]
----
direnv allow
----
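
Before provisioning, it can help to confirm that both files are actually in place.
The following is a minimal sketch; the function name `check_local_config` is hypothetical and not part of the repository.

[source,bash]
----
# Hypothetical helper: verify that the two local configuration files exist
# in the given directory (defaults to the current directory).
check_local_config() {
  local dir="${1:-.}" f missing=0
  for f in .envrc config.auto.tfvars; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
----

Running `check_local_config && tofu apply` would then skip provisioning when a file is absent.
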
== Infrastructure Provisioning (OpenTofu)
OpenTofu provisions:
* Kubernetes server and agent machines
* Networking (public + private subnets)
* Firewall rules
* Routing between subnets
* DNS records
[NOTE]
The infrastructure is fully idempotent. You can re-run `tofu apply` at any time.
[source,bash]
----
tofu init # <1>
tofu apply # <2>
until ansible -m ping all; do sleep 10; done # <3>
----
<1> Initialize the working directory (download providers and modules)
<2> Apply infrastructure and generate `inventory.ini`
<3> Wait until all VMs are reachable (may take up to 5 minutes)
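
The `until` loop above waits indefinitely if a machine never becomes reachable.
A bounded variant might look like the following sketch; `wait_for` and its arguments are hypothetical and not part of the repository.

[source,bash]
----
# Hypothetical helper: retry a command until it succeeds or the attempt
# limit is reached. Usage: wait_for <attempts> <delay-seconds> <command...>
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts: $*" >&2
  return 1
}
----

For example, `wait_for 30 10 ansible -m ping all` gives the VMs roughly the five minutes mentioned above before it fails.
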
== Cluster & Software Installation (Ansible)
Ansible installs and maintains all cluster software, including:
* Routing and SSH setup on servers
* A full k3s Kubernetes cluster
* Distributed block-storage via https://longhorn.io/[Longhorn]
* Foundational cluster services
[NOTE]
All playbooks are idempotent and can be safely re-run.
[source,bash]
----
ansible-galaxy install -r requirements.yml # <1>
ansible-playbook site.yml # <2>
----
<1> Install required Ansible collections
<2> Install k3s, Longhorn, and the foundational services; write the kubeconfig to `~/.kube/config`
[CAUTION]
Running the playbook will overwrite `~/.kube/config`.
Back up your config if you manage multiple clusters.
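
One way to keep the existing configuration is to copy it aside before running the playbook.
The sketch below assumes the default location `~/.kube/config`; the function name and the backup naming scheme are hypothetical.

[source,bash]
----
# Hypothetical helper: copy the kubeconfig to a timestamped backup next to it
# and print the backup path. Does nothing if the file does not exist.
backup_kubeconfig() {
  local src="${1:-$HOME/.kube/config}" dest
  if [ ! -f "$src" ]; then
    echo "nothing to back up: $src" >&2
    return 0
  fi
  dest="$src.$(date +%Y%m%d-%H%M%S).bak"
  cp -p "$src" "$dest"
  echo "$dest"
}
----

Calling `backup_kubeconfig` right before `ansible-playbook site.yml` leaves the previous configuration available for restoring or merging by hand.
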
[NOTE]
The Kubernetes setup requires an `inventory.ini` file, which OpenTofu generates during `tofu apply`.
Apply the infrastructure at least once before running Ansible.
=== Longhorn
The setup installs https://longhorn.io/[Longhorn], which provides a distributed block-storage system for the Kubernetes cluster.
Longhorn exposes a default storage class named `longhorn`.
This storage class is backed by replicated volumes distributed across multiple nodes,
reducing dependency on node-local ephemeral storage and allowing workloads to be rescheduled more reliably.
Longhorn also provides a web-based dashboard for inspecting volumes, replicas, and node health.
To access the dashboard, forward the service port:
[source,bash]
kubectl port-forward -n longhorn-system --address 0.0.0.0 service/longhorn-frontend 8000:80
Then open http://localhost:8000/ in your browser.
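
A workload requests Longhorn-backed storage by naming the `longhorn` storage class in a PersistentVolumeClaim.
The following sketch prints such a claim; the function name, the claim name `demo-data`, and the size `1Gi` are placeholders, not values from this repository.

[source,bash]
----
# Print a PersistentVolumeClaim that uses the `longhorn` storage class.
# Arguments (both with hypothetical defaults): claim name and requested size.
longhorn_pvc() {
  local name="${1:-demo-data}" size="${2:-1Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: ${size}
EOF
}
----

Piping the output into kubectl, as in `longhorn_pvc my-data 5Gi | kubectl apply -f -`, would create the claim in the current namespace.
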
=== Installed Foundational Services
https://cert-manager.io/docs/installation/helm[cert-manager]::
This enables automatic issuance of TLS certificates.
Certificates are issued via https://letsencrypt.org[Let's Encrypt]; both its staging and production environments are supported.
https://about.gitea.com[gitea]::
My personal favorite git-server.
https://concourse-ci.org[concourse-ci]::
A powerful CI-service which I like to use to automate all kinds of workloads.
https://github.com/pinterest/snappass[snappass]::
A secure and reliable tool for sharing passwords.
=== Configured Tags
The playbook defines several tags.
Pass them via `--tags` to restrict a run to specific areas.
.General tags
[horizontal]
`init`:: Full initial setup
`add-server`:: Add a new k3s server node
`add-agent`:: Add a new k3s agent node
`update`:: Upgrade Kubernetes or system packages
`longhorn-compatible`:: Ensure Longhorn compatibility
`longhorn`:: Deploy Longhorn
`config`:: Update local kubeconfig
`k8s`:: Deploy foundational services
.Service-specific tags
[horizontal]
`cert-manager`:: Apply changes to cert-manager, including the Let's Encrypt support
`gitea`:: Apply changes to gitea
`concourse`:: Apply changes to concourse
`snappass`:: Apply changes to snappass
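
Several tags are passed to `ansible-playbook` as one comma-separated list.
A small wrapper, sketched here under the hypothetical name `play`, joins its arguments accordingly.

[source,bash]
----
# Hypothetical wrapper: run site.yml restricted to the given tags.
# With no arguments the whole playbook runs.
play() {
  if [ "$#" -eq 0 ]; then
    ansible-playbook site.yml
    return
  fi
  # Join all arguments with commas, as --tags expects.
  local IFS=,
  ansible-playbook site.yml --tags "$*"
}
----

For example, `play longhorn-compatible longhorn` runs `ansible-playbook site.yml --tags longhorn-compatible,longhorn`.
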
== Scaling the Cluster
Increase::
--
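. Adjust the number of servers/agents in `config.auto.tfvars`
. Run `tofu apply` to provision the new machines and update `inventory.ini`
. Rerun the Ansible playbook, for example restricted to the `add-server` or `add-agent` tag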
--
Decrease::
--
DO NOT reduce the agent count directly.
Drain and delete the node in the cluster first, then remove the machine:
1. Open `k9s`
2. Navigate to `:nodes`
3. Select the agent with the highest numeric index
4. Drain it with kbd:[r]
5. After draining, delete it with kbd:[Ctrl + d]
6. Now decrease the agent count in `config.auto.tfvars` and run `tofu apply`
--
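
Outside of k9s, draining and deleting a node corresponds to `kubectl drain` followed by `kubectl delete node`.
The sketch below assumes the kubeconfig written by the playbook is active; the function name is hypothetical, and the node name has to be taken from `kubectl get nodes`.
Whether these drain flags match what k9s applies is not verified here.

[source,bash]
----
# Hypothetical helper: drain a node and remove it from the cluster.
remove_node() {
  local node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: remove_node <node-name>" >&2
    return 2
  fi
  # Evict workloads; DaemonSet pods are skipped and emptyDir data is discarded.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  # Remove the node object only after the drain succeeded.
  kubectl delete node "$node"
}
----

Afterwards, decrease the agent count in `config.auto.tfvars` and run `tofu apply` as described in the last step.
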
== Responsibilities
OpenTofu::
* Provision machines for Kubernetes servers (public subnet)
* Provision machines for Kubernetes agents (private subnet)
* Create networking (public/private subnets + routing)
* Manage firewall rules:
** ICMP
** Kubernetes API (`6443`)
** SSH (nonstandard port, usually `1022`)
** HTTP/HTTPS (`80`, `443`)
** Git SSH (`22`)
* Manage DNS records
Ansible::
* Configure SSH access
* Configure routing on all servers
* Install and maintain k3s
* Keep system software updated
* Install Longhorn
* Deploy foundational services