This change adds Longhorn, a Kubernetes add-on that provides distributed
block storage across all nodes of the cluster.
Note that I already tried this in December, but rolled _everything_ back
due to very high load on the machines. However, it turned out that the
high load was not caused by Longhorn but by a bad configuration of the
server, as described in the see-also commit.
Reference: https://longhorn.io/
Reference: https://longhorn.io/docs/1.10.1/deploy/install/install-with-helm/
See-also: 4b8a3d12c4 Use etcd instead of sqlite for k3s-server

= Base Infrastructure
:icons: font
:source-highlighter: rouge
:toc: preamble
ifdef::env-github[]
:tip-caption: :bulb:
:note-caption: :point_up:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]

ifdef::env-github[]
NOTE: This is only *a mirror* of my https://gitea.nehrke.info/nemoinho/base-infra[personal git-server].
endif::[]

This project sets up a https://k3s.io/[k3s] Kubernetes cluster on https://www.hetzner.com/cloud[Hetzner Cloud] using OpenTofu and Ansible.

It is meant to set up my base infrastructure for the web.
In particular, it bootstraps the required machines and networks,
installs a Kubernetes cluster, and deploys a set of foundational services.

The system is intentionally split into two stages:

1. Infrastructure provisioning using https://opentofu.org/[OpenTofu] (or Terraform)
2. Cluster and software installation using Ansible

The entire setup is *idempotent*, meaning it can be applied repeatedly and safely.

== TL;DR

[source,bash]
----
vim -o .envrc config.auto.tfvars # Add secrets from password manager
direnv allow
tofu init
tofu apply
until ansible -m ping all; do sleep 10; done # Wait until VMs are reachable
ansible-galaxy install -r requirements.yml
ansible-playbook site.yml
----

== Required Software

The setup works on:

* Debian
* Ubuntu
* macOS

Please install the following:

* `tofu` or `terraform`
* `ansible`
* `direnv`
* https://helm.sh/docs/intro/install/[`helm`]
* https://github.com/databus23/helm-diff?tab=readme-ov-file#install[`helm-diff`]
* `python3-kubernetes` (Debian/Ubuntu only)
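
A small helper can check the command-line tools from this list in one go. It is a minimal sketch and not part of the repository. `require_tools` is a hypothetical name, and it only looks for executables on `PATH`. The `helm-diff` plugin (shown by `helm plugin list`) and `python3-kubernetes` still have to be checked separately.

[source,bash]
----
# require_tools (hypothetical helper): report every command given as an
# argument that is not found on PATH; return 1 if at least one is missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
----

For example, `require_tools tofu ansible direnv helm` prints nothing and succeeds once everything is installed. Use `terraform` instead of `tofu` if that is what you have.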

=== Optional Tools

These tools improve maintenance and cluster operations:

* `k9s`

== Secrets & Local Configuration

The setup requires two files:

* `.envrc`
* `config.auto.tfvars`

These contain credentials, environment variables, and configuration values.
Both files are stored securely in my password manager.

[TIP]
--
Templates are available in the repository:

* https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/.envrc.tpl[`.envrc`]
* https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/config.auto.tfvars.tpl[`config.auto.tfvars`]
--
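
If you do not have the files at hand, they can be created from the templates. This is a minimal sketch. It assumes the commands run from the repository root and that the templates are named as in the links above. Existing files are left untouched.

[source,bash]
----
# Create each local file from its template unless it already exists.
[ -e .envrc ] || cp .envrc.tpl .envrc
[ -e config.auto.tfvars ] || cp config.auto.tfvars.tpl config.auto.tfvars

# Fill in the secrets.
vim -o .envrc config.auto.tfvars
----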

After placing these files, enable them with:

[source,bash]
----
direnv allow
----

== Infrastructure Provisioning (OpenTofu)

OpenTofu provisions:

* Kubernetes server and agent machines
* Networking (public + private subnets)
* Firewall rules
* Routing between subnets
* DNS records

[NOTE]
The infrastructure is fully idempotent. You can re-run `tofu apply` at any time.

[source,bash]
----
tofu init # <1>
tofu apply # <2>
until ansible -m ping all; do sleep 10; done # <3>
----

<1> Initialize modules
<2> Apply infrastructure and generate `inventory.ini`
<3> Wait until all VMs are reachable (may take up to 5 minutes)
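
The `until` loop above keeps polling forever if a VM never becomes reachable. A bounded variant is sketched below. The `retry` helper is hypothetical and not part of this repository. The budget of 30 attempts, 10 seconds apart, is only an assumption based on the five-minute estimate above.

[source,bash]
----
# retry (hypothetical helper): run a command until it succeeds, at most
# <attempts> times, sleeping <delay> seconds between failed attempts.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "giving up after $attempts attempts: $*" >&2
  return 1
}
----

With this helper, `retry 30 10 ansible -m ping all` gives up after roughly five minutes, plus the time the pings themselves take. It then returns a non-zero exit status, so a script can stop instead of hanging.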

== Cluster & Software Installation (Ansible)

Ansible installs and maintains all cluster software, including:

* Routing and SSH setup on servers
* A full k3s Kubernetes cluster
* Distributed block storage via https://longhorn.io/[Longhorn]
* Foundational cluster services

[NOTE]
All playbooks are idempotent and can be safely re-run.

[source,bash]
----
ansible-galaxy install -r requirements.yml # <1>
ansible-playbook site.yml # <2>
----

<1> Install required Ansible collections
<2> Install k3s and the cluster software, and write the kubeconfig to `~/.kube/config`

[CAUTION]
Running the playbook will overwrite `~/.kube/config`.
Back up your config if you manage multiple clusters.
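
Such a backup can be taken right before each run. This is a minimal sketch. `backup_kubeconfig` is a hypothetical helper that copies the file to a timestamped name next to the original and prints the backup path.

[source,bash]
----
# backup_kubeconfig (hypothetical helper): copy a kubeconfig, by default
# ~/.kube/config, to "<file>.<timestamp>.bak" and print the backup path.
# Does nothing if the file does not exist yet.
backup_kubeconfig() {
  local config="${1:-$HOME/.kube/config}" backup
  [ -f "$config" ] || return 0
  backup="$config.$(date +%Y%m%d-%H%M%S).bak"
  # -p preserves mode and timestamps, keeping the file's restrictive permissions.
  cp -p "$config" "$backup" && printf '%s\n' "$backup"
}
----

Run it as `backup_kubeconfig && ansible-playbook site.yml`. `kubectl` merges all files listed in a colon-separated `KUBECONFIG` variable, so the old contexts stay reachable afterwards, e.g. via `KUBECONFIG="$HOME/.kube/config:<backup-path>" kubectl config get-contexts`.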

[NOTE]
The Kubernetes setup requires an `inventory.ini` file, which OpenTofu creates automatically.
So make sure to apply the infrastructure at least once before running Ansible.

=== Longhorn

The setup installs https://longhorn.io/[Longhorn], which provides a distributed block-storage system for the Kubernetes cluster.

Longhorn exposes a default storage class named `longhorn`.
This storage class is backed by replicated volumes distributed across multiple nodes,
reducing dependency on node-local ephemeral storage and allowing workloads to be rescheduled more reliably.
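
The following should confirm that the class is available and request a volume from it. The claim name `example-data`, the `default` namespace, and the 1 GiB size are placeholders chosen for illustration.

[source,bash]
----
# Confirm that the storage class created by Longhorn exists.
kubectl get storageclass longhorn

# Request a replicated Longhorn volume (hypothetical name and size).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
EOF

# Check the status of the claim.
kubectl get pvc -n default example-data
----

Depending on the volume binding mode of the storage class, the claim shows `Bound` right away or only once a pod uses it. Setting `storageClassName` explicitly avoids depending on which storage class the cluster currently marks as its default.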

Longhorn also provides a web-based dashboard for inspecting volumes, replicas, and node health.

To access the dashboard, forward the service port:

[source,bash]
----
kubectl port-forward -n longhorn-system --address 0.0.0.0 service/longhorn-frontend 8000:80
----

Then open http://localhost:8000/ in your browser.

=== Installed Foundational Services

https://cert-manager.io/docs/installation/helm[cert-manager]::
This enables automatic issuance of TLS certificates.
The certificates are issued via https://letsencrypt.org[Let's Encrypt], supporting both its staging and production environments.

https://about.gitea.com[gitea]::
My personal favorite git-server.

https://concourse-ci.org[concourse-ci]::
A powerful CI service that I use to automate all kinds of workloads.

https://github.com/pinterest/snappass[snappass]::
A tool for sharing passwords via one-time links: each secret is stored encrypted and deleted once it has been viewed or has expired.

=== Configured tags

The playbook defines a couple of tags that restrict the execution scope.
Pass one or more of them via `--tags` to run only the matching parts of the playbook.

.General tags
[horizontal]
`init`:: Full initial setup
`add-server`:: Add a new k3s server node
`add-agent`:: Add a new k3s agent node
`update`:: Upgrade Kubernetes or system packages
`longhorn-compatible`:: Ensure Longhorn compatibility
`longhorn`:: Deploy Longhorn
`config`:: Update local kubeconfig
`k8s`:: Deploy foundational services

.Service-specific tags
[horizontal]
`cert-manager`:: Apply changes to cert-manager, including the Let's Encrypt support
`gitea`:: Apply changes to gitea
`concourse`:: Apply changes to concourse
`snappass`:: Apply changes to snappass
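
Several tags can be passed as a comma-separated list. The tags chosen below are only examples:

[source,bash]
----
# List all tags defined in the playbook.
ansible-playbook site.yml --list-tags

# Show which tasks the longhorn tag selects, without executing them.
ansible-playbook site.yml --tags longhorn --list-tasks

# Run only the tasks tagged longhorn or gitea.
ansible-playbook site.yml --tags longhorn,gitea
----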

== Scaling the Cluster

Increase::
--
. Adjust the number of servers/agents in `config.auto.tfvars`
. Run `tofu apply` to create the additional machines
. Then rerun the Ansible playbook
--

Decrease::
--
DO NOT reduce the agent count directly. First remove the node from the cluster:

1. Open `k9s`
2. Navigate to `:nodes`
3. Select the agent with the highest numeric index
4. Drain it with kbd:[r]
5. After draining, delete it with kbd:[Ctrl + d]
6. Now decrease the agent count in `config.auto.tfvars` and run `tofu apply`
--
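
The drain-and-delete steps can also be done with plain `kubectl` instead of `k9s`. This is a sketch, and the node name is a hypothetical placeholder. Depending on what runs on the node, for example Longhorn replicas, draining may need extra flags or take a while.

[source,bash]
----
# List the nodes and pick the agent with the highest numeric index.
kubectl get nodes

# Hypothetical name: replace it with the agent picked above.
node=k3s-agent-3

# Cordon the node and evict its pods; DaemonSet pods are skipped
# and emptyDir data on the node is discarded.
kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data

# Remove the drained node from the cluster.
kubectl delete node "$node"
----

Afterwards, decrease the agent count in `config.auto.tfvars` and run `tofu apply` as in the last step of the list.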

== Responsibilities

OpenTofu::
* Provision machines for Kubernetes servers (public subnet)
* Provision machines for Kubernetes agents (private subnet)
* Create networking (public/private subnets + routing)
* Manage firewall rules:
** ICMP
** Kubernetes API (`6443`)
** SSH (nonstandard port, usually `1022`)
** HTTP/HTTPS (`80`, `443`)
** Git SSH (`22`)
* Manage DNS records

Ansible::
* Configure SSH access
* Configure routing on all servers
* Install and maintain k3s
* Keep system software updated
* Install Longhorn
* Deploy foundational services