Another massive rewrite of the README

This change alters the README significantly. The overall goal was to make the document easier to read, since the previous version had outgrown its initial layout. That alone should raise a flag, since it could indicate that the document is too long. However, I want to be able to understand every detail even after some time away. The new approach addresses this and improves the overall structure so that the document reads from top to bottom, as I like it.
2025-11-28 00:05:06 +01:00
parent 70462e1795
commit cc0e00f1af
1 changed file with 126 additions and 101 deletions
--- a/README.adoc
+++ b/README.adoc
@@ -1,66 +1,92 @@
-= Base Infra
+= Base Infrastructure
 :icons: font
 :source-highlighter: rouge
-This project is meant to set up my base infrastructure for the web.
+This project will set up a https://k3s.io/[k3s] Kubernetes cluster on https://www.hetzner.com/cloud[Hetzner Cloud] using OpenTofu and Ansible.
 In particular my Kubernetes cluster as well as a base set of software (CI/CD, git-server, etc.) and access-keys.
-To achieve the goal of having a working base infrastructure for the web the setup is split into 2 dedicated steps:
+It is meant to set up my base infrastructure for the web.
 In particular to bootstrap required machines and networks,
 as well as installing a Kubernetes cluster and deploying a set of foundational services.
-. Create static assets like machines for Kubernetes and access-keys via https://opentofu.org/[OpenTofu] (or Terraform).
+The system is intentionally split into two stages:
 . Install/Upgrade Kubernetes-cluster and other software via Ansible.
-The infrastructure is deployed on https://www.hetzner.com/cloud[Hetzner Cloud].
+1. Infrastructure provisioning using https://opentofu.org/[OpenTofu] (or Terraform)
 2. Cluster and software installation using Ansible
 The entire setup is *idempotent*, meaning it can be applied repeatedly and safely.
 == TL;DR
 [source,bash]
 ----
-vim -o .envrc config.auto.tfvars # Get the contents from password-manager
+vim -o .envrc config.auto.tfvars # Add secrets from password manager
 direnv allow
 tofu init
 tofu apply
-until ansible -m ping all; do sleep 10; done # Wait for the machines to start
+until ansible -m ping all; do sleep 10; done # Wait until VMs are reachable
 ansible-galaxy install -r requirements.yml
 ansible-playbook site.yml
 ----
-== Required software and packages
+== Supported Platforms
-The setup will run on Debian, Ubuntu and macOS.
+The setup works on:
-Make sure the following software is installed:
+* Debian
 * Ubuntu
 * macOS
-* `tofu` or `terraform` (from package manager)
+== Required Software
-* `ansible` (from package manager)
+
-* `direnv` (from package manager)
+Please install the following:
 * `tofu` or `terraform`
 * `ansible`
 * `direnv`
 * https://helm.sh/docs/intro/install/[`helm`] 
 * https://github.com/databus23/helm-diff?tab=readme-ov-file#install[`helm-diff`]
-* `python3-kubernetes` (only on Debian/Ubuntu, from package manager)
+* `python3-kubernetes` (Debian/Ubuntu only)
-=== Optional packages
+=== Optional Tools
-These packages make maintenance easier.
+These tools improve maintenance and cluster operations:
-. `k9s` (from package manager)
+* `k9s`
-== Setup
+== Secrets & Local Configuration
-Make sure `.envrc` and `config.auto.tfvars` are present.
+The setup requires two files:
 Then run `direnv allow` in the directory to apply the `.envrc`. +
-Since these files contain sensitive information they are stored outside of this project in my password-manager.
+* `.envrc`
 * `config.auto.tfvars`
 These contain credentials, environment variables, and configuration values.
 Both files are stored securely in my password manager.
 [TIP]
-I've provided templates for both files:
+Templates are available in the repository:
 * https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/.envrc.tpl[`.envrc`]
 * https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/config.auto.tfvars.tpl[`config.auto.tfvars`] are provided in the code.
-=== Infrastructure
+After placing these files, enable them with:
-I use OpenTofu to provide the required infrastructure to run a Kubernetes-cluster.
+[source,bash]
 ----
 direnv allow
 ----
 == Infrastructure Provisioning (OpenTofu)
 OpenTofu provisions:
 * Kubernetes server and agent machines
 * Networking (public + private subnets)
 * Firewall rules
 * Routing between subnets
 * DNS records
 [NOTE]
-The infrastructure is setup completely idempotent and can be safely re-applied.
+The infrastructure is fully idempotent. You can re-run `tofu apply` at any time.
 [source,bash]
 ----
@@ -69,25 +95,43 @@ tofu apply # <2>
 until ansible -m ping all; do sleep 10; done # <3>
 ----
-<1> Initialize the Tofu modules if necessary
+<1> Initialize modules
-<2> Setup infrastructure and create/update inventory.ini
+<2> Apply infrastructure and generate `inventory.ini`
-<3> Wait until all machines are fully started (This might take up to 5 minutes)
+<3> Wait until all VMs are reachable (may take up to 5 minutes)
-=== Software
+== Cluster & Software Installation (Ansible)
-I use Ansible to install and maintain the software of my cluster.
+Ansible installs and maintains all cluster software, including:
-This includes the Kubernetes cluster and the foundational services in it.
+
 * Routing and SSH setup on servers
 * A full k3s Kubernetes cluster
 * Foundational cluster services
 [NOTE]
-All Ansible playbooks are idempotent and can be safely re-run.
+All playbooks are idempotent and can be safely re-run.
-For the Kubernetes cluster I use https://k3s.io/[k3s], simply because it's very easy to maintain and still provides all common Kubernetes functionality.
+[source,bash]
 ----
 ansible-galaxy install -r requirements.yml # <1>
 ansible-playbook site.yml                  # <2>
 ----
-The foundational services are:
+<1> Install required Ansible collections  
 <2> Install k3s and write kubeconfig to `~/.kube/config`
 [CAUTION]
 Running the playbook will overwrite `~/.kube/config`.
 Backup your config if you manage multiple clusters.
 [NOTE]
 The Kubernetes setup requires an `inventory.ini` file, which Tofu creates automatically.
 So, make sure to apply the infrastructure at least once before running Ansible.
 === Installed Foundational Services
 https://cert-manager.io/docs/installation/helm[cert-manager]::
 This enables automatic issuance of TLS certificates.
-The certificates are issued via https://letsencrypt.org[Let's Encrypt] and can be issued for the staging and production stage of Let's Encrypt.
+The certificates are issued via https://letsencrypt.org[Let's Encrypt] with support for both the staging and production environments of it.
 https://about.gitea.com[gitea]::
 My personal favorite git-server.
@@ -98,85 +142,66 @@ A powerful CI-service which I like to use to automate all kind of workloads.
 https://github.com/pinterest/snappass[snappass]::
 A secure and reliable tool for sharing passwords.
 +
-TODO: Not setup yet!
+WARNING: Not set up yet!
-[NOTE]
+=== Configured tags
 The k3s-setup requires an `inventory.ini` which is automatically created by Tofu.
 So, make sure to apply the infra at least once, before running these playbooks.
-[source,bash]
+The playbook has a couple of tags configured to restrict the execution scope.
 ----
 ansible-galaxy install -r requirements.yml # <1>
 ansible-playbook site.yml # <2>
 ----
-<1> Install required Ansible collections to create a k3s-cluster (can be omitted in subsequent runs)
+You can restrict playbook scope to specific areas using `--tags`.
 <2> Install k3s and download kube-config to `~/.kube/config`
-[CAUTION]
+.General tags
-The second step will override `~/.kube/config`.
+[horizontal]
-Backup your existing config if you manage multiple clusters!
+`init`:: Full initial setup
 `add-server`:: Add a new k3s server node
 `add-agent`:: Add a new k3s agent node
 `update`:: Upgrade Kubernetes or system packages
 `config`:: Update local kubeconfig
 `k8s`:: Deploy foundational services
-[TIP]
+.Service-specific tags
-The affected scope of the Ansible-playbook can be limited with tags (`--tags tag1,tag2`):
+[horizontal]
 `cert-manager`:: Apply changes to the cert-manager including support for `Let's Encrypt`
 `gitea`:: Apply changes to gitea
 `concourse`:: Apply changes to concourse
-==== Configured tags
+== Scaling the Cluster
 The playbook has a couple of tags configured which restrict the execution to certain tasks.
 init:: Everything needed for the initial setup (same as omitting tags altogether)
 add-server:: Everything needed to add a new https://docs.k3s.io/cli/server[server] to the cluster
 add-agent:: Everything needed to add a new https://docs.k3s.io/cli/agent[agent] to the cluster
 update:: Everything needed to update the cluster
 config:: Everything needed to update the local kube-config
 k8s:: Everything needed to provide the foundational services
 ===== app-specific tags
 To allow to update specific services quickly you can use the following tags.
 However, these require a functional Kubernetes cluster first.
 cert-manager:: Apply changes to the cert-manager including support for `Let's Encrypt`
 gitea:: Apply changes to gitea
 concourse:: Apply changes to concourse
 == Enlarge / Reduce size of cluster
 Increase::
 --
-. Simply adjust the number of agents/servers in your `infra/config.auto.tfvars`.
+. Adjust the number of servers/agents in `config.auto.tfvars`
-. Then run the Ansible-playbook of k3s again
+. Then rerun the Ansible playbook
 --
 Decrease::
 --
-If you want to shrink the cluster **DO NOT** reduce the agent-amount directly!
+DO NOT reduce the agent count directly.
 Instead proceed as the following:
-. Open k9s and go to `:nodes`
+1. Open `k9s`
-. Select the agent with the highest numerical index and press `r` to drain it
+2. Navigate to `:nodes`
-. Once that succeeded delete it with `Ctrl-d`
+3. Select the agent with the highest numeric index
-. Finally reduce the amount of agents in Tofu and apply the change
+4. Drain it with kbd:[r]
 5. After draining, delete it with kbd:[Ctrl + d]
 6. Now decrease the agent count in `config.auto.tfvars` and run `tofu apply`
 --
 == Responsibilities
 OpenTofu::
-* Provide a network for the Kubernetes-cluster
+* Provision machines for Kubernetes servers (public subnet)
-** A public subnet exposed to the internet for the Kubernetes-servers
+* Provision machines for Kubernetes agents (private subnet)
-** A private subnet for the Kubernetes-agents
+* Create networking (public/private subnets + routing)
-** Routing between subnets
+* Manage firewall rules:
-* Managing firewall rules to block everything from the servers except of:
+** ICMP
-** ping (protocol: `icmp`)
+** Kubernetes API (`6443`)
-** Kubernetes API (Usually port `6443`)
+** SSH (nonstandard port, usually `1022`)
-** ssh (I prefer to use a non-standard port (usually port `1022`)
+** HTTP/HTTPS (`80`, `443`)
-** public services, e.g. http and https (port `80` and `443`) but also git-ssh (port `22`)
+** Git SSH (`22`)
-* Provisioning the machines for Kubernetes-servers in the public subnet
+* Manage DNS records
 * Provisioning the machines for Kubernetes-agents in the private subnet
 * Managing DNS-records
 Ansible::
-* Setup SSH-connections
+* Configure SSH access
-* Setting up routing on all servers
+* Configure routing on all servers
-* Installing k3s
+* Install and maintain k3s
-* Keep the software up-to-date
+* Keep system software updated
-* Add foundational services to the cluster
+* Deploy foundational services
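
Note on the wait loop used in the README (`until ansible -m ping all; do sleep 10; done`): as written it retries forever if a machine never becomes reachable. A minimal sketch of a bounded variant follows. `retry_until` is a hypothetical helper name and not part of this repository; the budget of 30 attempts with a 10 second delay is an assumption derived from the README's "may take up to 5 minutes" remark.

```shell
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# (hypothetical helper, not part of the repository)
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  # Run the command; on failure either give up or sleep and try again.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

With this helper the wait step could be written as `retry_until 30 10 ansible -m ping all`, which fails with a non-zero exit code after roughly 5 minutes instead of looping indefinitely.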