Another massive rewrite of the README

This change alters the README significantly. The overall goal was to
turn it into an easier-to-read document, since the previous version had
outgrown its initial layout. This alone should raise a flag, since it
may indicate that the document is too long. However, I want to be able
to understand every detail, even after some time away.

This new approach addresses that need and restructures the document so
it reads from top to bottom, which is how I like it.
2025-11-28 00:05:06 +01:00
parent 70462e1795
commit cc0e00f1af


@@ -1,66 +1,92 @@
-= Base Infra
+= Base Infrastructure
 :icons: font
+:source-highlighter: rouge
-This project is meant to set up my base infrastructure for the web.
-In particular my Kubernetes cluster as well as a base set of software (CI/CD, git-server, etc.) and access-keys.
-To achieve the goal of having a working base infrastructure for the web the setup is split into 2 dedicated steps:
-. Create static assets like machines for Kubernetes and access-keys via https://opentofu.org/[OpenTofu] (or Terraform).
-. Install/Upgrade Kubernetes-cluster and other software via Ansible.
-The infrastructure is deployed on https://www.hetzner.com/cloud[Hetzner Cloud].
+This project will set up a https://k3s.io/[k3s] Kubernetes cluster on https://www.hetzner.com/cloud[Hetzner Cloud] using OpenTofu and Ansible.
+It is meant to set up my base infrastructure for the web.
+In particular to bootstrap required machines and networks,
+as well as installing a Kubernetes cluster and deploying a set of foundational services.
+The system is intentionally split into two stages:
+1. Infrastructure provisioning using https://opentofu.org/[OpenTofu] (or Terraform)
+2. Cluster and software installation using Ansible
+The entire setup is *idempotent*, meaning it can be applied repeatedly and safely.
 == TL;DR
 [source,bash]
 ----
-vim -o .envrc config.auto.tfvars # Get the contents from password-manager
+vim -o .envrc config.auto.tfvars # Add secrets from password manager
 direnv allow
 tofu init
 tofu apply
-until ansible -m ping all; do sleep 10; done # Wait for the machines to start
+until ansible -m ping all; do sleep 10; done # Wait until VMs are reachable
 ansible-galaxy install -r requirements.yml
 ansible-playbook site.yml
 ----
-== Required software and packages
-The setup will run on Debian, Ubuntu and macOS.
-Make sure the following software is installed:
-* `tofu` or `terraform` (from package manager)
-* `ansible` (from package manager)
-* `direnv` (from package manager)
+== Supported Platforms
+The setup works on:
+* Debian
+* Ubuntu
+* macOS
+== Required Software
+Please install the following:
+* `tofu` or `terraform`
+* `ansible`
+* `direnv`
 * https://helm.sh/docs/intro/install/[`helm`]
 * https://github.com/databus23/helm-diff?tab=readme-ov-file#install[`helm-diff`]
-* `python3-kubernetes` (only on Debian/Ubuntu, from package manager)
+* `python3-kubernetes` (Debian/Ubuntu only)
-=== Optional packages
-These packages make maintenance easier.
-. `k9s` (from package manager)
+=== Optional Tools
+These tools improve maintenance and cluster operations:
+* `k9s`
-== Setup
-Make sure `.envrc` and `config.auto.tfvars` are present.
-Then run `direnv allow` in the directory to apply the `.envrc`. +
-Since these files contain sensitive information they are stored outside of this project in my password-manager.
+== Secrets & Local Configuration
+The setup requires two files:
+* `.envrc`
+* `config.auto.tfvars`
+These contain credentials, environment variables, and configuration values.
+Both files are stored securely in my password manager.
 [TIP]
-I've provided templates for both files:
+Templates are available in the repository:
 * https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/.envrc.tpl[`.envrc`]
 * https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/config.auto.tfvars.tpl[`config.auto.tfvars`] are provided in the code.
-=== Infrastructure
-I use OpenTofu to provide the required infrastructure to run a Kubernetes-cluster.
+After placing these files, enable them with:
+[source,bash]
+----
+direnv allow
+----
+== Infrastructure Provisioning (OpenTofu)
+OpenTofu provisions:
+* Kubernetes server and agent machines
+* Networking (public + private subnets)
+* Firewall rules
+* Routing between subnets
+* DNS records
 [NOTE]
-The infrastructure is setup completely idempotent and can be safely re-applied.
+The infrastructure is fully idempotent. You can re-run `tofu apply` at any time.
 [source,bash]
 ----
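The new "Secrets & Local Configuration" section makes `.envrc` and `config.auto.tfvars` hard prerequisites for everything that follows. A small pre-flight check can report a missing file before any `tofu` command runs. This is only a sketch: `require_files` is a hypothetical helper that this repository does not ship, and it assumes the commands run from the directory that holds both files.

```shell
# Hypothetical pre-flight helper (not part of this repository):
# print every listed file that does not exist and fail if any is missing.
require_files() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "Missing required file: $f" >&2
      missing=1
    fi
  done
  # 0 if all files are present, 1 otherwise.
  return "$missing"
}
```

For example, `require_files .envrc config.auto.tfvars && direnv allow`. The same helper could guard the Ansible step with `require_files inventory.ini`, assuming Tofu writes the inventory to the current directory.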
@@ -69,25 +95,43 @@ tofu apply # <2>
 until ansible -m ping all; do sleep 10; done # <3>
 ----
-<1> Initialize the Tofu modules if necessary
-<2> Setup infrastructure and create/update inventory.ini
-<3> Wait until all machines are fully started (This might take up to 5 minutes)
+<1> Initialize modules
+<2> Apply infrastructure and generate `inventory.ini`
+<3> Wait until all VMs are reachable (may take up to 5 minutes)
-=== Software
-I use Ansible to install and maintain the software of my cluster.
-This includes the Kubernetes cluster and the foundational services in it.
+== Cluster & Software Installation (Ansible)
+Ansible installs and maintains all cluster software, including:
+* Routing and SSH setup on servers
+* A full k3s Kubernetes cluster
+* Foundational cluster services
 [NOTE]
-All Ansible playbooks are idempotent and can be safely re-run.
+All playbooks are idempotent and can be safely re-run.
-For the Kubernetes cluster I use https://k3s.io/[k3s], simply because it's very easy to maintain and still provides all common Kubernetes functionality.
-The foundational services are:
+[source,bash]
+----
+ansible-galaxy install -r requirements.yml # <1>
+ansible-playbook site.yml # <2>
+----
+<1> Install required Ansible collections
+<2> Install k3s and write kubeconfig to `~/.kube/config`
+[CAUTION]
+Running the playbook will overwrite `~/.kube/config`.
+Backup your config if you manage multiple clusters.
+[NOTE]
+The Kubernetes setup requires an `inventory.ini` file, which Tofu creates automatically.
+So, make sure to apply the infrastructure at least once before running Ansible.
+=== Installed Foundational Services
 https://cert-manager.io/docs/installation/helm[cert-manager]::
 This enables automatic issuance of TLS certificates.
-The certificates are issued via https://letsencrypt.org[Let's Encrypt] and can be issued for the staging and production stage of Let's Encrypt.
+The certificates are issued via https://letsencrypt.org[Let's Encrypt] with support for both the staging and production environments of it.
 https://about.gitea.com[gitea]::
 My personal favorite git-server.
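The reworded callout `<3>` says the machines may take up to 5 minutes to become reachable. However, `until ansible -m ping all; do sleep 10; done` keeps polling forever if a machine never comes up. A hypothetical `retry` helper (not part of this repository) could bound that wait. The sketch below assumes a POSIX-compatible shell such as `sh`, `bash` or `zsh`.

```shell
# Hypothetical helper (not part of this repository): run a command until it
# succeeds, trying at most ATTEMPTS times and sleeping DELAY seconds in between.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      echo "Attempt $i/$attempts failed, retrying in ${delay}s ..." >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "Giving up after $attempts attempts: $*" >&2
  return 1
}
```

With `retry 30 10 ansible -m ping all`, the wait gives up after 30 attempts spaced 10 seconds apart, which roughly matches the five-minute estimate. It then returns a non-zero status that a calling script can act on, instead of hanging.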
@@ -98,85 +142,66 @@ A powerful CI-service which I like to use to automate all kind of workloads.
 https://github.com/pinterest/snappass[snappass]::
 A secure and reliable tool for sharing passwords.
 +
-TODO: Not setup yet!
+WARNING: Not set up yet!
-[NOTE]
-The k3s-setup requires an `inventory.ini` which is automatically created by Tofu.
-So, make sure to apply the infra at least once, before running these playbooks.
-[source,bash]
-----
-ansible-galaxy install -r requirements.yml # <1>
-ansible-playbook site.yml # <2>
-----
-<1> Install required Ansible collections to create a k3s-cluster (can be omitted in subsequent runs)
-<2> Install k3s and download kube-config to `~/.kube/config`
-[CAUTION]
-The second step will override `~/.kube/config`.
-Backup your existing config if you manage multiple clusters!
-[TIP]
-The affected scope of the Ansible-playbook can be limited with tags (`--tags tag1,tag2`):
-==== Configured tags
-The playbook has a couple of tags configured which restrict the execution to certain tasks.
-init:: Everything needed for the initial setup (same as omitting tags altogether)
-add-server:: Everything needed to add a new https://docs.k3s.io/cli/server[server] to the cluster
-add-agent:: Everything needed to add a new https://docs.k3s.io/cli/agent[agent] to the cluster
-update:: Everything needed to update the cluster
-config:: Everything needed to update the local kube-config
-k8s:: Everything needed to provide the foundational services
-===== app-specific tags
-To allow to update specific services quickly you can use the following tags.
-However, these require a functional Kubernetes cluster first.
-cert-manager:: Apply changes to the cert-manager including support for `Let's Encrypt`
-gitea:: Apply changes to gitea
-concourse:: Apply changes to concourse
-== Enlarge / Reduce size of cluster
+=== Configured tags
+The playbook has a couple of tags configured to restrict the execution scope.
+You can restrict playbook scope to specific areas using `--tags`.
+.General tags
+[horizontal]
+`init`:: Full initial setup
+`add-server`:: Add a new k3s server node
+`add-agent`:: Add a new k3s agent node
+`update`:: Upgrade Kubernetes or system packages
+`config`:: Update local kubeconfig
+`k8s`:: Deploy foundational services
+.Service-specific tags
+[horizontal]
+`cert-manager`:: Apply changes to the cert-manager including support for `Let's Encrypt`
+`gitea`:: Apply changes to gitea
+`concourse`:: Apply changes to concourse
+== Scaling the Cluster
 Increase::
 --
-. Simply adjust the number of agents/servers in your `infra/config.auto.tfvars`.
-. Then run the Ansible-playbook of k3s again
+. Adjust the number of servers/agents in `config.auto.tfvars`
+. Then rerun the Ansible playbook
 --
 Decrease::
 --
-If you want to shrink the cluster **DO NOT** reduce the agent-amount directly!
-Instead proceed as the following:
-. Open k9s and go to `:nodes`
-. Select the agent with the highest numerical index and press `r` to drain it
-. Once that succeeded delete it with `Ctrl-d`
-. Finally reduce the amount of agents in Tofu and apply the change
+DO NOT reduce the agent count directly.
+1. Open `k9s`
+2. Navigate to `:nodes`
+3. Select the agent with the highest numeric index
+4. Drain it with kbd:[r]
+5. After draining, delete it with kbd:[Ctrl + d]
+6. Now decrease the agent count in `config.auto.tfvars` and run `tofu apply`
 --
 == Responsibilities
 OpenTofu::
-* Provide a network for the Kubernetes-cluster
-** A public subnet exposed to the internet for the Kubernetes-servers
-** A private subnet for the Kubernetes-agents
-** Routing between subnets
-* Managing firewall rules to block everything from the servers except of:
-** ping (protocol: `icmp`)
-** Kubernetes API (Usually port `6443`)
-** ssh (I prefer to use a non-standard port (usually port `1022`)
-** public services, e.g. http and https (port `80` and `443`) but also git-ssh (port `22`)
-* Provisioning the machines for Kubernetes-servers in the public subnet
-* Provisioning the machines for Kubernetes-agents in the private subnet
-* Managing DNS-records
+* Provision machines for Kubernetes servers (public subnet)
+* Provision machines for Kubernetes agents (private subnet)
+* Create networking (public/private subnets + routing)
+* Manage firewall rules:
+** ICMP
+** Kubernetes API (`6443`)
+** SSH (nonstandard port, usually `1022`)
+** HTTP/HTTPS (`80`, `443`)
+** Git SSH (`22`)
+* Manage DNS records
 Ansible::
-* Setup SSH-connections
-* Setting up routing on all servers
-* Installing k3s
-* Keep the software up-to-date
-* Add foundational services to the cluster
+* Configure SSH access
+* Configure routing on all servers
+* Install and maintain k3s
+* Keep system software updated
+* Deploy foundational services
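The rewritten "Scaling the Cluster" section removes an agent interactively in `k9s`. The same steps could also be scripted with `kubectl`. This sketch rests on a few assumptions:

* the helper names are hypothetical;
* agent node names end in a numeric index;
* k3s marks server nodes with the `node-role.kubernetes.io/control-plane` label, so agents are the nodes without it;
* `sort` supports `-V` (for example GNU coreutils).

```shell
# Hypothetical helper: read node names (one per line) from stdin and print the
# one with the highest numeric index. Version sort (-V) orders "agent-10" after
# "agent-9", which a plain lexical sort would not.
highest_index_node() {
  sort -V | tail -n 1
}

# Hypothetical helper: list agent node names, assuming only k3s server nodes
# carry the node-role.kubernetes.io/control-plane label.
list_agent_nodes() {
  kubectl get nodes -l '!node-role.kubernetes.io/control-plane' \
    --no-headers -o custom-columns=NAME:.metadata.name
}

# Hypothetical helper: drain the node (evict its pods, skipping DaemonSet pods),
# then remove it from the cluster. This mirrors pressing r and then Ctrl+d in k9s.
drain_and_delete_node() {
  local node="$1"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  kubectl delete node "$node"
}
```

A run might look like `node=$(list_agent_nodes | highest_index_node) && drain_and_delete_node "$node"`. After that, lower the agent count in `config.auto.tfvars` and run `tofu apply`, as the README describes.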