Felix Nehrke cc0e00f1af Another massive rewrite of the README

This change actually alters the readme significantly. The overall goal
was to adjust it to an easier to read document, since the previous
version had generally outgrown its initial layout. This alone should
raise a flag since it could indicate a too long document. But, I want to
make sure to understand each detail even after some time off.

This new approach is targeting this desire, and improves the overall
structure to read the document from top to bottom, as I like it.

2025-11-28 00:28:25 +01:00

modules/hetzner

Move declaration of primary IPs into kubernetes-module

2025-11-28 00:28:25 +01:00

roles

Enhance the README a lot

2025-11-28 00:28:25 +01:00

.envrc.tpl

Enhance the README a lot

2025-11-28 00:28:25 +01:00

.gitignore

Move terraform-state to b2

2025-11-28 00:28:25 +01:00

.terraform.lock.hcl

Switch from terraform to opentofu, so update some providers therefore

2025-11-28 00:28:25 +01:00

ansible.cfg

Merge infra and k3 into one directory again

2025-11-28 00:24:18 +01:00

config.auto.tfvars.tpl

Enhance the README a lot

2025-11-28 00:28:25 +01:00

config.yml

Add concourse as the foundational CI tool to k8s-cluster

2025-11-28 00:28:25 +01:00

inventory.ini.tftpl

Use port 1022 for all cluster nodes as SSH-port and fix some config-errors

2025-11-28 00:28:22 +01:00

main.tf

Add current IP automatically to whitelists for SSH and Kubernetes

2025-11-28 00:28:25 +01:00

README.adoc

Another massive rewrite of the README

2025-11-28 00:28:25 +01:00

requirements.yml

Merge infra and k3 into one directory again

2025-11-28 00:24:18 +01:00

site.yml

Enhance the README a lot

2025-11-28 00:28:25 +01:00

variables.tf

Add current IP automatically to whitelists for SSH and Kubernetes

2025-11-28 00:28:25 +01:00

vault.yml

Add concourse as the foundational CI tool to k8s-cluster

2025-11-28 00:28:25 +01:00

versions.tf

Add current IP automatically to whitelists for SSH and Kubernetes

2025-11-28 00:28:25 +01:00

README.adoc

Base Infrastructure

This project will set up a k3s Kubernetes cluster on Hetzner Cloud using OpenTofu and Ansible.

It sets up my base infrastructure for the web: it bootstraps the required machines and networks, installs a Kubernetes cluster, and deploys a set of foundational services.

The system is intentionally split into two stages:

Infrastructure provisioning using OpenTofu (or Terraform)
Cluster and software installation using Ansible

The entire setup is idempotent, meaning it can be applied repeatedly and safely.

TL;DR

vim -o .envrc config.auto.tfvars # Add secrets from password manager
direnv allow
tofu init
tofu apply
until ansible -m ping all; do sleep 10; done # Wait until VMs are reachable
ansible-galaxy install -r requirements.yml
ansible-playbook site.yml

Supported Platforms

The setup works on:

Debian
Ubuntu
macOS

Required Software

Please install the following:

tofu or terraform
ansible
direnv
helm
helm-diff
python3-kubernetes (Debian/Ubuntu only)

Optional Tools

These tools improve maintenance and cluster operations:

k9s

Secrets & Local Configuration

The setup requires two files:

.envrc
config.auto.tfvars

These contain credentials, environment variables, and configuration values. Both files are stored securely in my password manager.

Templates for both files are provided in the repository:

.envrc.tpl
config.auto.tfvars.tpl

After placing these files, enable them with:

direnv allow
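As a starting point when the password manager copies are not at hand, the two files can be created from the templates. This is only a sketch: it assumes the templates are the .envrc.tpl and config.auto.tfvars.tpl files listed in the repository, and the helper name copy_templates is hypothetical. Existing files are never overwritten.

```shell
# Hypothetical helper: create .envrc and config.auto.tfvars from their
# templates, skipping any target that already exists.
copy_templates() {
  for tpl in .envrc.tpl config.auto.tfvars.tpl; do
    target=${tpl%.tpl}                 # strip the .tpl suffix
    if [ -e "$target" ]; then
      echo "skip $target (already exists)"
    else
      cp "$tpl" "$target" && echo "created $target"
    fi
  done
}

# Usage, from the repository root:
# copy_templates && vim -o .envrc config.auto.tfvars && direnv allow
```

The secrets still have to be filled in by hand before running direnv allow.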

Infrastructure Provisioning (OpenTofu)

OpenTofu provisions:

Kubernetes server and agent machines
Networking (public + private subnets)
Firewall rules
Routing between subnets
DNS records

The infrastructure is fully idempotent. You can re-run tofu apply at any time.

tofu init (1)
tofu apply (2)
until ansible -m ping all; do sleep 10; done (3)

1	Initialize modules
2	Apply infrastructure and generate `inventory.ini`
3	Wait until all VMs are reachable (may take up to 5 minutes)
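The wait loop above runs forever if a VM never comes up. A bounded variant might look like the following sketch; retry_until is a hypothetical helper, not part of the repository, and the attempt count is an assumption derived from the 5 minutes mentioned above.

```shell
# Retry a command until it succeeds or the attempt limit is reached.
# Usage: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage: 30 attempts with 10 s pauses cover roughly 5 minutes.
# retry_until 30 10 ansible -m ping all
```

A non-zero exit status makes it easy to stop a script before the Ansible stage starts.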

Cluster & Software Installation (Ansible)

Ansible installs and maintains all cluster software, including:

Routing and SSH setup on servers
A full k3s Kubernetes cluster
Foundational cluster services

All playbooks are idempotent and can be safely re-run.

ansible-galaxy install -r requirements.yml (1)
ansible-playbook site.yml                  (2)

1	Install required Ansible collections
2	Install k3s and write kubeconfig to `~/.kube/config`

Running the playbook overwrites ~/.kube/config. Back up your config first if you manage multiple clusters.
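One way to keep the previous kubeconfig is a timestamped copy taken before the playbook runs. This is a minimal sketch; backup_kubeconfig is a hypothetical helper and the backup naming scheme is an assumption.

```shell
# Copy an existing kubeconfig to a timestamped backup next to it and
# print the backup path.
# Usage: backup_kubeconfig [PATH]   (defaults to ~/.kube/config)
backup_kubeconfig() {
  cfg=${1:-"$HOME/.kube/config"}
  if [ -f "$cfg" ]; then
    backup="$cfg.$(date +%Y%m%d-%H%M%S).bak"
    cp -p "$cfg" "$backup" && echo "$backup"
  else
    echo "no kubeconfig at $cfg, nothing to back up" >&2
  fi
}

# Usage:
# backup_kubeconfig && ansible-playbook site.yml
```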

The Kubernetes setup requires an inventory.ini file, which tofu apply generates automatically, so apply the infrastructure at least once before running Ansible.

Installed Foundational Services

cert-manager

This enables automatic issuance of TLS certificates. Certificates are issued via Let’s Encrypt, with support for both its staging and production environments.

gitea

My personal favorite git-server.

concourse-ci

A powerful CI service that I like to use to automate all kinds of workloads.

snappass

A secure and reliable tool for sharing passwords.

Note: snappass is not set up yet.

Configured tags

The playbook defines several tags. Use --tags to restrict a run to specific areas.

General tags

`init`	Full initial setup
`add-server`	Add a new k3s server node
`add-agent`	Add a new k3s agent node
`update`	Upgrade Kubernetes or system packages
`config`	Update local kubeconfig
`k8s`	Deploy foundational services

Service-specific tags

`cert-manager`	Apply changes to cert-manager, including its `Let’s Encrypt` support
`gitea`	Apply changes to gitea
`concourse`	Apply changes to concourse
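The tags listed above are passed to ansible-playbook as a comma-separated list. The sketch below wraps that in a small function; play_tags and the DRY_RUN switch are hypothetical conveniences, not part of the repository.

```shell
# Run site.yml restricted to the given tags.
# Set DRY_RUN=1 to print the command instead of executing it.
play_tags() {
  tags=$(IFS=,; printf '%s' "$*")      # join all arguments with commas
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'ansible-playbook site.yml --tags %s\n' "$tags"
  else
    ansible-playbook site.yml --tags "$tags"
  fi
}

# Usage:
# play_tags k8s                  # deploy all foundational services
# play_tags cert-manager gitea   # only these two services
```

Calling ansible-playbook site.yml --tags cert-manager,gitea directly is equivalent.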

Scaling the Cluster

Increase

Adjust the number of servers/agents in config.auto.tfvars
Run tofu apply to provision the new machines and regenerate inventory.ini
Then rerun the Ansible playbook

Decrease

DO NOT reduce the agent count directly.

Open k9s
Navigate to :nodes
Select the agent with the highest numeric index
Drain it by pressing `r`
After draining, delete it by pressing `Ctrl+d`
Now decrease the agent count in config.auto.tfvars and run tofu apply
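The k9s steps above can also be done with kubectl. The following is a sketch of that alternative, not the documented procedure: remove_agent is a hypothetical helper, the node name in the usage line is a placeholder, and the drain flags are an assumption about what the workloads on the node require.

```shell
# Drain a node and, only if draining succeeds, delete it from the cluster.
# KUBECTL can point to a different binary (or to echo for a dry run).
remove_agent() {
  node=$1
  kubectl_cmd=${KUBECTL:-kubectl}
  "$kubectl_cmd" drain "$node" --ignore-daemonsets --delete-emptydir-data &&
    "$kubectl_cmd" delete node "$node"
}

# Usage (node name is a placeholder; pick the highest numeric index):
# kubectl get nodes
# remove_agent <agent-node-name>
```

Afterwards, decrease the agent count in config.auto.tfvars and run tofu apply as described above.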

Responsibilities

OpenTofu

Provision machines for Kubernetes servers (public subnet)
Provision machines for Kubernetes agents (private subnet)
Create networking (public/private subnets + routing)
Manage firewall rules:
- ICMP
- Kubernetes API (6443)
- SSH (nonstandard port, usually 1022)
- HTTP/HTTPS (80, 443)
- Git SSH (22)
Manage DNS records

Ansible

Configure SSH access
Configure routing on all servers
Install and maintain k3s
Keep system software updated
Deploy foundational services
