This change substantially restructures the README. The goal was to make the document easier to read, since the previous version had outgrown its initial layout. That alone should raise a flag, since it may indicate that the document is too long. However, I want to be able to understand every detail even after some time away from the project. The new structure serves that goal and lets the document be read from top to bottom, as I like it.
Base Infrastructure
This project will set up a k3s Kubernetes cluster on Hetzner Cloud using OpenTofu and Ansible.
It is meant to set up my base infrastructure for the web. In particular to bootstrap required machines and networks, as well as installing a Kubernetes cluster and deploying a set of foundational services.
The system is intentionally split into two stages:
- Infrastructure provisioning using OpenTofu (or Terraform)
- Cluster and software installation using Ansible
The entire setup is idempotent, meaning it can be applied repeatedly and safely.
TL;DR
vim -o .envrc config.auto.tfvars # Add secrets from password manager
direnv allow
tofu init
tofu apply
until ansible -m ping all; do sleep 10; done # Wait until VMs are reachable
ansible-galaxy install -r requirements.yml
ansible-playbook site.yml
Supported Platforms
The setup works on:
- Debian
- Ubuntu
- macOS
Required Software
Secrets & Local Configuration
The setup requires two files:
- .envrc
- config.auto.tfvars
These contain credentials, environment variables, and configuration values. Both files are stored securely in my password manager.
Templates for both files are available in the repository:
- .envrc
- config.auto.tfvars
After placing these files, enable them with:
direnv allow
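Before provisioning, it can help to confirm that both files are actually in place. The following is a minimal sketch, not part of the repository: `check_config_files` is a hypothetical helper name, and it only checks that the files exist, not that their contents are valid.

```shell
# Hypothetical helper (not part of the repository): verify that the two
# local configuration files exist in the given directory (default: cwd).
check_config_files() {
  dir="${1:-.}"
  missing=0
  for f in .envrc config.auto.tfvars; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Returns 0 when both files exist, 1 otherwise.
  return "$missing"
}
```

A possible use is `check_config_files && direnv allow`, which only enables the environment when both files are present.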
Infrastructure Provisioning (OpenTofu)
OpenTofu provisions:
- Kubernetes server and agent machines
- Networking (public + private subnets)
- Firewall rules
- Routing between subnets
- DNS records
The infrastructure is fully idempotent. You can re-run tofu apply at any time.
tofu init (1)
tofu apply (2)
until ansible -m ping all; do sleep 10; done (3)
| 1 | Initialize modules |
| 2 | Apply infrastructure and generate inventory.ini |
| 3 | Wait until all VMs are reachable (may take up to 5 minutes) |
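The wait loop in step 3 runs forever if a VM never becomes reachable. As a sketch of one alternative, the hypothetical helper `wait_until` below gives up after a time budget; the 300-second value in the usage comment is derived from the "up to 5 minutes" note and is an assumption, not a tested limit.

```shell
# Hypothetical helper: wait_until TIMEOUT_S INTERVAL_S COMMAND [ARGS...]
# Retries COMMAND until it succeeds or roughly TIMEOUT_S seconds of
# sleeping have passed. Only sleep time is counted, not the run time of
# COMMAND itself, so the real wall-clock time can be longer.
wait_until() {
  timeout_s="$1"
  interval_s="$2"
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "timed out after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
}

# Usage (assumes tofu apply has already generated inventory.ini):
#   wait_until 300 10 ansible -m ping all
```

Returning a non-zero status on timeout lets the call be chained, for example with `&& ansible-playbook site.yml`.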
Cluster & Software Installation (Ansible)
Ansible installs and maintains all cluster software, including:
- Routing and SSH setup on servers
- A full k3s Kubernetes cluster
- Foundational cluster services
All playbooks are idempotent and can be safely re-run.
ansible-galaxy install -r requirements.yml (1)
ansible-playbook site.yml (2)
| 1 | Install required Ansible collections |
| 2 | Install k3s and write kubeconfig to ~/.kube/config |
Running the playbook will overwrite ~/.kube/config.
Back up your config if you manage multiple clusters.
The Kubernetes setup requires an inventory.ini file, which Tofu creates automatically.
So, make sure to apply the infrastructure at least once before running Ansible.
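Because the playbook overwrites ~/.kube/config, a timestamped copy taken beforehand is a cheap safeguard. This is a minimal sketch; `backup_kubeconfig` and the `.bak.<timestamp>` suffix are illustrative choices, not something the playbook provides.

```shell
# Hypothetical helper: copy the kubeconfig to a timestamped backup file
# next to the original and print the backup path.
backup_kubeconfig() {
  src="${1:-$HOME/.kube/config}"
  # Nothing to back up if no kubeconfig exists yet.
  [ -f "$src" ] || return 0
  dest="$src.bak.$(date +%Y%m%d%H%M%S)"
  cp -p "$src" "$dest" && echo "$dest"
}

# Usage:
#   backup_kubeconfig && ansible-playbook site.yml
```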
Installed Foundational Services
- cert-manager: Enables automatic issuance of TLS certificates. The certificates are issued via Let’s Encrypt, with support for both its staging and production environments.
- gitea: My personal favorite Git server.
- concourse-ci: A powerful CI service that I like to use to automate all kinds of workloads.
- snappass: A secure and reliable tool for sharing passwords. Not set up yet!
Configured tags
The playbook defines several tags. Use --tags to restrict a run to specific areas.
| init | Full initial setup |
| add-server | Add a new k3s server node |
| add-agent | Add a new k3s agent node |
| update | Upgrade Kubernetes or system packages |
| config | Update local kubeconfig |
| k8s | Deploy foundational services |
| cert-manager | Apply changes to the cert-manager including support for |
| gitea | Apply changes to gitea |
| concourse | Apply changes to concourse |
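A mistyped tag makes ansible-playbook match no tasks, which is easy to overlook. The sketch below guards against that by checking tag names before the run; `is_known_tag` and `run_tagged` are hypothetical wrappers, and the tag list is copied by hand from the tags listed above, so it has to be kept in sync manually.

```shell
# Tags copied from this README (assumption: the list is kept up to date).
KNOWN_TAGS="init add-server add-agent update config k8s cert-manager gitea concourse"

# Succeeds if the given name is one of the known tags.
is_known_tag() {
  for t in $KNOWN_TAGS; do
    [ "$t" = "$1" ] && return 0
  done
  return 1
}

# Validate all tags, then run the playbook restricted to them.
run_tagged() {
  [ "$#" -gt 0 ] || { echo "usage: run_tagged TAG [TAG...]" >&2; return 2; }
  for tag in "$@"; do
    if ! is_known_tag "$tag"; then
      echo "unknown tag: $tag" >&2
      return 1
    fi
  done
  # Join the arguments with commas, the format --tags expects.
  tags=$(IFS=,; printf '%s' "$*")
  ansible-playbook site.yml --tags "$tags"
}

# Usage:
#   run_tagged config          # only refresh the local kubeconfig
#   run_tagged gitea concourse # apply changes to both services
```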
Scaling the Cluster
- Increase
  - Adjust the number of servers/agents in config.auto.tfvars
  - Then rerun the Ansible playbook
- Decrease
  Do NOT reduce the agent count directly. Drain and delete the node first:
  - Open k9s
  - Navigate to :nodes
  - Select the agent with the highest numeric index
  - Drain it by pressing r
  - After draining, delete it by pressing Ctrl + d
  - Now decrease the agent count in config.auto.tfvars and run tofu apply
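The drain and delete steps can also be done with kubectl instead of k9s. The following is a sketch under stated assumptions: it presumes that agent node names contain "agent" and end in a numeric index (for example `agent-3`), which this README does not confirm, and both function names are hypothetical.

```shell
# Read node names on stdin and print the agent with the highest trailing
# number. Assumes names contain "agent" and end in a numeric index.
highest_agent() {
  awk '/agent/ && match($0, /[0-9]+$/) {
         n = substr($0, RSTART) + 0
         if (best == "" || n > max) { max = n; best = $0 }
       }
       END { if (best != "") print best }'
}

# Drain and delete the highest-indexed agent node.
remove_highest_agent() {
  # "kubectl get nodes -o name" prints entries as node/<name>.
  node=$(kubectl get nodes -o name | sed 's|^node/||' | highest_agent)
  [ -n "$node" ] || { echo "no agent node found" >&2; return 1; }
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data &&
    kubectl delete node "$node"
}
```

After `remove_highest_agent` succeeds, the agent count in config.auto.tfvars still has to be decreased and tofu apply run, as in the last step of the manual procedure.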
Responsibilities
- OpenTofu
  - Provision machines for Kubernetes servers (public subnet)
  - Provision machines for Kubernetes agents (private subnet)
  - Create networking (public/private subnets + routing)
  - Manage firewall rules:
    - ICMP
    - Kubernetes API (6443)
    - SSH (nonstandard port, usually 1022)
    - HTTP/HTTPS (80, 443)
    - Git SSH (22)
  - Manage DNS records
- Ansible
  - Configure SSH access
  - Configure routing on all servers
  - Install and maintain k3s
  - Keep system software updated
  - Deploy foundational services