= Base Infrastructure
:icons: font
:source-highlighter: rouge
:toc: preamble
ifdef::env-github[]
:tip-caption: :bulb:
:note-caption: :point_up:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]

ifdef::env-github[]
NOTE: This is only *a mirror* of my https://gitea.nehrke.info/nemoinho/base-infra[personal git-server].
endif::[]

This project sets up a https://k3s.io/[k3s] Kubernetes cluster on https://www.hetzner.com/cloud[Hetzner Cloud] using OpenTofu and Ansible.
It provides my base infrastructure for the web: it bootstraps the required machines and networks, installs a Kubernetes cluster, and deploys a set of foundational services.

The system is intentionally split into two stages:

1. Infrastructure provisioning using https://opentofu.org/[OpenTofu] (or Terraform)
2. Cluster and software installation using Ansible

The entire setup is *idempotent*, meaning it can be applied repeatedly and safely.

== TL;DR

[source,bash]
----
vim -o .envrc config.auto.tfvars # Add secrets from password manager
direnv allow
tofu init
tofu apply
until ansible -m ping all; do sleep 10; done # Wait until VMs are reachable
ansible-galaxy install -r requirements.yml
ansible-playbook site.yml
----

== Required Software

The setup works on:

* Debian
* Ubuntu
* macOS

Please install the following:

* `tofu` or `terraform`
* `ansible`
* `direnv`
* https://helm.sh/docs/intro/install/[`helm`]
* https://github.com/databus23/helm-diff?tab=readme-ov-file#install[`helm-diff`]
* `python3-kubernetes` (Debian/Ubuntu only)

=== Optional Tools

These tools improve maintenance and cluster operations:

* `k9s`

== Secrets & Local Configuration

The setup requires two files:

* `.envrc`
* `config.auto.tfvars`

These contain credentials, environment variables, and configuration values.
Both files are stored securely in my password manager.
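
Before running `direnv allow`, a quick preflight check can catch a missing tool or file early.
The following is a minimal sketch and not part of the repository: the function names `require_tools` and `require_files` are hypothetical, and it only checks the commands and files listed above (it does not check `python3-kubernetes`).
It assumes that both files live in the repository root and that `helm plugin list` shows helm-diff under the name `diff`.

[source,bash]
----
#!/usr/bin/env bash
# Hypothetical preflight check (not part of this repository).

# Report every missing tool from "Required Software"; return 1 if any is missing.
require_tools() {
  local missing=0 cmd
  # OpenTofu and Terraform are interchangeable, so either one is enough.
  if ! command -v tofu >/dev/null 2>&1 && ! command -v terraform >/dev/null 2>&1; then
    echo "missing tool: tofu or terraform" >&2
    missing=1
  fi
  for cmd in ansible ansible-galaxy ansible-playbook direnv helm; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing tool: $cmd" >&2
      missing=1
    fi
  done
  # helm-diff is a Helm plugin; assumption: it is listed under the name "diff".
  if command -v helm >/dev/null 2>&1 && ! helm plugin list 2>/dev/null | grep -q '^diff'; then
    echo "missing helm plugin: diff (helm-diff)" >&2
    missing=1
  fi
  return "$missing"
}

# Report every missing local configuration file in directory $1
# (default: current directory); return 1 if any is missing.
require_files() {
  local dir="${1:-.}" missing=0 f
  for f in .envrc config.auto.tfvars; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
----

Used as `require_tools && require_files && direnv allow`, it stops before `direnv allow` if something is missing; each function reports all of its missing items at once rather than stopping at the first one.
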

[TIP]
--
Templates are available in the repository:

* https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/.envrc.tpl[`.envrc`]
* https://gitea.nehrke.info/nemoinho/base-infra/src/branch/main/config.auto.tfvars.tpl[`config.auto.tfvars`]
--

After placing these files, enable them with:

[source,bash]
----
direnv allow
----

== Infrastructure Provisioning (OpenTofu)

OpenTofu provisions:

* Kubernetes server and agent machines
* Networking (public + private subnets)
* Firewall rules
* Routing between subnets
* DNS records

[NOTE]
The infrastructure is fully idempotent.
You can re-run `tofu apply` at any time.

[source,bash]
----
tofu init # <1>
tofu apply # <2>
until ansible -m ping all; do sleep 10; done # <3>
----
<1> Initialize the working directory (providers and modules)
<2> Apply the infrastructure and generate `inventory.ini`
<3> Wait until all VMs are reachable (may take up to 5 minutes)

== Cluster & Software Installation (Ansible)

Ansible installs and maintains all cluster software, including:

* Routing and SSH setup on servers
* A full k3s Kubernetes cluster
* Foundational cluster services

[NOTE]
All playbooks are idempotent and can be safely re-run.

[source,bash]
----
ansible-galaxy install -r requirements.yml # <1>
ansible-playbook site.yml # <2>
----
<1> Install the required Ansible collections
<2> Install k3s and write the kubeconfig to `~/.kube/config`

[CAUTION]
Running the playbook overwrites `~/.kube/config`.
Back up your config if you manage multiple clusters.

[NOTE]
The Kubernetes setup requires an `inventory.ini` file, which `tofu apply` generates automatically.
Make sure to apply the infrastructure at least once before running Ansible.

=== Installed Foundational Services

https://cert-manager.io/docs/installation/helm[cert-manager]:: Enables automatic issuance of TLS certificates.
The certificates are issued via https://letsencrypt.org[Let's Encrypt], with support for both its staging and production environments.

https://about.gitea.com[gitea]:: My personal favorite git-server.

https://concourse-ci.org[concourse-ci]:: A powerful CI service that I like to use to automate all kinds of workloads.

https://github.com/pinterest/snappass[snappass]:: A secure and reliable tool for sharing passwords.

=== Configured tags

The playbook defines several tags that restrict the execution scope to specific areas.
Select them with `--tags`, for example `ansible-playbook site.yml --tags gitea`.

.General tags
[horizontal]
`init`:: Full initial setup
`add-server`:: Add a new k3s server node
`add-agent`:: Add a new k3s agent node
`update`:: Upgrade Kubernetes or system packages
`config`:: Update the local kubeconfig
`k8s`:: Deploy foundational services

.Service-specific tags
[horizontal]
`cert-manager`:: Apply changes to cert-manager, including its Let's Encrypt configuration
`gitea`:: Apply changes to gitea
`concourse`:: Apply changes to concourse
`snappass`:: Apply changes to snappass

== Scaling the Cluster

Increase::
--
. Adjust the number of servers/agents in `config.auto.tfvars`
. Run `tofu apply` to provision the new machines
. Rerun the Ansible playbook
--

Decrease::
--
Do *not* reduce the agent count directly. Drain and delete the agent first:

1. Open `k9s`
2. Navigate to `:nodes`
3. Select the agent with the highest numeric index
4. Drain it with kbd:[r]
5. After draining, delete it with kbd:[Ctrl + d]
6. Decrease the agent count in `config.auto.tfvars` and run `tofu apply`
--

== Responsibilities

OpenTofu::
* Provision machines for Kubernetes servers (public subnet)
* Provision machines for Kubernetes agents (private subnet)
* Create networking (public/private subnets + routing)
* Manage firewall rules:
** ICMP
** Kubernetes API (`6443`)
** SSH (nonstandard port, usually `1022`)
** HTTP/HTTPS (`80`, `443`)
** Git SSH (`22`)
* Manage DNS records

Ansible::
* Configure SSH access
* Configure routing on all servers
* Install and maintain k3s
* Keep system software updated
* Deploy foundational services
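
The scale-down procedure under _Scaling the Cluster_ uses k9s.
A rough `kubectl` equivalent of selecting, draining, and deleting the highest-numbered agent could look like the following sketch, which is not part of the repository.
It assumes that agent node names end in a numeric index (for example `agent-2`), which depends on the OpenTofu configuration, and that k3s labels only server nodes with `node-role.kubernetes.io/control-plane`, so the label selector leaves just the agents.
The function names `highest_indexed_node` and `scale_down_highest_agent` are hypothetical.

[source,bash]
----
#!/usr/bin/env bash
# Hypothetical scale-down helper (not part of this repository).
# Assumption: agent node names end in a numeric index, e.g. "agent-3".

# Read node names from stdin and print the one with the highest trailing
# number; return 1 if no name ends in a number.
highest_indexed_node() {
  awk '
    match($0, /[0-9]+$/) {
      idx = substr($0, RSTART) + 0   # compare numerically, so 10 beats 9
      if (!found || idx > max) { max = idx; name = $0; found = 1 }
    }
    END { if (found) print name; else exit 1 }
  '
}

# Drain and delete the agent with the highest index.
scale_down_highest_agent() {
  local node
  # Assumption: k3s servers carry the control-plane role label, agents do not.
  node=$(kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o name \
    | sed 's|^node/||' | highest_indexed_node) || return
  echo "draining $node" >&2
  # Cordon the node and evict its pods; DaemonSet pods and emptyDir
  # volumes would otherwise make the drain refuse to proceed.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return
  # Remove the drained node from the cluster.
  kubectl delete node "$node"
}
----

After `scale_down_highest_agent` succeeds, decrease the agent count in `config.auto.tfvars` and run `tofu apply`, as in the last step of the k9s procedure.
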