Hetzner has changed its API and removed the field `datacenter` from the
primary IPs in favor of `location`. This change reflects that and adjusts
the configuration accordingly. Note that this change didn't require any
manual state changes. Instead I applied the former plan with the newest
provider once. Since the provider already treated the fields correctly, I
only had to adjust the configuration.
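For illustration, the adjustment boils down to something like this; the
resource name and the location are just examples, and the `location`
argument is assumed to behave as the changelog describes:

```hcl
resource "hcloud_primary_ip" "ingress" {
  name          = "ingress"
  type          = "ipv4"
  assignee_type = "server"
  auto_delete   = false

  # was: datacenter = "nbg1-dc3"
  location = "nbg1"
}
```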
Chapeau Hetzner for this good transition!
See-also: 14da745f Update tofu-resources to their latest versions
Reference: https://docs.hetzner.cloud/changelog#2025-12-16-phasing-out-datacenters
Hetzner's API has received some important changes recently which will
impact my configuration. So, this maintenance change is necessary for me
to address these changes and work through all deprecations.
First and foremost, a new DNS API was introduced in November 2025 to bind
the DNS settings closer to their cloud console. In favor of this new
DNS system, the old API will be phased out at the beginning of May 2026!
Secondly, some API fields have changed, e.g. the "datacenter" field of
primary IPs is going to be removed in favor of the "location" field.
This removal will finally take effect on the 1st of July 2026.
Besides that, I simply updated all providers to their latest versions.
Reference: https://docs.hetzner.com/networking/dns/faq/beta
Reference: https://docs.hetzner.cloud/changelog#2025-11
Reference: https://docs.hetzner.cloud/changelog#2025-12
This change adds Longhorn, an addition to Kubernetes that brings the
ability to use distributed storage across all nodes of the cluster.
Note that I already tried this in December, but due to very high load on
the machines I rolled _everything_ back. It turned out, though, that the
high load was not caused by Longhorn but by a bad configuration of the
server, as described in the see-also commit.
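For reference, the documented Helm-based install boils down to roughly
the following; it is sketched here as an HCL `helm_release`, the actual
wiring in this repository may look different:

```hcl
resource "helm_release" "longhorn" {
  name             = "longhorn"
  repository       = "https://charts.longhorn.io"
  chart            = "longhorn"
  version          = "1.10.1"
  namespace        = "longhorn-system"
  create_namespace = true
}
```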
Reference: https://longhorn.io/
Reference: https://longhorn.io/docs/1.10.1/deploy/install/install-with-helm/
See-also: 4b8a3d12c4 Use etcd instead of sqlite for k3s-server
Oh damn, that was so annoying. My cluster ran at near 100% load all the
time! As it turns out, that's a known issue on k3s clusters.
The solution is to add the `--cluster-init` flag to the server, which
makes it use etcd instead of sqlite. And voilà, the CPU usage drops to a
reasonably low level in the single-digit percent range.
Reference: https://github.com/k3s-io/k3s/issues/10396
Reference: https://docs.k3s.io/datastore/ha-embedded#existing-single-node-clusters
Bitnami has discontinued a lot of their container images. Among them
were also the images for a high-availability setup of PostgreSQL. This
change fixes that by referencing the legacy Bitnami images until a "new"
approach is found.
My Gitea server is basically my safe harbor for private Git projects. It
is not meant to be public.
Even more important, opening it up would shift responsibilities a lot;
legal liabilities in particular may suddenly become relevant when the
server is open.
Furthermore, I can't guarantee proper availability when I cannot make
any assumptions about the usage. And I cannot make such assumptions for
an open and public project which I maintain in my spare time.
This change was surprisingly tricky and needed some temporary
workarounds. First, there is no "official" snappass Helm chart, but I
found one which does the job and looks good enough. The other problem is
the missing "official" image of snappass. The Helm chart used a
customized image which I didn't want to use, so I had to build a brand
new image quickly. This new image is unfortunately not bound to any
repository or pipeline yet, which means that this change needs some
trust for the moment, until I've set up the needed repo and CI
structures.
Reference: https://github.com/lmacka/helm-snappass/tree/main
Reference: https://github.com/pinterest/snappass
Out of pure stupidity I had split up "Supported Platforms" and
"Required Software" without realising that these are actually entangled.
This change fixes that.
This change alters the README significantly. The overall goal was to
turn it into an easier-to-read document, since the previous version had
generally outgrown its initial layout. That alone should raise a flag,
since it could indicate a document that is too long. But I want to make
sure I understand every detail even after some time off.
This new approach targets that desire and improves the overall structure
so the document can be read from top to bottom, as I like it.
This change is huge, therefore I only sum up the most important changes:
* Improve spelling
* Reduce ambiguity
* Use OpenTofu instead of Terraform
* Document missing tags for Ansible
* Provide example-configuration
* Fix confusion between dotenv and direnv, I use direnv!
* Add section about required software
After I removed the automatic IP addition to the firewalls for SSH and
Kubernetes I ran into a problem only a few days later. My ISP changed
my IPs and I was too stupid to realize that immediately. So, this change
reintroduces the automatic addition of my current IPs to the whitelists
for Kubernetes and SSH. However, I adjusted the algorithm so it will not
change every day or so, but really only when my ISP changes my IPs.
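A rough sketch of the idea; the resource names and the IP lookup
endpoint are only placeholders, and the logic that keeps the list stable
between ISP changes is not shown here:

```hcl
# Detect the current public IP at plan time (placeholder endpoint).
data "http" "current_ip" {
  url = "https://api.ipify.org"
}

locals {
  my_ips = ["${chomp(data.http.current_ip.response_body)}/32"]
}

resource "hcloud_firewall" "k8s_api" {
  name = "k8s-api"

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "6443"
    source_ips = local.my_ips
  }
}
```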
I renamed the project from "hetzner-infra" to "base-infra", since that
better fits the purpose of this repository. So, this change migrates the
state name accordingly, to avoid confusion.
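With an S3-style backend this is essentially a rename of the state key
plus a state migration; the names below are placeholders:

```hcl
terraform {
  backend "s3" {
    bucket = "my-tofu-state"      # placeholder bucket name
    key    = "base-infra.tfstate" # was "hetzner-infra.tfstate"
    # ... the endpoint and skip_* settings stay as they are
  }
}
```

After changing the key, a single `tofu init -migrate-state` copies the
existing state over to its new name.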
I plan to move over more base tasks to this repository, like maintaining
the keys for Backblaze. Therefore I adjusted the readme accordingly.
Furthermore I fixed the spelling in several places.
The definition was split into multiple settings, which made it
unnecessarily complicated to set up the definition for my Kubernetes
cluster. This new approach allows for granular definitions of servers
and agents and is also simpler for me to use.
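A sketch of the shape I have in mind; the attribute names and defaults
are illustrative, not the exact ones used in the repository:

```hcl
variable "k3s_nodes" {
  description = "Granular definition of all k3s servers and agents."
  type = map(object({
    role        = string # "server" or "agent"
    server_type = string # e.g. "cax21"
    location    = string # e.g. "nbg1"
  }))
  default = {
    "server-1" = { role = "server", server_type = "cax21", location = "nbg1" }
    "agent-1"  = { role = "agent", server_type = "cax11", location = "nbg1" }
  }
}
```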
I liked the idea of having these IPs dynamically detected at runtime,
but some research showed that my current provider only renews them every
180 days nowadays. So, there is no need for such a hyper-dynamic
solution. Instead I use a variable now, which brings some other
benefits, like being able to add arbitrary IPs as well. This might come
in handy for CI/CD.
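Sketched, the variable is as simple as this; the name and default are
illustrative:

```hcl
variable "allowed_ips" {
  description = "IPs whitelisted for SSH and the Kubernetes API."
  type        = list(string)
  default     = [] # e.g. ["203.0.113.7/32"], or additional CI/CD runner IPs
}
```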
This change makes it a bit easier for me to manage specific domains.
Note that in the long run these settings should _not_ belong in this
repository. Instead I'm going to maintain them in the projects where the
domain is more meaningful.
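For illustration, such a domain definition looks roughly like the
following, assuming the community hetznerdns provider; zone and record
values are placeholders:

```hcl
resource "hetznerdns_zone" "example" {
  name = "example.org"
  ttl  = 3600
}

resource "hetznerdns_record" "www" {
  zone_id = hetznerdns_zone.example.id
  name    = "www"
  type    = "A"
  value   = "203.0.113.7"
  ttl     = 300
}
```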
By applying this change the Kubernetes cluster gets a Gitea server set
up. Note that I use a custom image whose build I still have to automate
in the future. The customization is necessary since I use AsciiDoc very
often and the default Gitea doesn't render these files, so it becomes a
bit cumbersome to read them on the web.
The terraform state can be stored in Backblaze B2 with some
configuration. This change does exactly that. Note that this requires
the special env variables `AWS_SECRET_ACCESS_KEY` and
`AWS_ACCESS_KEY_ID`, which are normally part of an AWS setup. To be able
to use AWS and this setup in parallel I use direnv to maintain the
variables in the special file `.envrc`.
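The backend configuration looks roughly like this; bucket, key and
region are placeholders, and the credentials come from the env variables
mentioned above:

```hcl
terraform {
  backend "s3" {
    bucket = "my-tofu-state"         # placeholder
    key    = "hetzner-infra.tfstate" # placeholder
    region = "eu-central-003"        # the B2 region of the bucket

    endpoints = {
      s3 = "https://s3.eu-central-003.backblazeb2.com"
    }

    skip_credentials_validation = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
    skip_metadata_api_check     = true
    skip_s3_checksum            = true
  }
}
```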
Reference: https://andrzejgor.ski/posts/backblaze_b2_tf_state
Reference: https://www.reddit.com/r/selfhosted/comments/1iv1qir
Reference: https://direnv.net/
I completely overlooked that I have to change the SSH port for all nodes
in the cluster, otherwise I cannot provide a meaningful load balancer
for the git-ssh port in it.
Additionally this allowed me to fix some config errors which I had
simply overlooked.
The previous tag setup still let Ansible gather facts for the roles in
question, even though they're not executed. This fix prevents that from
happening.
The playbook itself is written to be idempotent, so it doesn't hurt to
run all tasks many times. But it doesn't need to run all tasks all the
time, therefore you can limit the execution scope with the documented
tags to only affect certain tasks. This improves the performance a lot!
Since I don't have multiple terraform steps anymore, it simply doesn't
make sense to me to split all tasks into separate folders.
Instead I try to be as clear as possible in the README to make it easy
to follow the structure in the future without too much headache.
It simply doesn't make sense to split the installation of the
Kubernetes cluster from the provisioning of foundational services.
Therefore I drop the idea of organising these services in another
terraform setup and instead ensure their presence with Ansible, as it's
already responsible for setting up the cluster and keeping it up to
date.
It looks somewhat random that the SSH port was simply defined in the
configuration of the k3s setup. It looks somehow "configurable" although
it isn't. Therefore I moved this setting to the correct place in the
terraform setup.
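In the terraform setup this is now just a plain variable that the rest
of the configuration consumes; the name and default below are
illustrative:

```hcl
variable "ssh_port" {
  description = "SSH port for all cluster nodes; fixed once the cluster exists."
  type        = number
  default     = 2222 # illustrative only
}
```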
An important side note is that this change doesn't make it possible to
_change_ the SSH port, though. Once I have decided on a port I have to
stick to it until I tear down the cluster!
Navigating through a bunch of config files, each with just a few lines
in it, is cumbersome. This change moves all the configuration into a
centralized `config.ini`; that way it's easier for me to get a quick
overview of the setup. The `config.ini` acts as another inventory and is
therefore referenced as such by the ansible.cfg. The `inventory.ini`
(which is generated by terraform in the provisioning step) is not
affected by this change.
With this change we no longer use user-data scripts on the provisioned
machines. That makes it way easier for me to handle all the
configuration, since I only have to run Ansible. Furthermore this lifts
the burden of figuring out what may have gone wrong, since Ansible is
easier to debug than some arbitrary scripts which run at provisioning
time on the machines.
With this change I should also think about restructuring the code a bit
as well, since it's actually easier to provide the initial software
stack for the cluster via Ansible than via terraform, at least as far as
I can tell right now.
The only reason I even change the port is to make sure a Git client can
reach my upcoming Git servers on the standard SSH port. To achieve this
I only have to make sure that the port is reachable from the internet;
after that the port is routed through the Kubernetes network.
This means that my agents can keep using the standard port, which makes
everything easier for me :)
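On the infrastructure side that boils down to nothing more than allowing
inbound traffic on port 22. One way to express this is a firewall rule
like the following; whether it ends up as a firewall rule or a
load-balancer service is an implementation detail, and the resource name
is illustrative:

```hcl
resource "hcloud_firewall" "git_ssh" {
  name = "git-ssh"

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = ["0.0.0.0/0", "::/0"]
  }
}
```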