Lmst

Time to abandon the Broadcom-owned "Bitnami" Helm chart and pgSQL images for CloudNative-PG to the CNCF's own charts and images.

Initial deployment in the test cluster seems positive, but now I want to test if I can remove the operator and CRDs without destroying the running cluster. That will be key before I try this on the prod clusters.

#AdventuresInSelfHosting

I started playing with Kerberos and thus AD, simply because I wanted to be able to mount my nextcloud NFS volume on the server without having it show up as owned "_apt:82" on the NAS. But this has been an extremely frustrating #AdventuresInSelfHosting

- I need to get kdc, openldap and idmap to all work together.
- I figured samba would make that easier.
- Nope, I can't seem to get working krb5 keytabs at the system level to mount the volumes.

Hnng.

Nextcloud decided to stop mounting its NFS datastore yesterday, and kept throwing "mount(2): operation not permitted".

Now this morning I get up to troubleshoot it, and it's decided to just start working.

wtf #AdventuresInSelfHosting

Something went amiss on one of my kubernetes workload nodes, which brought down this instance.

Due to the setup, however, it failed over -automatically!- in a few minutes. I think the downtime was mostly in promoting the secondary PG instances when the primary vanished.

But I didn't have to intervene other than to restart the k3s services on the failed node. I'm not sure what happened, but "Killed" makes me think it hit an unrecoverable OOM condition.

#AdventuresInSelfHosting

Today's #adventuresInSelfHosting, I was having trouble with #cloudnativePG where I'd always have n-1 pods stable and they'd constantly restart between the replicas in a massive loop, then eventually switch with the primary and continue the process. So, I set the instance count to 3 so there'd at least be one stable read replica at a time.

I finally found out what was wrong. I have my #cnpg cluster in the postgresql-system namespace and I happened to see that I had an operator running in the default cnpg-system namespace. I don't know how long it had been there, so both it and the one in my namespace were both competing for the state. Deleting and cleaning up that old cluster brought immediate stability.

I also realized that I wasn't overriding the default #php configuration for #pixelfed, so when I uploaded an image taking on my phone, the web server would restart. I bumped the php memory up to 1GB for now. For the expected userbase for the upcoming #keyboardvagabond #fediverse space, this should be fine.

Right now the services are running well, but I need to do more testing and get mastodon into an "interesting" state for new visitors. Pixelfed seems the hardest for me in terms of getting content onto the server so that it doesn't look barren.

The todo list for now is:

comprehensive testing
get hcaptcha working on all services, or find an alternative
add the community block list to pixelfed
make pixelfed look interesting (any tips would be greatly appreciated!)
get bookwyrm running
create an intro landing website for www subdomain
get the #soonTM mascot in there! I'm super excited for what comes out of that
set up mastodon SSO/OAuth

It's getting close! The services are essentially ready, just not necessarily turned on for signups until I'm ready for a pre-launch or full launch. I want to make sure things are in a good state.

But with the 2 node #kubernetes #cluster, I think things should be good!
By then end, it should look like:

a good landing page for getting involved both in keyboardvagabond and the fediverse
#pixelfed
#piefed
#mastodon
#writefreely
#bookwyrm
technically authentik, but that didn't wind up being as useful as I thought

Bad Things happen to your K3s cluster when one of the workloads suddenly runs amok and devours resources in all the workload nodes. Until now, you had Enough; now, you have Pain.

It took 45 minutes to undo the mess, but I managed to save everything. Thankfully, due to proper tainting, the control plane was safe.

I’ll have to figure out why it went mad. Not sure if it was CloudNative-PG or Longhorn. That’s for another time.

#AdventuresInSelfHosting

Bumped up to 4.4.2+glitch, and remembered to also bump up Elasticsearch to 7.17.29. (I'm surprised they still release 7.17 when they're on 9 otherwise.)

#AdventuresInSelfHosting

#TodayILearned that the local-path provisioner in Kubernetes, at least in K3s, doesn't actually pay any attention to the storage size limit set in the volume claim.

So I thought I had a volume that had run out of space, when it actually didn't give a shit what I put. As long as the system disk had enough space, that was good enough. Uh, that's fine I guess to start?

#AdventuresInSelfHosting

I managed to move the database for my Joplin note-taking app from a PGSQL instance on a Linux container and into CloudNative-PG.

There's something funky with creating persistent volumes when quotas are involved, so for now I've slung it over to its own namespace without quota. (Y'all can't reach it anyway, so it doesn't matter. :))

As long as I can get a nightly backup of the database and do a drop/restore, I'll migrate it over to the quota'd namespace eventually.

#AdventuresInSelfHosting

Upgrade to mastodon 4.4.0+glitch GET!

Lots of new stuff in this release that I need to delve in and see if it's of any use.

https://blog.joinmastodon.org/2025/07/mastodon-4.4/

https://github.com/glitch-soc/mastodon/releases/tag/v4.4.0

#AdventuresInSelfHosting

I'm looking forward to those Enterprise grade SSDs arriving tomorrow. The Ceph object storage cluster's performance suffers heavily on spinning rust (traditional hard drives).

#AdventuresInSelfHosting

PG DB upgrade from v16.8 to v17.5 was successful. All apps are still functioning. And this time I've pinned the pgsql version so I don't get an automated upgrade surprise that breaks my apps. #AdventuresInSelfHosting

I guess it is not a good idea to install #postgresql without a version on #alpine, as it's more than happy to just install the v17 and remove the v16 binaries - without doing any sort of migration.

Had to roll back the container to the snapshot taken this morning. To avoid the issue again, I've moved the Alpine packages explicitly to v16. I'll tackle v17 another day.

I suppose just running software upgrades via Ansible overnight is maybe not the best idea, either. :O

#AdventuresInSelfHosting

Upgraded to v4.3.8+glitch. #AdventuresInSelfHosting #Mastodon #glitchsoc

Now serving 10Gb/s fiber at the home lab. Migrating a VM from one node to the other, having to transfer almost 4GB of state from one node to the other, all while keeping the node itself online. 5 seconds. FIVE. 🤩

#AdventuresInSelfHosting

2025-04-28 22:59:55 start migrate command to unix:/run/qemu-server/2041.migrate

...

2025-04-28 23:00:01 average migration speed: 686.1 MiB/s - downtime 92 ms
2025-04-28 23:00:01 migration completed, transferred 3.5 GiB VM-state

In the NAS ... I have no free PCIe slots! Either I need to swap out the CPU for one with iGPU and ECC (Ryzen PRO -G) or move the Arc GPU and thus Jellyfin to one of my PMox systems.

The 3850 switch works a champ and for not much more power consumption - the 12 port switch runs about 100W. :D

#AdventuresInSelfHosting

I can't even slot the VIC into my old AMD FX-4350 box because it has precisely one PCIe x8 slot, and that's taken up by the graphics card. It fits in the Intel 4th gen box, and it's recognized, but I can't seem to get it to detect the link, even though the switch end sees it just fine. I don't want to bring down the main Pmox box to test in that just yet.

#AdventuresInSelfHosting

Now that I've built a Proxmox cluster at home, I figure I should be moving my networking among them and the NAS from 1G to 10G.

My coworker at the office who runs the connectivity lab gave me an old 3850-12XS switch (we only use 9k's now) and a few Cisco VICs from an old WiFi controller to see if it would work in my setup.

#AdventuresInSelfHosting

Working on how to map storage on my NAS into my K8s test cluster so I can look at deploying helm charts that need to maintain state. I haven't quite got it down, but I think I'm getting closer.

https://kubedemy.io/kubernetes-storage-part-1-nfs-complete-tutorial

#AdventuresInSelfHosting

So far I managed to get one of my apps working in K3s, but using a dedicated port rather than the inbuilt Træfik web proxy.

Most of the documentation I've stumbled upon assumes I'm installing it as in real K8s, but it comes pre-installed in K3s.

Has anyone here using K3s managed to get web apps in other namespaces to use the kube-system/traefik proxy?

#AdventuresInSelfHosting

#adventuresInSelfHosting

Client Info