#RukiiNet #SelfHosting update:
Just after writing this, #Curie went down again, and it didn't help that the #NFS pods were all on a different node; it all went down regardless.
Even got some data corruption again; it's always a huge manual hassle to bring everything back up. I read somewhere that #MicroK8S tends to handle hard reboots badly if certain singleton cluster pods, like coredns, calico, the NFS controller, or the hostpath provisioner, are on the node that goes down. I wonder if it's possible to just add replicas for those...
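For the Deployment-backed ones it might be as simple as scaling up. Untested sketch; the names are what a stock MicroK8S puts in kube-system:

microk8s kubectl -n kube-system get deploy
# run a second CoreDNS replica so DNS survives losing one node:
microk8s kubectl -n kube-system scale deploy coredns --replicas=2
# same idea for calico-kube-controllers (calico-node itself is a DaemonSet and already runs on every node):
microk8s kubectl -n kube-system scale deploy calico-kube-controllers --replicas=2

No idea yet whether the addon manager resets the replica count on the next refresh, and the hostpath provisioner is node-local by design anyway, so extra replicas wouldn't save that one.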
I found a new-to-me (old and well-known) bug in #OpenEBS, and a mitigation. In some cases #Jiva briefly holds replicas in a read-only state while it syncs them, and if the moon phase is right, there's an apparent race condition where the iSCSI mounts get stuck read-only even though the underlying volume has already gone back to read-write.
The fix is to go to the node that mounted these, run "mount | grep ro,", and ABSOLUTELY UNDER NO CIRCUMSTANCES UNMOUNT (learned that the hard way). Instead, I think it's possible to just remount them read-write.
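Concretely, something like this; the path is a placeholder for whatever the grep actually turns up (mine live under the kubelet's iSCSI plugin directory):

# find mounts that fell back to read-only:
mount | grep 'ro,'
# flip the flags in place, do NOT umount:
mount -o remount,rw /path/from/the/grep/output

remount,rw changes the mount flags on the live mount without detaching it from the pods, which is presumably why it doesn't blow up the way unmounting did.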
There's also an irritating thing where different pods run their apps under different UIDs, and the Dynamic NFS Provisioner StorageClass needs to be configured to mount the volumes with the matching UID. I originally worked around this with a blanket chmod 0777, but the apps insist on creating files with their own permission sets, so after a remount the permissions stick but the UID changes, and the apps no longer have write access to their own files.
This compounds with the fact that each container runs under its own UID, so each one needs its own special StorageClass for that UID... Gods.
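If I'm reading the Dynamic NFS Provisioner docs right, the per-UID part can at least be declared in the StorageClass via its FilePermissions config instead of chmod hacks. A sketch, one class per UID; the class name and the Jiva backend class name are placeholders for my setup, not anything the provisioner ships with:

microk8s kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-uid-1000            # hypothetical: one class per app UID
  annotations:
    openebs.io/cas-type: nfsrwx
    cas.openebs.io/config: |
      - name: NFSServerType
        value: "kernel"
      - name: BackendStorageClass
        value: "openebs-jiva-csi-default"
      - name: FilePermissions
        data:
          UID: "1000"
          GID: "1000"
          mode: "0770"
provisioner: openebs.io/nfsrwx
reclaimPolicy: Delete
EOF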
I got the new #IntelNUC for the fourth node in the cluster, to replace the unstable Curie node, but the RAM for it arrives on Thursday.