@mwl @kta At some point the Linux community needs to accept cgroups is an utter dumpster fire.
Also that article seems to have utterly misunderstood what docker/cgroups/kubernetes got fucking wrong.
> 1. First off, we ditch the shared-kernel approach entirely. We need to build a micro-hypervisor model, where each container runs its own minimal kernel. This ensures that every container is genuinely isolated, similar to a lightweight VM but without the bloat. By employing a microkernel architecture, you’re essentially granting each container its own mini-OS that only loads essential components, drastically reducing the attack surface. This step eliminates the primary flaw of Docker’s shared-kernel model.
You mean proper OS-level virtualization. That means accepting that cgroups sucks and, frankly, ditching the not-invented-here attitude that permeates Linux dev stuff.
> 2. Next, leverage hardware-assisted virtualisation like Intel VT-x or AMD-V to handle isolation efficiently. This is where we’ll differentiate ourselves from Docker’s reliance on namespaces. With hardware support, each container will get near-native performance while maintaining strict separation. For example, instead of binding everything to a Linux kernel, containers will interact directly with hardware-level isolation, meaning exploits won’t have the chance to jump from one container to another.
You are now wanting a stripped-down VM. Nothing new is needed to do this. libvirt+QEMU, bhyve, VMware, etc. will already do this quite happily.
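To show how little is needed, here is a rough sketch that just builds a QEMU command line for a small, headless, KVM-accelerated guest. The helper name `qemu_minimal_vm_cmd` and the kernel/initrd paths are made up for illustration; the flags are ordinary QEMU options. It only assembles the argument list, it does not start anything.

```python
def qemu_minimal_vm_cmd(kernel, initrd, mem_mb=128, cpus=1):
    """Build (not run) a qemu-system-x86_64 command for a tiny headless guest.

    Hypothetical helper: kernel and initrd are paths you supply yourself.
    """
    return [
        "qemu-system-x86_64",
        "-enable-kvm",             # hardware-assisted virtualization (VT-x / AMD-V) via KVM
        "-nodefaults",             # skip the default set of emulated devices
        "-display", "none",        # no graphical output
        "-serial", "stdio",        # guest serial console on the terminal
        "-m", str(mem_mb),         # guest RAM in MiB
        "-smp", str(cpus),         # number of virtual CPUs
        "-kernel", kernel,         # boot this kernel directly, no bootloader
        "-initrd", initrd,
        "-append", "console=ttyS0",
    ]


cmd = qemu_minimal_vm_cmd("/boot/guest-vmlinuz", "/boot/guest-initrd.img")
print(" ".join(cmd))
```

Pass the list to `subprocess.run` on a host with KVM available if you actually want to boot it.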
> 3. We can’t ignore orchestration. Rather than bolting on security later, build an orchestration layer that enforces strict security policies from the get-go. This orchestration tool, think Kubernetes but with security baked in, will enforce seccomp, AppArmor, and SELinux profiles automatically based on container configurations. For instance, before launching a container, the orchestration layer could analyse its dependencies and generate a security profile dynamically, ensuring that each container only has access to the resources it needs.
For fuck's sake... please don't. This is the issue with docker/kubernetes... it is an utter dumpster fire thanks to being an overcomplex nightmare that is a PITA when it comes to orchestration.
Also this sounds like they have never dealt much with seccomp or AppArmor. That shit is broken as fuck by default.
Better idea. Make it easy to control, like it is with Jails on FreeBSD, and start with actually sane defaults.
> 4. Let’s go beyond the crude root vs non-root distinction Docker offers. Implement a permission system that assigns containers fine-grained capabilities, like capabilities management in modern OSes. You’ll create an RBAC model that defines precisely what a container can or cannot access such as network resources, storage, specific hardware, etc. Imagine having a declarative YAML file that specifies, down to the syscalls, which capabilities each container is granted, ensuring it only gets what it genuinely needs to function.
Again, you are wanting OS-level virtualization.
You are again trying to reinvent the wheel instead of looking at working examples of something like this.
> 5. Containers shouldn’t be changing their state once they’re up and running. We must enforce immutable infrastructure, meaning containers are rebuilt from scratch for every update rather than being patched live. This prevents attackers from persisting inside a compromised container. Think of this as Docker’s “build once, deploy everywhere” mantra. It never truly worked for Java (also a technology that Linus absolutely hates), but it might just work for a containeriser. Changes require redeployment, not modification, thus ensuring that every running instance is identical to the tested version.
Drek like this should be an end-user choice. There are lots of reasons for both. Again, the base tooling for this should give zero fucks.
Want to make something like this awesome? Start with a sane base design such as Jails, which provides a nice system to do whatever you want with and allows you to keep an existing one and patch it live, or rebuild it... both just as easily.
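A rough sketch of what that looks like in practice, written as Python helpers that only build the FreeBSD commands rather than run them. The function names, jail name, path, hostname and address are all made up for illustration, and the rebuild step assumes you have a `base.txz` release tarball lying around. The `jail -c` parameters follow the style of the examples in jail(8).

```python
def jail_create_cmd(name, path, hostname, ip4_addr):
    """Command to create and start a jail (hypothetical helper)."""
    return [
        "jail", "-c",
        f"name={name}",
        f"path={path}",
        f"host.hostname={hostname}",
        f"ip4.addr={ip4_addr}",
        "mount.devfs",                          # give the jail its own /dev
        "exec.start=/bin/sh /etc/rc",           # boot the jail's rc scripts
        "exec.stop=/bin/sh /etc/rc.shutdown",
    ]


def jail_patch_live_cmd(path):
    """Option 1: patch the existing jail's userland in place."""
    return ["freebsd-update", "-b", path, "fetch", "install"]


def jail_rebuild_cmd(path, base_tarball="base.txz"):
    """Option 2: repopulate the jail root from a release tarball (assumed to exist)."""
    return ["tar", "-xf", base_tarball, "-C", path]


root = "/usr/local/jails/web"
print(" ".join(jail_create_cmd("web", root, "web.example.org", "192.0.2.10")))
print(" ".join(jail_patch_live_cmd(root)))
print(" ".join(jail_rebuild_cmd(root)))
```

Which of the two update paths you take is a one-line difference, and the base tooling does not care either way.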
> 6. Build in real-time vulnerability scanning and automated patching. Containers should be scanned continuously, not just at build time. If a vulnerability is found, the system will either patch it in the background or alert you to rebuild the affected containers. This means integrating tools like Clair or Trivy directly into the platform, ensuring that no container runs outdated or vulnerable code.
This is totally unrelated to what is being talked about here. Given the scope and complexity involved, it should by no means be part of the same tooling.
#FreeBSD #jails #linux #cgroups #Docker #Kubernetes