Oh look, it's an "Ubuntu 24.04 pushed out a systemd-networkd update and broke things" week again.
Did you know that #systemd-networkd *really* dislikes it when anything else manages the system's routing table? When it restarts, it will frequently remove a bunch of routes that were added by dynamic routing daemons, #K8S CNI plugins, or other similar tools.
This has caused a number of fairly public outages over the past year or two. There is a set of config flags for /etc/systemd/networkd.conf that disables this behavior, but the last of them only landed in systemd v256, and #Ubuntu 24.04 (LTS) ships v255.
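A minimal Python sketch of choosing those networkd.conf flags based on the installed systemd version. It assumes the relevant `[Network]` settings are `ManageForeignRoutes=`, `ManageForeignRoutingPolicyRules=`, and `ManageForeignNextHops=`, with the nexthop one being the v256 addition (check `networkd.conf(5)` for your release). The function names and the sample version string are hypothetical.

```python
import re

# Assumption: ManageForeignNextHops= is the flag that arrived in v256; the
# route and routing-policy-rule flags are assumed to already exist in v255.
NEXTHOP_FLAG_MIN_VERSION = 256


def parse_systemd_version(version_line: str) -> int:
    """Extract the major version from the first line of `systemctl --version`,
    e.g. 'systemd 255 (255.4-1ubuntu8)' -> 255."""
    match = re.match(r"systemd (\d+)", version_line.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_line!r}")
    return int(match.group(1))


def networkd_conf_fragment(version: int) -> str:
    """Build a [Network] section telling networkd to leave alone the routes,
    rules, and (if supported) nexthops that it didn't create itself."""
    lines = [
        "[Network]",
        "ManageForeignRoutes=no",
        "ManageForeignRoutingPolicyRules=no",
    ]
    if version >= NEXTHOP_FLAG_MIN_VERSION:
        lines.append("ManageForeignNextHops=no")
    return "\n".join(lines) + "\n"


# Illustrative version string for an Ubuntu 24.04-style install.
print(networkd_conf_fragment(parse_systemd_version("systemd 255 (255.4-1ubuntu8)")), end="")
```

On a v255 system the fragment has no nexthop flag at all, which is why 24.04 hosts still need a workaround on the FRR side.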
I have a writeup with some workarounds: https://scottstuff.net/posts/2025/02/25/frr-vs-systemd-networkd/
In my case, I tend to lose a bunch of IPv4 or IPv6 routes (rarely both) when `networkd` updates roll out automatically. My first sign of trouble is usually a ping-check alert firing.
My specific problem is caused by #FRR: by default it creates its own kernel nexthop groups for routes learned via OSPF or BGP. `systemd-networkd` then comes along and cleans them up, which causes the kernel to drop every route associated with them. FRR doesn't re-add the routes until it's restarted, so things stay broken until someone cleans them up manually.
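To make the failure chain concrete, here is a toy Python model of it, not real kernel or networkd code. Routes either point at a kernel nexthop object by ID or carry their nexthop inline; a cleanup pass deletes every nexthop object that networkd didn't create, and any route referencing a deleted object disappears with it. The owner labels (`"zebra"`, `"networkd"`) and all names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    # ID of the kernel nexthop object this route points at, or None if the
    # nexthop is stored inline in the route itself.
    nhid: Optional[int]


def cleanup_foreign_nexthops(nexthop_owner: Dict[int, str]) -> Dict[int, str]:
    """Toy networkd cleanup: keep only nexthop objects it created itself."""
    return {nhid: owner for nhid, owner in nexthop_owner.items() if owner == "networkd"}


def surviving_routes(routes: List[Route], nexthops: Dict[int, str]) -> List[Route]:
    """Toy kernel behavior: a route whose nexthop object is gone is dropped."""
    return [r for r in routes if r.nhid is None or r.nhid in nexthops]


# Two routes installed by FRR via its own nexthop objects, one with an inline nexthop.
nexthops = {10: "zebra", 11: "zebra"}
routes = [
    Route("10.1.0.0/16", 10),
    Route("2001:db8:1::/48", 11),
    Route("192.0.2.0/24", None),
]
remaining = surviving_routes(routes, cleanup_foreign_nexthops(nexthops))
print([r.prefix for r in remaining])
```

Only the inline-nexthop route survives the cleanup, which is the intuition behind telling FRR not to use kernel nexthop objects in the first place.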
I've been testing a fix on ~half of my machines since July to see if it caused any *other* problems, and it seems to have passed. Adding `no zebra nexthop kernel enable` to the FRR config seems to avoid the nexthop problem without causing any other issues for me. This *may* break things if you use policy-based routing with FRR, but it's probably safe otherwise.
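One way to roll the setting out is to push it through `vtysh` on each box. The Python sketch below builds that command line and, by default, only prints it. It assumes the setting is entered in global config mode, that `write memory` is how you persist config on your install, and that the systemd unit is named `frr` (as in the Debian/Ubuntu packages). The restart is also an assumption: it's there so FRR reinstalls its routes under the new setting, since FRR otherwise won't re-add routes it has lost.

```python
import shlex
import subprocess
from typing import List

# The FRR setting from the post; everything else here is an illustrative sketch.
FRR_SETTING = "no zebra nexthop kernel enable"


def vtysh_argv(setting: str = FRR_SETTING) -> List[str]:
    """Build a vtysh invocation that applies `setting` in config mode and
    saves the running config so it persists across restarts."""
    return [
        "vtysh",
        "-c", "configure terminal",
        "-c", setting,
        "-c", "end",
        "-c", "write memory",
    ]


def apply(dry_run: bool = True) -> None:
    argv = vtysh_argv()
    if dry_run:
        # Print the shell-quoted command instead of touching the router.
        print(shlex.join(argv))
        return
    subprocess.run(argv, check=True)
    # Assumption: restart FRR so routes get reinstalled without kernel nexthop objects.
    subprocess.run(["systemctl", "restart", "frr"], check=True)


apply()
```

Running it with `dry_run=True` on a laptop is harmless; flip it to `False` only on a host where you've confirmed the commands above match your FRR version.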