Strange things happened in my homelab - as always in IT 😏.
An Intel NUC with an *additional* 2.5" HDD (yes, a 5400 RPM hard disk, not an SSD), running about 20 Docker containers.
A couple of weeks ago I started seeing some weird things happening in one of the Docker containers (just one! Grafana, Graylog... everything else worked fine). A Nextcloud instance failed. Some files just broke?! I mean .php files that suddenly had garbage in them. Luckily, the Nextcloud image ships the original files (/usr/src/nextcloud), so I could copy the working ones over the broken ones. Problem fixed. For now, at least.
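If you end up in the same spot, here is a minimal sketch for finding which files actually differ before copying anything over. It assumes the pristine copy is at /usr/src/nextcloud and the live installation is at /var/www/html (the web root of the official image; adjust if yours differs). It only compares .php files by SHA-256 and only reports, it doesn't overwrite anything. The function name find_mismatched_files is made up for this example.

```python
import hashlib
from pathlib import Path

# Assumed locations: the pristine copy shipped in the image and the live web root.
PRISTINE_ROOT = Path("/usr/src/nextcloud")
LIVE_ROOT = Path("/var/www/html")


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_files(pristine_root: Path, live_root: Path, suffix: str = ".php") -> list[str]:
    """Return relative paths of files with the given suffix that are missing
    from live_root or whose content differs from the pristine copy."""
    mismatched = []
    for src in sorted(pristine_root.rglob(f"*{suffix}")):
        if not src.is_file():
            continue
        rel = src.relative_to(pristine_root)
        live = live_root / rel
        if not live.is_file() or sha256_of(live) != sha256_of(src):
            mismatched.append(rel.as_posix())
    return mismatched


if __name__ == "__main__":
    # Only report; copying the originals back is left as a deliberate manual step.
    if PRISTINE_ROOT.is_dir() and LIVE_ROOT.is_dir():
        for rel in find_mismatched_files(PRISTINE_ROOT, LIVE_ROOT):
            print(rel)
```

Files that only exist on the live side (your own apps, config.php) are not checked, since the pristine tree is the reference.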
A week later: same thing. Fixed the error (other files with garbage in them this time). Done.
A couple of days later: same problem again. WTF?
I investigated to make sure it wasn't some sort of cyber attack. But after hours of forensics, there weren't any artifacts of an attack.
Then I thought it might be a bug in Nextcloud, because I noticed a pattern: every time Watchtower updated Nextcloud, the update failed and left garbage in random files. But nobody else seemed to have this problem, so I kept digging.
After a couple of hours I ran a quick S.M.A.R.T. check (although I don't entirely trust it). Nothing. Then an fsck. Some errors were found and fixed. Phew!
So I thought that was it: after a power failure a couple of weeks earlier, the filesystem had issues, fsck fixed them, and I could move on.
But, of course... this wasn't the end. With the next Nextcloud update, it failed again.
I ran the same checks again, but this time fsck couldn't find any errors. Still, I had broken php files - only inside the Nextcloud container.
Because the problem smelled fishy, I forced a long S.M.A.R.T. self-test and..... it was aborted because of several errors.
The end of the story: I'm switching to a new drive, which will (probably?) fix the problem.
The point of the story:
1. Some things are simpler than they seem. It was just plain old hardware failure. No sophisticated cyberattack and no critical Nextcloud bug.
2. Look at an error from different points of view. Even fsck and S.M.A.R.T. (smartctl) can point you in the wrong direction.
For anyone who wants to see how to temporarily fix the Nextcloud issue, the Nextcloud community helped me: https://help.nextcloud.com/t/strange-error-class-oca-themingcontroller-does-not-exist/196455/12
#harddrive #harddrives #hdd #hdds #ssd #ssds #storage #nextcloud #failure #SmartMonitor