#Ceph

gürkan gurkan@has.siktir.in
2025-10-11

I have almost fully puppetized #ceph at $dayjob. Sadly, I cannot open source it, but after seeing the sad state of the two somewhat-working open-source Ceph Puppet modules, I might reserve some time for it later.

New blog post: blog.mei-home.net/posts/ceph-r

I describe how I set up Prometheus metrics gathering for Ceph's S3 RadosGW.

And I finally realise that "chart" is the right word for a...well...chart.

#HomeLab #blog #Ceph #metrics

YoSiJo YoSiJo@social.anoxinon.de
2025-10-08

#ceph OSD DB devices in the form of #lvm thinpool volumes; genius or madness?

Qiita - Popular Articles qiita@rss-mstdn.studiofreesia.com
2025-10-07

I tried temporarily syncing data from a local object storage that provides an S3-compatible API to S3.
qiita.com/tarekatsu_eng/items/

#qiita #AWS #S3 #初心者 #Ceph #rclone
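Not what the linked article does, but the same idea can be sketched without rclone: a one-way copy (no change detection or deletion, unlike a real rclone sync) from a local S3-compatible endpoint such as Ceph RadosGW to a bucket on AWS S3, using boto3. The endpoint URL, credentials and bucket names below are placeholders.

```python
# Rough sketch: copy every object from a local S3-compatible endpoint
# (e.g. Ceph RadosGW) to a bucket on AWS S3. Placeholders throughout.
import boto3

local = boto3.client(
    "s3",
    endpoint_url="http://localhost:7480",   # hypothetical local RGW endpoint
    aws_access_key_id="LOCAL_KEY",
    aws_secret_access_key="LOCAL_SECRET",
)
aws = boto3.client("s3")                     # regular AWS credential chain

SRC_BUCKET = "local-bucket"                  # placeholder source bucket
DST_BUCKET = "offsite-backup"                # placeholder destination bucket

paginator = local.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC_BUCKET):
    for obj in page.get("Contents", []):
        # Stream each object from the local endpoint straight into AWS S3.
        body = local.get_object(Bucket=SRC_BUCKET, Key=obj["Key"])["Body"]
        aws.upload_fileobj(body, DST_BUCKET, obj["Key"])
```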

2025-10-06

Need some #Ceph advice for my private cluster on #proxmox:

3 nodes with 2x 18 TB HDD each.

According to some docs I found, the WAL/DB size should be 1-4% of the disk size, so the SSD should be between 180 and 720 GB per disk, or 360 GB to 1.44 TB for the two disks in each node.

Is this still true, or is it more relaxed nowadays, so that - let's say - 200 GB per HDD would be sufficient?

#followerpower #ceph #proxmox
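As a sanity check on those numbers, here is a minimal sketch of the 1-4% rule applied to 2x 18 TB HDDs per node. It is plain arithmetic, no Ceph involved; the percentages are just the commonly cited BlueStore rule of thumb and may not match every doc.

```python
# Rough sizing check for BlueStore WAL/DB on SSD, assuming the 1-4% rule
# of thumb and the hardware from the post: 3 nodes, 2x 18 TB HDDs each.
hdd_size_gb = 18_000   # 18 TB per HDD, in decimal GB
hdds_per_node = 2

for pct in (0.01, 0.04):
    per_osd_gb = hdd_size_gb * pct            # DB/WAL space per OSD
    per_node_gb = per_osd_gb * hdds_per_node  # total SSD space per node
    print(f"{pct:.0%}: {per_osd_gb:.0f} GB per OSD, {per_node_gb:.0f} GB of SSD per node")

# Output:
# 1%: 180 GB per OSD, 360 GB of SSD per node
# 4%: 720 GB per OSD, 1440 GB of SSD per node
```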

2025-10-06

Incidentally, you can see quite nicely from the latencies which OSDs I have already switched from WAL/DB on SSD to internal WAL/DB...

#Ceph

Screenshot of the Ceph OSD latencies

2025-10-04

The #Ceph cluster is slowly calming down again after the hiccup of the last few days... or rather, the backfilling has finished.

Backfilling, because I freed 2 OSDs from their WAL/DB on SSD after that had become slower and slower.

Now the mail storage is fast again, even though without the SSD it should actually be slower.

Once the backfilling is finished, the next OSDs are up.

And once they are all done, I'm wondering whether I should buy, say, 3x Micron 5400 Max with 480 GB for this purpose. At least those have a DWPD of >1. For the WD Red SA500 I couldn't find such a figure, and after about 2-3 years I can't say I would buy them again for this kind of use...

Screenshot of the Ceph status in Proxmox

Main project for the long weekend: Finally gathering some metrics on the S3 buckets in my Ceph cluster. Up to now, I've only got the total size of the pool holding all of the buckets, but no metrics on how big the individual buckets are.

I will try to use this Prometheus data exporter: github.com/blemmenes/radosgw_u

#HomeLab #Ceph
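As a fallback in case the exporter doesn't work out, here is a minimal sketch that pulls per-bucket object counts and sizes straight from the RadosGW S3 API with boto3. It is not the radosgw_usage_exporter, the endpoint URL and credentials are placeholders, and listing every object is slow on large buckets, so treat it as a one-off sanity check rather than a metrics source.

```python
# Minimal sketch: per-bucket object count and total size via the S3 API.
# Endpoint and credentials are placeholders, not real configuration.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.internal:7480",  # hypothetical RGW endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    objects = 0
    size_bytes = 0
    # Walk every object in the bucket and add up the sizes.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=name):
        for obj in page.get("Contents", []):
            objects += 1
            size_bytes += obj["Size"]
    print(f"{name}: {objects} objects, {size_bytes / 2**30:.2f} GiB")
```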

Scott Williams 🐧 vwbusguy@mastodon.online
2025-10-01

I have a #ceph-induced headache.

2025-09-30

Today I had a really pleasant experience upgrading #Ceph clustered storage from Reef to Squid on our 3-node Proxmox cluster.
At this point I almost dare to say it's basically ZFS for networked storage, but I haven't configured it by hand from scratch yet.

New blog post: blog.mei-home.net/posts/broken

This time I'm talking about how I replaced a broken HDD in my Ceph cluster.

#HomeLab #Ceph #Blog

2025-09-27

Well, #ceph really is fascinating...
I try to start a container.
Error: libceph cannot read the monmap.
Hint: 'ceph mon dump' says that one mon only has one IP.
Solution: kill the affected mon and recreate it...
On a completely different node...
Said and done, and it works...

Florian Haas xahteiwi
2025-09-24

This feature landed in the Pacific release, more than 5 years ago.

I wonder how common it is to use this in production.

docs.ceph.com/en/squid/rados/o

Florian Haas xahteiwi
2025-09-24

Short thread:

Can a person please explain something to me?

CloudNativePG seems to assert that if a Kubernetes cluster uses Ceph or Longhorn for persistent storage, replication in the underlying storage layer is superfluous.

cloudnative-pg.io/documentatio

Short recap of the weekend: Since this morning around 00:15, my cluster has been back in a healthy state. The danger period with reduced redundancy lasted from Saturday 21:00, when I took out the old HDD, until Sunday morning around 07:00, when the last undersized PG was remapped.

With the switch from a 4TB to an 8TB HDD, I gained about 1.33TB usable space in the cluster.
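A quick sketch of where the ~1.33 TB comes from, assuming a replicated pool with size=3 (the replication factor isn't stated above, but the arithmetic only works out that way):

```python
# Where the ~1.33 TB figure comes from, assuming a replicated pool with size=3.
old_hdd_tb = 4
new_hdd_tb = 8
replicas = 3

extra_raw_tb = new_hdd_tb - old_hdd_tb      # 4 TB more raw capacity
extra_usable_tb = extra_raw_tb / replicas   # usable capacity after 3x replication
print(f"~{extra_usable_tb:.2f} TB additional usable capacity")  # ~1.33 TB
```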

The new HDD does feel a bit louder when being accessed.

Blog post to come.

#HomeLab #Ceph

A screenshot of a Grafana time series plot over the last couple of days showing the state of the Placement Groups in my Ceph cluster. It shows that around 10:00 on Saturday, 63 remapped PGs appeared. Afterwards, they slowly decreased until a spike up to 110 remapped and 150 undersized PGs appeared around 21:00 on that same Saturday. Both then went down again about 30 minutes later. They then continuously reduced for the following day. The last undersized PG got fixed around 07:00 on Sunday morning, while the last remapped PG disappeared at 00:15 on Monday morning.
2025-09-22

My #Mastodon storage bill:
73.1 GiB of user-content stored in my self-hosted #Ceph #RadosGW erasure-coded #S3 storage pool.

#SelfHost #HomeLab

2025-09-21

"Cephalopod ID Guide for the Mediterranean Sea", C. Drerup and G. Cooke, 2019

zenodo.org/records/2589226

#mediterranean #ceph #cephalopoda #octopus #squid #sepia

The Ceph cluster backfill onto the new HDD is still ongoing, but I was already out of the woods this morning around 07:00, when the last degraded+undersized PGs got backfilled. Since then, the cluster has merely been filling up the new HDD, and now it seems to have decided to move a few additional PGs around to the new disk.

The peak around 21:00 was when I temporarily shut down one of the Ceph hosts to replace the HDD after realising I didn't have enough space left.

#HomeLab #Ceph

A screenshot of a Grafana time series plot. It shows a time range from 10:00 to 08:00 the next morning on the X axis. The Y axis shows the number of Ceph Placement Groups in different states of distress. At the beginning of the plot, it shows 63 PGs in Degraded and Undersized states. This was the moment I took the old, damaged HDD out of the cluster and the rebalancing to the remaining two HDDs started.

The unhealthy PG counts go down in fits and starts, with phases of an hour or more where the counts don't change and then 6 PGs get fixed in short order. The monotonic decrease ends suddenly around 20:50, when the counts shoot up to 122 degraded and 147 undersized PGs. This high count is then reduced rather quickly, falling back to previous levels at around 21:30. This short interval was the time when I shut down one of the hosts to replace the broken HDD with a new one. After that, the counts decrease monotonically again, until both reach zero at about 07:00 the next morning.

René von Wolfen-Nord rene@wolfen-nord.social
2025-09-21

Your opinion, gentlemen?

#proxmox #zfs #ceph

Dang, I might have miscalculated. I've still got 16% of PGs to remap, and one of my disks is already almost full. Seems I don't actually have enough storage to survive with an entire disk gone.

#HomeLab #Ceph

A screenshot of the OSD overview in Ceph's dashboard. It shows that there's one OSD out and down, while another OSD is already 88% full, with another one 78% full.
