#jemalloc

Felix Palmen :freebsd: :c64: zirias@bsd.cafe
2025-05-30

Finally getting somewhere working on the next evolution step for #swad. I have a first version that (normally 🙈) doesn't crash quickly (so, no release yet, but it's available on the master branch).

The good news: It's indeed an improvement to have *multiple* parallel #reactor (event-loop) threads. It now handles 3000 requests per second on the same hardware, with overall good response times and without any errors. I uploaded the results of the stress test here:

zirias.github.io/swad/stress/

The bad news ... well, there are multiple.

1. It got even more memory-hungry. The new stress test still simulates 1000 distinct clients (trying more fails on my machine because #jmeter can't create any new threads...), but with delays reduced to 1/3 and 100 iterations each. This now leaves it with a resident set of almost 270 MiB ... tuning #jemalloc on #FreeBSD to return memory more promptly (see the sketch after this list) reduces this to 187 MiB (which is still a lot) and costs a bit of performance (some requests run into 429, and overall response times are worse). I have no idea yet where to start trying to improve *this*.

2. It requires tuning to manage that load without errors, mainly using more threads for the thread pool, although *these* threads stay almost idle ... which probably means I have to find ways to make moving work onto and off these threads more efficient. At least I have some ideas.

3. I've seen a crash which only happened once so far, no idea as of now how to reproduce. *sigh*. Massively parallel code in C really is a PITA.
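
For point 1, the kind of #jemalloc tuning I mean: on FreeBSD it's the system allocator, so its decay behavior can be set via the MALLOC_CONF environment variable. The values here are only an illustration, not necessarily the exact knobs I used:

$ MALLOC_CONF="dirty_decay_ms:1000,muzzy_decay_ms:0" swad

Lower decay times make jemalloc return unused pages to the OS sooner, which shrinks the resident set but costs some allocation throughput.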

Seems the more I improve here, the more I find that *should* also be improved. 🤪

#C #coding #performance

Felix Palmen :freebsd: :c64: zirias@bsd.cafe
2025-05-23

Working on the next release of #swad, I just deployed an experimental build with the server-side #session completely removed.

Then I ran the same #jmeter stress test on it as before. It simulates 1000 distinct clients, all requesting the login form and then POSTing a "guest:guest" login to trigger proof-of-work, 50 times in a loop, timed so that an average of 1000 requests per second is sent.

After running this once, I thought I hadn't gained much. The old version had a resident set of 95 MiB, the new one 86 MiB. But then, running it two more times, the resident set just climbed to 96 MiB and then 98 MiB, while the old version ended up at something around 250 MiB. 😳

So, definitely an improvement. Not sure why it still climbs to almost 100 MiB at all, maybe this is #jemalloc behavior on #FreeBSD? 🤔

One side effect of removing the session is that the current jmeter test scenario doesn't hit any rate-limiting any more. So, next step will be to modify the scenario to POST invalid login credentials to trigger that again and see how it affects RAM usage.
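
A quick way to check whether that remaining memory is retained by the allocator rather than leaked (a sketch, not something I've done above): let jemalloc dump its statistics on exit and compare the "active" and "retained" numbers:

$ MALLOC_CONF="stats_print:true" swad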

[Image: Auth cookies of the "session-less" swad, containing signed JSON web tokens for two realms, here "builder" and "testbuilder".]
[Image: Decoding one of these tokens in jwt.io.]
Andrei Kaleshka widefix@ruby.social
2025-03-08

Have you ever dealt with this issue? Should I install #jemalloc right away or play detective? Setup: #ruby 2.7.8, #puma 3.12.6.

Ruby memory leak
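
The experiment I have in mind (library path assuming Debian/Ubuntu): preload jemalloc under puma and watch whether the growth flattens:

$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 bundle exec puma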
Jools [Friendica] jools@missocial.de
2025-03-06

So, my instance finally seems to be running smoothly on the new server. After the move there were still some minor problems, and the number of instances that know my instance never really recovered.

Before the move I was at about 28,000 instances, after the move "only" about 18,000.

Yesterday there was suddenly a real surge, and you could watch the number of instances climb higher and higher. Today we're at 33,000 and rising, and that on a single-user instance. 😅😁


The switch to jemalloc was also successful, following this guide.

Link: Using MariaDB with TCMalloc or jemalloc


By the way, you can find out which allocator MariaDB is currently using with

SHOW GLOBAL VARIABLES LIKE 'version_malloc_library';
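
The switch itself boils down to a systemd override that preloads the library before MariaDB starts. A sketch of what the linked guide describes (the .so path is an assumption for a typical Debian/Ubuntu install):

$ sudo systemctl edit mariadb

[Service]
Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"

$ sudo systemctl restart mariadb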

#Friendica, #Server, #jemalloc

[Image: A pie chart in several colors showing statistics on the federated nodes and users known to this Friendica installation: currently 33,074 other nodes, with 6,637,075 users active in the last month, 3,147,245 users active in the last half year, and 32,144,979 registered users in total.]
2025-02-21

As my #Friendica server, my-place.social, has grown to some 315 active users in just 5 months, I'm starting to hit up against #mariaDB limitations relating to the default memory manager, #MALLOC. This weekend I'm going to replace it with #jemalloc to reduce stalls, memory fragmentation issues, out-of-memory problems, and instability.

Friendica puts a lot of pressure on the database, mariaDB in this case, much more than Mastodon apparently does on PostgreSQL. My feeling is that the Mastodon developers have done much better database tuning.

But, nonetheless, the update must be done. This will be done on an Ubuntu server.

Does anyone who has changed the MariaDB or MySQL memory manager have any advice to share to keep me out of trouble?

BTW, #TCMalloc is not an option as other admins have reported crashes using it with Friendica.

2025-02-06

Released ruby-install 0.10.1 with a minor fix for homebrew users who also want to compile ruby with jemalloc support.

github.com/postmodern/ruby-ins
github.com/postmodern/ruby-ins

#ruby #ruby_install #rubyinstall #jemalloc #homebrew

2025-02-06

Add one more vote for using #jemalloc for #rails applications. This is our memory graph for the #Sidekiq workers.

Besides adding the linked buildpack and setting configuration options, we had nothing else to do.

elements.heroku.com/buildpacks
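
The buildpack link above is truncated, so as an illustration only: with the widely used gaffneyc jemalloc buildpack (which may or may not be the one we linked), the whole change is along these lines:

$ heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-jemalloc
$ heroku config:set JEMALLOC_ENABLED=true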

[Image: A 2-hour monitoring graph of memory usage for the Sidekiq worker dynos. It shows a sharp drop at 9:30 am: the top line (max usage) was constantly over the 1 GiB quota, but at 9:30 am it goes down to about 512 MiB.]
2025-02-05

Released ruby-install 0.10.0! This release contains many small improvements to usability and better support for building CRuby with jemalloc or YJIT enabled.

$ ruby-install ruby -- --with-jemalloc
$ ruby-install ruby -- --enable-yjit

github.com/postmodern/ruby-ins
github.com/postmodern/ruby-ins

#ruby #rubyinstall #ruby_install #jemalloc #yjit

On the question of using #epoll instead of the familiar, "traditional" select & poll, i.e. working with something asynchronously via polling and multiplexing.

Recently I had to implement an event queue for AMQP-CPP. In one of our products we decided to connect the agent components to the central "controller" via #AMQP, with #RabbitMQ as the broker (all standard: a regular cluster and TLS connections).

The catch is that the product's agents make heavy use of asynchronous, reactive programming with good "horizontal scalability": full shared-nothing, not just horizontal scaling via lock-free or wait-free techniques and Amdahl's law. That rules out a whole class of problems at once, from good old cache ping-pong to the misery of false sharing.

Hence the agents do their own thread management and memory allocation. Not only for the heap (dynamic memory, with custom allocators à la #jemalloc from #Facebook), but also tricks around #NUMA nodes and even huge pages (which reduce pressure on the #TLB, so fewer misses).

The first problem surfaced almost immediately: it is simply not feasible to use the AMQP-CPP library with its bundled support for #libev, #libuv, #libevent. Those event queues are incompatible with the agents' existing model of thread management and task organization.

Why epoll was chosen

The approach used by #epoll looks more modern, with less memory copying between user space and kernel space. And when data arrives on a watched file descriptor, you can jump directly, via a pointer, to the class object or data structure. That avoids looking the descriptor up in index arrays/containers: you immediately work with the object instances wrapping the relevant #tcp connection, the very one the data arrived on.

And here the second problem showed up: the AMQP library in use doesn't read the data out of the socket stream completely. For example, it only consumes data until the finite-state machine parsing the AMQP protocol entities is satisfied.

When using #epoll you have to choose which event-delivery mode to rely on:

  • level-triggered notifications,
  • edge-triggered events.

And this library trouble showed once again that you must not use edge-triggered mode without thoroughly understanding the subsystem responsible for reading data from the file descriptors. The EPOLLET flag appearing in code is a marker that the solutions in use deserve an audit.

More on edge-triggered vs. level-triggered interrupts: https://venkateshabbarapu.blogspot.com/2013/03/edge-triggered-vs-level-triggered.html
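
A minimal sketch (in C, names illustrative and not from AMQP-CPP) of the read discipline EPOLLET demands: a readable event fires once per edge, so a non-blocking fd must be drained until EAGAIN, and whatever the protocol FSM won't consume has to be buffered by the caller:

#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

struct conn { int fd; /* wraps one TCP connection, as described above */ };

/* Drain the socket completely: with EPOLLET, bytes left behind will
 * never trigger another event. The fd must be non-blocking. */
static void on_readable(struct conn *c)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(c->fd, buf, sizeof buf);
        if (n > 0) continue;          /* here: feed all n bytes to the
                                         parser, buffer any leftovers */
        if (n == 0) break;            /* peer closed the connection */
        if (errno == EINTR) continue; /* interrupted, just retry */
        break;                        /* EAGAIN means drained; anything
                                         else is a real error */
    }
}

/* Registration: data.ptr lets the event loop jump straight to the
 * wrapping object, no descriptor lookup in index containers. */
static void watch(int epfd, struct conn *c)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET;
    ev.data.ptr = c;
    epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}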

#programming #linux #трудовыебудни

2024-08-27

rediscovering that jemalloc is now supported on alpine

github.com/jemalloc/jemalloc/i

When I was building Docker images for Mastodon on Alpine, it was so hard to integrate that feature.
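
For reference, a sketch of how it reduces today (package name and library path assumed from current Alpine packaging):

RUN apk add --no-cache jemalloc
ENV LD_PRELOAD=/usr/lib/libjemalloc.so.2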

#jemalloc #alpinelinux #ruby

Christos Argyropoulos MD, PhD ChristosArgyrop@mstdn.science
2024-08-26

Fun project: a #memory manager for multi-lang applications (#c, #cplusplus, #assembly, #fortran) in which workflow management is done by #Perl. Currently allocates using either #perl strings or #glibc malloc/calloc. Other allocators (#jemalloc) are coming soon.
github.com/chrisarg/task-memma

2024-04-29

I just learnt about `jemalloc` in order to fix the memory hunger of Synapse.

So yeah, Python developers would rather hijack the glibc memory allocator than switch to a resource-efficient language.
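
The "hijack" itself is a one-liner before starting the homeserver; the library path below assumes Debian/Ubuntu:

$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python -m synapse.app.homeserver -c homeserver.yaml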

#jemalloc #Matrix #Synapse #Python #glibc #programming

12 Freya it/its 𒀭𒈹𒍠𒊩 12@eightpoint.app
2024-02-29

progress: #jemalloc built ok. Attempting to build #Ruby 3.2.3. We are getting far, far further with this than we did with Solaris, to the point that this may actually work

12 Freya it/its 𒀭𒈹𒍠𒊩 12@eightpoint.app
2024-02-29

progress: have to build #jemalloc from source to allow Ruby to build. Doesn't seem to be a massive problem, which is good
