View Issue Details

ID: 0006429
Project: unreal
Category: ircd
View Status: public
Last Update: 2024-08-02 16:54
Reporter: anhtribao
Assigned To: syzop
Priority: normal
Severity: crash
Reproducibility: always
Status: acknowledged
Resolution: open
Platform: Docker
OS: Alpine
OS Version: 3.19.1
Product Version: 6.1.5
Summary: 0006429: unrealircd w/musl = memory leak on REHASH (is by design @ musl)
Description: After rehashing the ircd many times (rehash of configuration + rehash of TLS), the ircd takes more and more memory and eventually crashes.

Maybe only one kind of rehash is enough (I haven't tested each individually yet).
Steps To Reproduce: Start an ircd with remote includes + TLS certificate.
Perform a ./unrealircd rehash then a ./unrealircd reloadtls and repeat (no need to spam)

After each couple of rehashes or so the ircd takes more memory.
Eventually the ircd cannot load modules and crashes.
Additional Information: LIBC = musl-1.2.4_git20230717-r4

See Dockerfile

Sensitive information in the config has been masked ;)
Tags: No tags attached.
3rd party modules: None

Activities

syzop

2024-07-05 09:47

administrator   ~0023234

Thanks for the report! You shared many files which is really helpful. Could you share your config.settings as well?
I don't know if it is related to remote includes, but the config.settings will tell me which remote includes engine you use (curl or built-in)

syzop

2024-07-05 09:50

administrator   ~0023235

Oh I somehow missed you mentioned reloadtls as well.
1) Is the "./unrealircd reloadtls" needed to trigger the issue or does it also happen with only "./unrealircd rehash" ?
2) Does it already happen with "./unrealircd reloadtls" and without "./unrealircd rehash"? Because that narrows it down a lot.

syzop

2024-07-05 10:07

administrator   ~0023236

Here, if I run "./unrealircd reloadtls" on its own, many times, I don't see VSZ increase, though it could be a little bit.
If I run "./unrealircd rehash" on its own, and do that ten times, then yes, it leaks, but it is 140K per 10 rehashes, so about 14K per rehash.
If I do the combination rehash+reloadtls and repeat that 10 times, then it is 220K in total, so 22K per rehash+reloadtls combination.

This all sounds too marginal compared to the problem you are experiencing right? I mean, you were talking about a rehash daily and crashing after 2 months, and 22K * 60 = 1.3 megabyte.

Important sidenote, especially if it is TLS-related: I replaced your specific loading of TLS certs/keys (you had multiple in your conf) with a single one (and a different key type).

QUESTION: what kind of figures are we talking about on your end, how big is the leak after a rehash (or a rehash+reloadtls) ? If you look at "ps auxw|grep unrealircd" before and after.
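
One way to capture such before/after figures without eyeballing ps output, as a minimal sketch: on Linux, /proc/<pid>/status has a VmSize line that should correspond to the VSZ column of ps (in kB). The helper names below are made up for illustration, and the PID of the running ircd has to be supplied by you.

```python
from pathlib import Path


def parse_vmsize_kib(status_text):
    """Return the VmSize value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            # The line looks like: "VmSize:    30148 kB"
            return int(line.split()[1])
    raise ValueError("no VmSize line found")


def read_vmsize_kib(pid):
    """Read the current virtual memory size of a running process (Linux only)."""
    return parse_vmsize_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling read_vmsize_kib(pid) once before and once after a rehash, and subtracting the two, gives the growth per rehash in kB.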

anhtribao

2024-07-05 13:11

reporter   ~0023239

> This all sounds too marginal compared to the problem you are experiencing right? I mean, you were talking about a rehash daily and crashing after 2 months, and 22K * 60 = 1.3 megabyte.

Exactly, the total increase is very small. I don't know why the OOM happens, because the RAM is very far from being full (5+ GB of RAM available), so I guess this could be caused by the Docker environment (though I don't remember limiting its resources). I also have other containers alongside it running much heavier things like an Elastic database, Nextcloud, GitLab, ... but I never encountered an OOM on them.

It is no big deal for me. The rehash was not meant to happen daily in the beginning. The goal was to rehash after detecting the reception of the new LE TLS certificate (which is renewed and uploaded from another server) and somehow my upload script sends the certificate every day (whether new or not) instead of only when it has changed, which I intend to correct.

syzop

2024-07-05 14:31

administrator   ~0023240

Last edited: 2024-07-05 14:32

Thanks for getting back to me :)

With REHASH only (fresh, then after every 10 rehashes), using:
% for i in {1..10}; do echo $i; docker exec docker-irc-unreal_sandcat /var/lib/unrealircd/unrealircd rehash; sleep 10; done

USER        PID %CPU %MEM    VSZ    RSS TTY STAT START TIME COMMAND
ircuser 2037158  0.0  0.2  30148  21788 ?   S    09:13 0:00 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158  4.9  0.9  86264  78100 ?   S    09:13 0:25 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158  7.3  1.6 140316 130796 ?   S    09:13 0:53 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158  8.9  2.2 194356 183456 ?   S    09:13 1:29 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158 11.3  2.9 248336 236180 ?   S    09:13 2:16 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158 13.8  3.5 302372 288876 ?   S    09:13 3:14 /var/lib/unrealircd/bin/unrealircd

ircd crashed after the 68th rehash (I could not run ps at that point, so I guess VSZ was 345572 by then).
The VSZ increases by about 54k every 10 rehashes (ps reports VSZ in KiB, so that is roughly 54 MB), and the CPU usage increases as well.


If I look at the VSZ column in your paste: it started at 30148 and the last value in the output was 302372, after.. 50 or 60 rehashes? So that's a 270M difference.
That's about 5444K (5.4M) per rehash. A leak of 4M or 5M per REHASH is big, so it would indicate a problem that I would have to solve.
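
The arithmetic above can be reproduced from the pasted ps lines with a short Python sketch. It assumes the six samples are exactly 10 rehashes apart (so 50 rehashes in total), which is how the 5444K figure comes out; the function names are just for illustration.

```python
PASTE = """\
ircuser 2037158 0.0 0.2 30148 21788 ? S 09:13 0:00 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158 4.9 0.9 86264 78100 ? S 09:13 0:25 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158 7.3 1.6 140316 130796 ? S 09:13 0:53 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158 8.9 2.2 194356 183456 ? S 09:13 1:29 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158 11.3 2.9 248336 236180 ? S 09:13 2:16 /var/lib/unrealircd/bin/unrealircd
ircuser 2037158 13.8 3.5 302372 288876 ? S 09:13 3:14 /var/lib/unrealircd/bin/unrealircd
"""


def vsz_column(ps_output):
    """Extract the VSZ field (5th column of `ps aux` output, in KiB)."""
    return [int(line.split()[4]) for line in ps_output.strip().splitlines()]


def growth_per_rehash(vsz_values, rehashes_between_samples):
    """Average VSZ growth per rehash between the first and last sample."""
    total = vsz_values[-1] - vsz_values[0]
    rehashes = (len(vsz_values) - 1) * rehashes_between_samples
    return total / rehashes


values = vsz_column(PASTE)
print("total growth (KiB):", values[-1] - values[0])
print("growth per rehash (KiB):", growth_per_rehash(values, 10))
```

The steps between consecutive samples are all close to 54000 KiB, so the growth looks linear in the number of rehashes rather than a one-off jump.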

The most important differences between you & me are:
1) the use of musl (instead of glibc)
2) that i used different SSL certs/keys
3) possibly different OpenSSL and other library versions
4) i am sure there are more..

As for point 2: I suppose I can try to generate a similar set of SSL certs and keys. I saw the names, you really had an ed25519 DSA key? Or, alternatively, test the other way around: you could remove the custom certificate/key in the set block and listen block and see if it gives the same result, to see whether those set::tls and listen::tls-options make any difference. I don't really suspect this one, though, because you showed that ./unrealircd reloadtls gave no significant memory increase.

For point 3: can you show me the output of /VERSION with all the library versions and such? OpenSSL, PCRE2, etc..

As for point 4: do you have any big .db files in unrealircd/data ? Like the tkl.db and such (perhaps even a history.db). Just the sizes.

I would like to solve this leak, although I don't think I would be willing to go as far as setting up docker, compiling with musl and all that, so hopefully it does not require that.

As for why it runs out of memory after 330M (+/-).... I have little experience with docker, but I can only agree it must be some limit imposed somewhere. But... how to check that, and where it is imposed, I don't know :).
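
For what it's worth, two usual places where such a limit can live on Linux are the container's cgroup memory limit and the process address-space rlimit. Below is a minimal Python sketch for inspecting both from inside the container. The function names are made up for illustration, and whether either of these limits is actually the one being hit here is not established in this thread.

```python
import resource
from pathlib import Path

# Candidate files for the container memory limit: cgroup v2 first, then cgroup v1.
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def parse_cgroup_limit(text):
    """Return the limit in bytes, or None when cgroup v2 reports 'max' (no limit).

    cgroup v1 reports "no limit" as a very large number instead of 'max'.
    """
    value = text.strip()
    return None if value == "max" else int(value)


def container_memory_limit():
    """Return the first cgroup memory limit found, or None if none is readable."""
    for name in CGROUP_LIMIT_FILES:
        path = Path(name)
        if path.is_file():
            return parse_cgroup_limit(path.read_text())
    return None


def address_space_limit():
    """Soft RLIMIT_AS in bytes, or None if unlimited (the limit `ulimit -v` shows)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return None if soft == resource.RLIM_INFINITY else soft
```

Running container_memory_limit() and address_space_limit() inside the container and comparing the results with the roughly 330M at which the ircd fell over would show whether either limit is in that range.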

syzop

2024-07-05 16:28

administrator   ~0023243

That's good hunting and analysis :)

> I feel the issue could be either with the OpenSSL or the use of musl...

Indeed

The combination of rehash+reloadtls is no longer needed indeed. The reloadtls part is already included in rehash. This is since UnrealIRCd 6 for sure, and I think it happened somewhere in 5..
If you didn't change the unrealircd config file then you can just "./unrealircd reloadtls" and it will re-read the cert and key etc. only, not do anything else.
If you did change unrealircd config, like you changed the listener block tls-options or set block tls and such, then yeah... you need to ./unrealircd rehash (and no need for an additional reloadtls).

So yeah from certbot hook and such reloadtls is perfectly fine (and quick, minimal). Outside of that, one normally uses rehash.

Of course that is just an extra explanation.
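
For a cert upload script that fires every day regardless of whether the certificate changed, the reload can be skipped when nothing is new. A minimal Python sketch of that idea: it remembers the SHA-256 of the last certificate it saw and only runs "./unrealircd reloadtls" when the digest differs. The file paths and function names are hypothetical, and the command is assumed to be run from the unrealircd directory.

```python
import hashlib
import subprocess
from pathlib import Path


def digest(data):
    """SHA-256 hex digest of the certificate bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_reload(cert_bytes, previous_digest):
    """True when the certificate differs from the one seen last time."""
    return digest(cert_bytes) != previous_digest


def reload_if_changed(cert_path, state_path, command=("./unrealircd", "reloadtls")):
    """Run the reload command only if the certificate file changed.

    cert_path and state_path are hypothetical locations chosen by the caller;
    state_path stores the digest of the last certificate that triggered a reload.
    """
    cert_bytes = Path(cert_path).read_bytes()
    state = Path(state_path)
    previous = state.read_text().strip() if state.exists() else None
    if not needs_reload(cert_bytes, previous):
        return False
    subprocess.run(command, check=True)
    # Record the digest only after the reload succeeded.
    state.write_text(digest(cert_bytes))
    return True
```

Writing the state file only after a successful reload means a failed reload is retried on the next run instead of being silently skipped.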

syzop

2024-07-05 16:35

administrator   ~0023244

Last edited: 2024-07-05 16:35

More speculation:

One thing that makes unrealircd unusual compared to almost every other program out there is that we have loooooots of .so files: all the 200+ modules that we dynamically load and unload. On rehash it unloads almost all of them (except for a few which are "permanent") and then loads them all again. If there is some sort of issue in musl with cleaning up on library unload, then this would quickly become very visible because we have like 200+ of these.

Unloading .so files is quite an unusual operation after all. And doing it so massively....

Or, it's something with OpenSSL, which is a perfectly logical thing to suspect. BUT.. I have my doubts in this case because ./unrealircd reloadtls is not really affected much... I think that (nearly) proves that it must be elsewhere, but on the other hand the TLS-specific code is not 100% the same between rehash and reloadtls (the rehash has extra 'checks') so... cannot rule it out either!

syzop

2024-07-05 17:33

administrator   ~0023246

Yeah, what I was saying is that you can remove the reloadtls, if you already rehash then the reloadtls will do nothing extra (everything that reloadtls does is already included in rehash as well) :)

Is there an easy way for you to try building with glibc instead of musl in the docker? I have no idea. But... i suspect that more than anything else.

Thanks for looking into it :)

anhtribao

2024-08-01 13:08

reporter   ~0023284

Hello

Some follow-up on the topic.

I have tried with the same config (alpine+musl) the following 2 cases:
- no remote include (but still use include of local files) => openssl and builtin grabber should not be used anymore
- no include at all (all the conf is inside a single file)
=> I still get the crash in both cases

Since you said that each REHASH unloads all modules and then loads them again, I thought maybe the modules are unloaded but the memory they use isn't freed.
I am trying to map the memory increase (4M-5M per REHASH) to them, but it doesn't seem to match the size of the modules I have.
The size of the loaded modules (through a % du) is about 11M, so about double the "leak", unless some are never unloaded or there is some shared memory.

Now my last step is either to build against glibc on Alpine as you suggested (unfortunately I haven't found a way yet), or to compile against musl on Debian (I have made some progress, but the compilation fails because musl-gcc can't find the openssl .h files even though they are present on the system).

syzop

2024-08-01 18:40

administrator   ~0023285

As mentioned on IRC, apparently in musl dlclose() is a no-op, so it does not free any memory. https://wiki.musl-libc.org/functional-differences-from-glibc.html#Dynamic-linker

We should add a warning about this in the unrealircd docs somewhere.
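
The dlclose() behaviour can be probed from a Python process on Linux, as a rough sketch: load a shared object, call dlclose() on it, then check whether it is still listed in /proc/self/maps. This relies on `_ctypes.dlclose`, a CPython-internal function rather than a public API, and the helper names are made up for illustration. The result is only meaningful for a library the process had not already loaded, passed as an absolute path.

```python
import ctypes
import os
import _ctypes  # CPython-internal module; on POSIX it exposes dlclose()


def is_mapped(library_name, maps_text):
    """True if any mapping in /proc/<pid>/maps text refers to library_name."""
    suffix = "/" + library_name
    return any(line.rstrip().endswith(suffix) for line in maps_text.splitlines())


def stays_mapped_after_dlclose(library_path):
    """dlopen() a shared object, dlclose() it, and report whether it is still mapped.

    library_path should be an absolute path to a library that this process
    has not loaded yet; otherwise the answer says nothing about dlclose().
    """
    # /proc/self/maps lists the resolved file, so compare against the real name.
    name = os.path.basename(os.path.realpath(library_path))
    handle = ctypes.CDLL(library_path)._handle
    _ctypes.dlclose(handle)
    with open("/proc/self/maps") as maps:
        return is_mapped(name, maps.read())
```

Going by the musl wiki page linked above, I would expect this to report the library as still mapped under musl, while under glibc a library with no other users would normally be unmapped again. With 200+ modules reloaded on every REHASH, that difference adds up quickly.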

anhtribao

2024-08-01 21:15

reporter   ~0023286

As I suppose the number of people using unrealircd with musl is quite small and REHASHes are not usually issued often, is it necessary to mitigate that behaviour of musl inside the unrealircd code?

If it were to be mitigated, I suppose an easy way would be to disable dynamic reloading of modules during REHASHes through an #ifdef, settable by the (warned) user with a variable during ./Config (or maybe in the unrealircd.conf)? This way that user would choose either to lose dynamic reloading (hence having to restart the ircd to load/unload a module), or to accept the risk of a crash after many rehashes.

syzop

2024-08-02 16:53

administrator   ~0023287

Yeah, you are right @ quite small... it's too marginal and too much of a maintenance burden. So will go for better docs/checks. It is very good to know about this issue, though, so thank you for all your time and effort you put into this :)

syzop

2024-08-02 16:54

administrator   ~0023288

(declassified, some notes marked as private that contained private data, attachments removed as they are no longer important)

Issue History

Date Modified Username Field Change
2024-07-04 17:34 anhtribao New Issue
2024-07-04 17:34 anhtribao File Added: Dockerfile
2024-07-04 17:34 anhtribao File Added: docker-irc-unreal_sandcat.service
2024-07-04 17:34 anhtribao File Added: ircd.log
2024-07-04 17:34 anhtribao File Added: Local.zip
2024-07-04 17:34 anhtribao File Added: Remotes.zip
2024-07-05 09:47 syzop Note Added: 0023234
2024-07-05 09:50 syzop Note Added: 0023235
2024-07-05 10:07 syzop Note Added: 0023236
2024-07-05 10:07 syzop Assigned To => syzop
2024-07-05 10:07 syzop Status new => feedback
2024-07-05 13:11 anhtribao Note Added: 0023239
2024-07-05 14:31 syzop Note Added: 0023240
2024-07-05 14:32 syzop Note Edited: 0023240
2024-07-05 14:32 syzop Note Edited: 0023240
2024-07-05 16:28 syzop Note Added: 0023243
2024-07-05 16:35 syzop Note Added: 0023244
2024-07-05 16:35 syzop Note Edited: 0023244
2024-07-05 17:33 syzop Note Added: 0023246
2024-08-01 13:08 anhtribao Note Added: 0023284
2024-08-01 18:40 syzop Note Added: 0023285
2024-08-01 18:42 syzop Status feedback => acknowledged
2024-08-01 18:42 syzop Summary Crash with out of memory after a lot of rehashes (config + TLS) => unrealircd w/musl = memory leak on REHASH (is by design @ musl)
2024-08-01 21:15 anhtribao Note Added: 0023286
2024-08-02 16:48 syzop File Deleted: Dockerfile
2024-08-02 16:48 syzop File Deleted: docker-irc-unreal_sandcat.service
2024-08-02 16:48 syzop File Deleted: ircd.log
2024-08-02 16:48 syzop File Deleted: Local.zip
2024-08-02 16:48 syzop File Deleted: Remotes.zip
2024-08-02 16:53 syzop Note Added: 0023287
2024-08-02 16:54 syzop View Status private => public
2024-08-02 16:54 syzop Note Added: 0023288