View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006363||unreal||ircd||public||2023-11-06 22:46||2023-11-25 14:53|
|Status||closed||Resolution||unable to duplicate|
|Summary||0006363: SASL request timed out|
|Description||"SASL request timed out (server or client misbehaving) -- aborting SASL and continuing connection..."|
Currently, my server is overcrowded, with 500 users connected to a single server, and about three-quarters of them are authenticated using SASL. There seems to be an issue sometimes, as it displays this error for no apparent reason. This error also occurs when I restart the IRC server; all users reconnect simultaneously, and this SASL blockage lasts for approximately 10 to 15 minutes, until there are enough users already connected. Is the problem located in my configuration, UnrealIRCd, or Anope?
This bug occurs consistently across all my clients and only with SASL.
"SASL request timed out" refers to set::sasl-timeout, documented at https://www.unrealircd.org/docs/Set_block#set::sasl-timeout (copied from there):
The maximum time for SASL to take place. Time starts at the AUTHENTICATE command. The default is 15 seconds.
This protects against misbehaving or extremely laggy SASL servers (Services). Otherwise, a misbehaving server could lead to people no longer being able to connect.
On the UnrealIRCd-side, SASL is really easy, but on the Services-side it is likely a more intensive operation.
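To get a feel for how a slow Services backend could trip this timeout during a reconnect burst, here is a toy model in Python. It assumes, purely for illustration, that all clients send AUTHENTICATE at the same moment and that Services handles the requests strictly one at a time; neither assumption is confirmed for Anope, and the function name and parameters are made up for this sketch.

```python
def timed_out_requests(num_clients, seconds_per_auth, timeout=15.0):
    """Count requests that finish after `timeout` in a strictly serial queue.

    Toy model only: every request is assumed to arrive at time 0 and to
    take `seconds_per_auth` seconds of Services time.
    """
    timed_out = 0
    for position in range(1, num_clients + 1):
        # The request at this queue position completes once all requests
        # ahead of it, plus itself, have been processed.
        finish_time = position * seconds_per_auth
        if finish_time > timeout:
            timed_out += 1
    return timed_out
```

Under these assumptions, 400 simultaneous clients at 0.5 seconds per authentication would leave only the first 30 inside the default 15 second window. In practice clients retry and arrive spread out over time, so treat the result as a rough intuition rather than a prediction.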
We talked about this before. In fact, this is a duplicate of your earlier report https://bugs.unrealircd.org/view.php?id=6219, in which I wrote:
"I tried connecting 500 clones with SASL, disconnecting, reconnecting, it all works fine. Sure it takes maybe up to 30 seconds for all 500 clones to connect but that's mainly because I am in debug mode, but it works OK.
My best guess would be that your Anope is too slow, goes to 100% CPU usage, and then the SASL times out. But only you can tell, by running 'top' or something similar on the machine when this happens.
I don't see how anything would be wrong on our side (in UnrealIRCd), and my tests, which were highly aggressive, did not reproduce the issue. So I'm closing this because I don't think it is an issue in UnrealIRCd at the moment. Of course, I can never be 100% sure, but for now it certainly is looking that way."
Did you look into that? When the problem occurs, check:
1) whether Anope is using close to 100% CPU, as I wrote before: run 'top' and watch the CPU usage while the problem is happening, and
2) if CPU usage is not high and you are using the MySQL backend, whether your MySQL queries are fast enough (e.g. with 'mytop' and the slow query log).
Check 1) should be really easy for you to do.
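If watching 'top' by hand at the right moment is awkward, check 1) can also be approximated with a small script. The following is a minimal Python sketch, assuming a Linux host where /proc/<pid>/stat is available; the helper names are made up for this example, and the PID of the Services process has to be looked up separately.

```python
import os
import time

def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ')' first.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])

def cpu_percent(ticks_before, ticks_after, elapsed_seconds, ticks_per_second):
    """CPU usage over the interval, as a percentage of one core."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * (ticks_after - ticks_before) / ticks_per_second / elapsed_seconds

def sample(pid, interval=1.0):
    """Measure the CPU usage of `pid` over `interval` seconds (Linux only)."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as stat_file:
        before = cpu_ticks(stat_file.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(f"/proc/{pid}/stat") as stat_file:
        after = cpu_ticks(stat_file.read())
    return cpu_percent(before, after, time.monotonic() - start, ticks_per_second)
```

Calling `sample(pid)` in a loop while the SASL timeouts are happening gives roughly the per-process figure that 'top' shows. A value that stays near 100 would suggest the process is CPU-bound on one core.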
I wasn't familiar with "mytop"; I've never used it before. It seems to work well. I've also enabled the slow query log for queries that take longer than 10 seconds, but it has no entries so far. The log was in fact already enabled, but with a threshold of 50 seconds instead of 10.
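For going through the slow query log once it has entries, a short script can pull out the slowest statements. This is a minimal Python sketch that assumes the usual text format of the MySQL/MariaDB slow query log, where each entry has a "# Query_time: ..." header line followed by the statement; the function name is made up for this example. Note that a 10 second threshold is close to the 15 second default of set::sasl-timeout, so queries that take a few seconds each would never be logged, even though several of them in a row could plausibly add up to a timeout.

```python
import re

# Header line that carries the execution time of one logged statement.
QUERY_TIME = re.compile(r"^# Query_time:\s*([0-9.]+)")

def slow_queries(log_text, min_seconds=1.0):
    """Return (query_time, statement) pairs at or above `min_seconds`."""
    entries = []
    current = None  # [query_time, [statement lines]] for the entry being read
    for line in log_text.splitlines():
        match = QUERY_TIME.match(line)
        if match:
            current = [float(match.group(1)), []]
            entries.append(current)
        elif line.startswith("#"):
            # A comment line after some SQL marks the header of the next entry.
            if current is not None and current[1]:
                current = None
        elif current is not None and line.strip():
            current[1].append(line.strip())
    return [(seconds, " ".join(sql)) for seconds, sql in entries
            if seconds >= min_seconds]
```

Reading the log file into a string and calling `slow_queries(text, min_seconds=2.0)` would list every logged statement that took two seconds or longer, provided the server's long_query_time is set low enough for such statements to be logged at all.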
As for the server's overall CPU usage, I don't think it was at 100%. There is some room to spare. However, when it comes to MySQL, it can be resource-intensive at times. I've had my fair share of struggles with it. I'll run "mytop" as soon as SASL goes down. I even know a way to simulate this SASL issue, either now or around 6 PM. I can disconnect all web clients (there are 400 connected), wait for 30 seconds, and then reconnect them. This will likely cause a high number of simultaneous connections, potentially freezing SASL. That's when we should investigate using mytop, checking the logs, and monitoring the server's CPU usage. Normally, the server doesn't slow down even when there's a SASL problem.
" Ah also I forgot to say: It's the whole sasl service that is down, even with mIRC I can no longer identify myself. The only solution found is to restart Anope. "
On the other page, I mentioned that the only solution is to restart Anope, but I've already tried that many times, and the SASL still freezes, although not for long, maybe 5 - 10 minutes. I'm not sure if it affects all users or just a single user or my IP specifically. Sometimes, it feels like it's both. If Anope has been stopped and started again, and the SASL is still frozen, could that possibly mean it's an issue with UnrealIRCd?
Since which version of UnrealIRCd has the "set::sasl-timeout" option been available?
I first started running into this SASL bug fairly frequently about a year ago. I had never seen the issue before that.
In Webmin, I have this:
And all the others are below 23%. When you mention the CPU at 100%, are you referring to the server's CPU or the MySQL server's CPU?
query_cache_limit = 512Kb # no need to make it big; it would just eat all RAM with low efficiency
query_cache_size = 128M # or even 64M. The query cache gets really slow when it is bigger than 128 MB.
I've just added these to my.cnf because they were missing, and the CPU usage of MariaDB has increased to 59%. I hope it won't break anything, and I'll see if there are any changes during peak hours.
Services issue, not UnrealIRCd. Closing.
|2023-11-06 22:46||armyn||New Issue|
|2023-11-07 07:05||syzop||Note Added: 0023080|
|2023-11-07 07:06||syzop||Note Edited: 0023080|
|2023-11-07 16:20||armyn||Note Added: 0023081|
|2023-11-07 16:30||armyn||Note Added: 0023082|
|2023-11-07 16:58||armyn||Note Added: 0023083|
|2023-11-07 17:27||armyn||Note Added: 0023084|
|2023-11-25 14:53||syzop||Assigned To||=> syzop|
|2023-11-25 14:53||syzop||Status||new => closed|
|2023-11-25 14:53||syzop||Resolution||open => unable to duplicate|
|2023-11-25 14:53||syzop||Note Added: 0023105|