View Issue Details

ID: 0005229
Project: unrealircd
Category:
View Status: public
Last Update: 2019-03-11 18:48
Reporter: Gottem
Assigned To: syzop
Priority: normal
Severity: major
Reproducibility: N/A
Status: acknowledged
Resolution: open
Product Version: 4.2.2
Target Version:
Fixed in Version:
Summary: 0005229: Some issues when you have a ton of glines
Description: 1) As I mentioned on IRC before, if you have 10k+ glines and you do /stats G you're very likely to be killed with "Max SendQ exceeded". Perhaps sending it in batches would work, maybe with a configurable setting like set::gline-batch-size or something similar. =] I've been thinking about a proper way to resume the dump (to prevent the race condition you mentioned) and the only "safe" way I could think of was to duplicate the aTKLine list, but that seems pretty damn wasteful. :D An easier way (I guess) would be simply to not check the sendQ for stats/gline output, although that might cause problems with processing all that data on the client side. I'd go for an explicit flag for the latter option, like an operpriv or something in a class block (à la nofakelag).

2) Similarly, if a lot of these glines expire simultaneously the IRCd takes hours to walk through them all. This leads to a lot of timeshifting between IRCds, which in turn causes netsplits and potentially re-setting of previously removed glines (thereby causing a loop of sorts). We actually had 50k glines which started getting removed around 7 AM, but it was still going at 7 *PM* so I decided "fuck it I'll just stop all IRCds at once". :>

I set the severity to major because it causes an entirely unusable network. I also think there's a high probability of many other networks experiencing the same issues at some point, not to mention the spam waves seem to increase in "strength" as time goes by so the risk only goes up. :D
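
Point 1 above weighs batching against the cost of copying the list. As a rough illustration only (Python rather than the IRCd's C, and `iter_batches` is a hypothetical name, not an UnrealIRCd function), the "duplicate the list, then send in batches" idea could look like the sketch below. The snapshot is exactly the copy the report calls wasteful, but it means entries added or removed between batches cannot break the resume position.

```python
from itertools import islice


def iter_batches(entries, batch_size):
    """Yield lists of at most batch_size items from a snapshot of entries.

    The snapshot is taken when the first batch is requested, so later
    changes to the live list do not affect the remaining batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    snapshot = tuple(entries)  # the "wasteful" copy that makes resuming safe
    it = iter(snapshot)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Usage: send one batch, let the send queue drain, then ask for the next one.
glines = ["*@192.0.2.1", "*@192.0.2.2", "*@192.0.2.3", "*@192.0.2.4", "*@192.0.2.5"]
batches = iter_batches(glines, 2)
first = next(batches)
glines.remove("*@192.0.2.3")  # the live list changes while the dump is paused
remaining = list(batches)
```

In this sketch the removed entry is still reported, because the dump reflects the list as it was when the dump started.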
Tags: No tags attached.
3rd party modules

Activities

syzop

2019-03-11 18:11

administrator   ~0020544

Last edited: 2019-03-11 18:14


Everything except the /stats issue was fixed a few days ago with the new system. These are the draft release notes for that change:
-
This release focuses on performance improvements, especially when the
server is under attack. UnrealIRCd now uses a technique that makes KLINE's,
GLINE's and (G)ZLINE's placed on individual IP's (*@IP) extremely fast.
Just to illustrate the performance improvements:
* Previously it took 129 seconds to add 100k ZLINE's, now it takes 2.5 secs.
* Checking a connection against 100,000 ZLINE's is now 250 times faster.
* Previously 7,500 clients could connect per minute, now 33,560 per minute.
* Even with 1 million ZLINE's on *@IP it can handle 30,000 connections p/m.
* Rejecting Z-lined users is even faster at 435,000 connections per minute
  with 100,000 active ZLINE's.
Benchmarked on a 2GHz Intel Xeon Skylake CPU with Linux 4.15.
-
So it's extremely fast. The performance fixes are not limited to *@IP either: adding, removing, connecting, expiring, etc. are all faster with any sort of *LINES.
So I think you should be good with this new version :)
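
The release notes do not name the technique used. One common way to make exact *@IP bans cheap to check is to index them by IP in a hash table and keep only wildcard masks in a list that is scanned linearly. The sketch below illustrates that general idea under that assumption; it is not UnrealIRCd's implementation, and `BanIndex` and its methods are hypothetical names.

```python
from fnmatch import fnmatchcase


class BanIndex:
    """Toy ban store: exact *@IP masks in a dict, everything else in a list."""

    def __init__(self):
        self.by_ip = {}      # exact IP -> reason, O(1) average lookup
        self.wildcard = []   # (mask, reason) pairs, scanned linearly

    def add(self, mask, reason):
        user, _, host = mask.partition("@")
        if user == "*" and host and not any(c in host for c in "*?"):
            self.by_ip[host] = reason
        else:
            self.wildcard.append((mask, reason))

    def match(self, user, ip):
        """Return the reason of the first matching ban, or None."""
        reason = self.by_ip.get(ip)
        if reason is not None:
            return reason
        target = f"{user}@{ip}"
        for mask, reason in self.wildcard:
            if fnmatchcase(target, mask):
                return reason
        return None


index = BanIndex()
index.add("*@192.0.2.1", "spam")
index.add("*@198.51.100.*", "abusive range")
```

With this layout, a connection check costs one dictionary lookup plus a scan over the (usually much shorter) wildcard list, regardless of how many exact *@IP entries exist.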

As for /stats.. I have considered the LIST-style approach but as I explained (and you repeat here) it is a bit difficult to keep track of where the dump left off. I also wonder who would really want to see 10k or even 100k of *LINES? We have a search system anyway, although it's probably underdocumented.
It is probably a whole lot easier to just bump the sendq in the config file. Some quick calculations show that even a 2M sendq should give you enough for 10k zlines most of the time (it depends on the length of the reason field and such). Personally I think 5M sendq for opers is just fine... after all I trust them enough to have a huge sendq. This makes me wonder what sendq you use... for 50k *LINES I would think that 10M should be sufficient in any case.
Anyway, to get back to the original question... I wonder who really wants to see the full list of 10k or 100k zlines. I mean.. you are not going to read 10,000 entries, right? So what is it for? Dumping to somewhere? Maybe a module would be more appropriate for that?
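
The "quick calculations" for the sendq can be reproduced roughly as follows. This is a back-of-the-envelope sketch: the per-line overhead and the average mask and reason lengths are guesses chosen for illustration, not measured values, and "M" is assumed to mean 1024*1024 bytes.

```python
MIB = 1024 * 1024  # assumption: "M" in the sendq setting means 1024*1024 bytes


def estimate_sendq_bytes(entry_count, avg_mask_len=20, avg_reason_len=60, overhead=80):
    """Estimate the bytes queued when every entry is sent as one line.

    overhead is a guess for the server prefix, numeric, nick, timestamps,
    setter and line terminator that accompany each entry.
    """
    per_line = overhead + avg_mask_len + avg_reason_len
    return entry_count * per_line


def fits(entry_count, sendq_bytes, **kwargs):
    """True if the estimated dump fits in the given sendq."""
    return estimate_sendq_bytes(entry_count, **kwargs) <= sendq_bytes


need_10k = estimate_sendq_bytes(10_000)   # 1,600,000 bytes with these guesses
need_50k = estimate_sendq_bytes(50_000)   # 8,000,000 bytes with these guesses
```

With these guessed averages, 10k entries fit in 2M and 50k entries fit in 10M, in line with the figures above; longer reasons push the requirement up proportionally.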

Anyway, in light of recent attacks I could bump the defaults in the example conf... nowadays even the default server sendq of 5M seems a bit low.

syzop

2019-03-11 18:42

administrator   ~0020545

Last edited: 2019-03-11 18:47


As mentioned on IRC:
* Some command to display counts (totals)
* Document the searching options, like /stats G +r xyz. Apparently there is +m for mask, +r for reason, and +s for the setter
* Fix and test the searching options; I don't think they work for zline? It seems para is not passed.
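
To make the first two items concrete, here is a small sketch of per-type totals and of filtering by mask, reason and setter. The exact matching semantics of +m, +r and +s are not stated in this issue, so case-insensitive wildcard matching is an assumption, and `Ban`, `search` and `totals` are hypothetical names.

```python
from collections import Counter, namedtuple
from fnmatch import fnmatchcase

Ban = namedtuple("Ban", "kind mask reason setter")


def search(bans, mask=None, reason=None, setter=None):
    """Return bans matching every given pattern (None means no filter)."""
    def ok(pattern, value):
        return pattern is None or fnmatchcase(value.lower(), pattern.lower())

    return [
        b for b in bans
        if ok(mask, b.mask) and ok(reason, b.reason) and ok(setter, b.setter)
    ]


def totals(bans):
    """Count entries per ban type, e.g. for a 'display counts' command."""
    return Counter(b.kind for b in bans)


bans = [
    Ban("G", "*@192.0.2.1", "Spam wave", "Gottem"),
    Ban("G", "*@192.0.2.2", "Flooding", "syzop"),
    Ban("Z", "*@198.51.100.9", "Spam wave", "Gottem"),
]
spam = search(bans, reason="*spam*")
by_setter = search(bans, mask="*@192.0.2.*", setter="syzop")
counts = totals(bans)
```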

18:44 <~Syzop[AWAY]> so this leaves the case of where someone does /zline and you have 10k zlines...
18:44 <~Syzop[AWAY]> maybe some kind of limit of output? like with WHO
18:44 <~Syzop[AWAY]> and displaying a notice regarding the search options (and count thingy) that will
                     be available?
18:44 <@Gottem> ya sounds fine by me
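
The output limit discussed in the chat could behave roughly like this sketch: show at most a fixed number of entries and, if anything was cut off, append a notice pointing at the search options. The function name and the notice wording are hypothetical.

```python
def limited_output(lines, limit):
    """Return at most limit lines, plus a notice when the list was truncated."""
    lines = list(lines)
    shown = lines[:limit]
    if len(lines) > limit:
        shown.append(
            f"Output limited to {limit} of {len(lines)} entries; "
            "use the search options to narrow it down"
        )
    return shown


zlines = [f"*@192.0.2.{n}" for n in range(1, 6)]
truncated = limited_output(zlines, 2)
complete = limited_output(zlines, 10)
```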

Issue History

Date Modified     Username  Field                 Change
2019-03-11 17:58  Gottem    New Issue
2019-03-11 18:11  syzop     Note Added: 0020544
2019-03-11 18:13  syzop     Note Edited: 0020544
2019-03-11 18:14  syzop     Note Edited: 0020544
2019-03-11 18:42  syzop     Note Added: 0020545
2019-03-11 18:47  syzop     Note Edited: 0020545
2019-03-11 18:48  syzop     Assigned To           => syzop
2019-03-11 18:48  syzop     Status                new => acknowledged