View Issue Details

ID: 0006484
Project: unreal
Category: ircd
View Status: public
Last Update: 2025-10-05 16:24
Reporter: craftxbox
Assigned To: syzop
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: x86_64
OS: Ubuntu
OS Version: Mixed
Product Version: 6.1.7.2
Fixed in Version: 6.2.1
Summary: 0006484: Bad performance when handling thousands of users being synced or lost in netsplit.
Description: I made a pseudo-server script that introduces 10,000 users to a network and joins them all to 10 separate channels.
When the pseudo-server links, introducing the users works fine with no performance issues.
Joining them all to a single channel likewise causes no noticeable performance issues.
With 10 channels, there is a delay of about 20 seconds between the pseudo-server's end of sync and the sync acknowledgement from the remote server:

```
xmit: :999 EOS
xmit: NETINFO 0 1732596854 6100 * 0 0 0 :CRXB Industries
xmit: PING :test.dev.crxb.cc
[2024-11-26T04:54:25.550Z] Write buffer exhausted.
[2024-11-26T04:54:45.079Z] recv: :3W3 SLOG warn link LINK_UNRELIABLE :Warning, no response from par1.fr.crxb.cc for 15 seconds
[2024-11-26T04:54:45.079Z] recv:
[2024-11-26T04:54:46.010Z] recv: :3W3 SLOG info link SERVER_SYNCED :Link test.dev.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 31, recv: 3010525, sent: 18319]
[2024-11-26T04:54:46.010Z] recv: :hel1.fi.crxb.cc PONG hel1.fi.crxb.cc :test.dev.crxb.cc
```
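
For reference, the size of the stall can be read off the timestamps in the log above. This is a minimal sketch, assuming the `Write buffer exhausted.` line marks the moment the script finished sending its burst (the report does not define it). `logTimeMs` is a helper name made up for this illustration.

```typescript
// Extract the leading "[ISO timestamp]" of a log line as epoch milliseconds.
function logTimeMs(line: string): number {
  const match = /^\[([^\]]+)\]/.exec(line);
  if (!match) throw new Error(`no timestamp in: ${line}`);
  const ms = Date.parse(match[1] ?? "");
  if (Number.isNaN(ms)) throw new Error(`bad timestamp in: ${line}`);
  return ms;
}

// Two lines copied from the log above.
const bufferExhausted = "[2024-11-26T04:54:25.550Z] Write buffer exhausted.";
const pongReceived =
  "[2024-11-26T04:54:46.010Z] recv: :hel1.fi.crxb.cc PONG hel1.fi.crxb.cc :test.dev.crxb.cc";

// Assumption: the stall is the gap between the end of sending and the PONG.
const stallSeconds = (logTimeMs(pongReceived) - logTimeMs(bufferExhausted)) / 1000;
console.log(`stall: ${stallSeconds.toFixed(2)} s`); // prints "stall: 20.46 s"
```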

This slowdown can also propagate across the network, causing other servers to drop with a ping timeout:

```
[01:24:14] hel1.fi.crxb.cc link.SERVER_LINKED [info] Server linked: hel1.fi.crxb.cc -> test.dev.crxb.cc [secure: TLSv1.3-TLS_CHACHA20_POLY1305_SHA256]
[01:24:44] hel1.fi.crxb.cc link.LINK_UNRELIABLE [warn] Warning, no response from par1.fr.crxb.cc for 15 seconds
[01:24:45] hel1.fi.crxb.cc link.SERVER_SYNCED [info] Link test.dev.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 31, recv: 3010525, sent: 18319]
[01:25:29] hel1.fi.crxb.cc link.LINK_DISCONNECTED [error] Lost server link to par1.fr.crxb.cc [2001:bc8:710:3215:aaaa:dead:beef:cafe]: No response (Ping timeout)
[01:25:36] hel1.fi.crxb.cc link.LINK_RESOLVING [info] Resolving hostname par1.fr.crxb.cc...
[01:25:36] hel1.fi.crxb.cc link.LINK_CONNECTING [info] Trying to activate link with server par1.fr.crxb.cc (2001:bc8:710:3215:dc00:ff:fe3f:5a1:6900)...
[01:25:42] hel1.fi.crxb.cc link.SERVER_LINKED [info] Server linked: hel1.fi.crxb.cc -> par1.fr.crxb.cc [secure: TLSv1.3-TLS_CHACHA20_POLY1305_SHA256]
[01:25:42] hel1.fi.crxb.cc link.SERVER_SYNCED [info] Link par1.fr.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 0, recv: 14153, sent: 106354]
[01:25:42] hel1.fi.crxb.cc link.SERVER_LINKED_REMOTE [info] Server linked: tor1.ca.crxb.cc -> par1.fr.crxb.cc
[01:25:43] hel1.fi.crxb.cc link.SERVER_LINKED_REMOTE [info] Server linked: stj1.ca.crxb.cc -> tor1.ca.crxb.cc
[01:25:43] hel1.fi.crxb.cc link.SERVER_LINKED_REMOTE [info] Server linked: vrg1.us.crxb.cc -> tor1.ca.crxb.cc
[01:25:44] par1.fr.crxb.cc link.SERVER_SYNCED [info] Link tor1.ca.crxb.cc -> par1.fr.crxb.cc is now synced [secs: 1, recv: 16620, sent: 1856938]
[01:25:45] tor1.ca.crxb.cc link.SERVER_SYNCED [info] Link par1.fr.crxb.cc -> tor1.ca.crxb.cc is now synced [secs: 2, recv: 2103926, sent: 16852]
[01:26:59] hel1.fi.crxb.cc link.LINK_UNRELIABLE [warn] Warning, no response from par1.fr.crxb.cc for 15 seconds
[01:27:18] par1.fr.crxb.cc link.SERVER_SYNCED [info] Link hel1.fi.crxb.cc -> par1.fr.crxb.cc is now synced [secs: 96, recv: 3306511, sent: 24187]
```

The same effect occurs when the pseudo-server is disconnected as well.
Steps To Reproduce: I have included the Node script I used for this testing. You will have to change the connection details, but you can set `authMethod` to `pass` and provide a `password` property if you do not feel like setting up spkifp for it.
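
The attached `example.ts` is not reproduced on this page. As a rough idea of what such a burst could look like, here is a minimal sketch and not the reporter's script. It assumes the `UID` and `SJOIN` field order as I read it in the UnrealIRCd server protocol documentation, a 510-byte line payload limit, and made-up nicks, hosts and channel names. The link handshake and authentication (`authMethod`, `password`, spkifp) are left out entirely.

```typescript
// SID "999" appears in the log above; nicks, hosts and channels are hypothetical.
const SID = "999";
const MAX_LINE = 510; // assumed payload limit (512 bytes minus CRLF)

// Build a 9-character UID: the 3-character SID plus 6 base-36 characters.
function makeUid(sid: string, index: number): string {
  return sid + index.toString(36).toUpperCase().padStart(6, "0");
}

// One user introduction. The field order is an assumption taken from the
// protocol docs; "*" stands in for the virtual host, cloaked host and IP.
function uidLine(sid: string, nick: string, uid: string, ts: number): string {
  return `:${sid} UID ${nick} 0 ${ts} user host.invalid ${uid} 0 +i * * * :Test user`;
}

// Pack UIDs into as few SJOIN lines as fit under the length limit.
function sjoinLines(sid: string, channel: string, ts: number, uids: string[]): string[] {
  const prefix = `:${sid} SJOIN ${ts} ${channel} :`;
  const lines: string[] = [];
  let current = prefix;
  for (const uid of uids) {
    const separator = current === prefix ? "" : " ";
    if (current !== prefix && current.length + separator.length + uid.length > MAX_LINE) {
      lines.push(current); // line is full, start a new one with this UID
      current = prefix + uid;
    } else {
      current += separator + uid;
    }
  }
  if (current !== prefix) lines.push(current);
  return lines;
}

const ts = 1732596854; // timestamp borrowed from the NETINFO line in the log
const allUids = Array.from({ length: 10000 }, (_, i) => makeUid(SID, i));
const introductions = allUids.map((uid, i) => uidLine(SID, `puppet${i}`, uid, ts));
const joins = sjoinLines(SID, "#chan0", ts, allUids);
console.log(introductions.length, "UID lines,", joins.length, "SJOIN lines for one channel");
```

Repeating the `sjoinLines` call for each of the 10 channels gives the full join burst. Batching members per line keeps the line count low, but every listed user still has to be added to the channel on the receiving side.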
Additional Information: I was not doing this kind of stress testing on purpose; the original reason for this user count was bridging a large (~6,000 member) Discord server with puppet users on IRC. While I can't really test this in a live environment, I expect it could also occur when a particularly large network gets netsplit.

During the sync/disconnect process I can see unrealircd pinning the entire CPU core it is running on, on pretty much every server in the network.
My test network of 5 servers, running on relatively low-spec machines, took 11 minutes to fully stabilize after a test run of 10k users in 50 channels.
Tags: No tags attached.
Attached Files
example.ts (2,625 bytes)
package.json (192 bytes)
3rd party modules

Activities

syzop

2025-02-16 08:55

administrator   ~0023431

Sorry, I don't have time for this at the moment, but I plan to revisit it later in March/April as I surely want to optimize this :)

syzop

2025-10-03 19:22

administrator   ~0023518

Last edited: 2025-10-03 19:42

I have not tested this particular script but have been profiling for a week now, first optimizing for 1000 local clients, which was the main focus. Today I have been testing with 10k and later 100k clones in 1 channel via a pseudo-server, where the pseudo-server is linked to server B, and then I let server A and B connect, so the 100k clones are introduced and joined all at once (at A). I have made massive performance improvements, cutting this 100k UID+SJOIN stuff down to only a few seconds. Things haven't been tested well yet but... something tells me performance should be much, much better for your script too :D. And I'm not even finished yet, I only started today on the remote server case and the syncing case.

syzop

2025-10-04 19:25

administrator   ~0023519

Ah I can fully reproduce your problem now with squit, will look into it :)

syzop

2025-10-05 15:48

administrator   ~0023520

Fixed, thanks for the report, and your patience.

commit af0a7844647277f32e75fd1ce0d371dcb9c75de4
Author: Bram Matthys
Date: Sun Oct 5 08:24:14 2025 +0200

    Make member & membership point to each other so lookups can be much faster.
    This also makes them proper list items, again to make certain fast operations
    possible. Main thing is that removing an entry does not require us to walk
    all of those lists. Not all code has been modified yet to benefit this,
    actually only very little, the most performance-impacting ones.
    
    This fixes SQUIT of a server with 100k users in a single channel taking
    40 seconds of 100% CPU. It now takes only 1 second.
    Reported by craftxbox in https://bugs.unrealircd.org/view.php?id=6484
    
    (Can't make member & membership one entry atm, that would be too much change in U6)
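
The idea in the commit message can be illustrated with a small sketch. This is not UnrealIRCd's C code. It is a TypeScript illustration under my own assumptions: `Member` is an entry in a channel's user list, `Membership` is an entry in a user's channel list, and the `Channel`, `Client`, `join`, `part` and `removeFromAllChannels` names are made up here. As a back-of-the-envelope figure, if each of n removals has to walk a list of up to n entries, the total is on the order of n²/2 node visits, about 5 × 10^9 for n = 100,000. With cross-pointers and doubly linked entries, each removal is constant time.

```typescript
class Client {
  nick: string;
  memberships: Membership | null = null; // head of this user's channel list
  constructor(nick: string) { this.nick = nick; }
}

class Channel {
  name: string;
  members: Member | null = null; // head of this channel's user list
  users = 0;
  constructor(name: string) { this.name = name; }
}

class Member { // entry in a channel's user list
  prev: Member | null = null;
  next: Member | null = null;
  client: Client;
  membership!: Membership; // cross-pointer, set in join()
  constructor(client: Client) { this.client = client; }
}

class Membership { // entry in a user's channel list
  prev: Membership | null = null;
  next: Membership | null = null;
  channel: Channel;
  member!: Member; // cross-pointer, set in join()
  constructor(channel: Channel) { this.channel = channel; }
}

// Add the client to the channel: push one entry onto the head of each list.
function join(client: Client, channel: Channel): Membership {
  const member = new Member(client);
  const membership = new Membership(channel);
  member.membership = membership;
  membership.member = member;
  member.next = channel.members;
  if (channel.members) channel.members.prev = member;
  channel.members = member;
  channel.users++;
  membership.next = client.memberships;
  if (client.memberships) client.memberships.prev = membership;
  client.memberships = membership;
  return membership;
}

// Remove one membership without walking either list.
function part(client: Client, membership: Membership): void {
  const member = membership.member; // found via the cross-pointer, no search
  const channel = membership.channel;
  if (member.prev) member.prev.next = member.next;
  else channel.members = member.next;
  if (member.next) member.next.prev = member.prev;
  if (membership.prev) membership.prev.next = membership.next;
  else client.memberships = membership.next;
  if (membership.next) membership.next.prev = membership.prev;
  channel.users--;
}

// What happens to every user behind a server that splits off.
function removeFromAllChannels(client: Client): void {
  while (client.memberships) part(client, client.memberships);
}

// 100k clones in a single channel, then all of them lost at once.
const bigChannel = new Channel("#clones");
const clones = Array.from({ length: 100000 }, (_, i) => new Client(`clone${i}`));
for (const clone of clones) join(clone, bigChannel);
for (const clone of clones) removeFromAllChannels(clone);
console.log("users left in channel:", bigChannel.users);
```

The cost of removing a user here depends only on how many channels that user is in, not on how many users each channel holds.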

syzop

2025-10-05 16:24

administrator   ~0023521

Oh and there have been various follow-up commits to make other things faster too :)

Issue History

Date Modified Username Field Change
2024-11-26 06:35 craftxbox New Issue
2024-11-26 06:35 craftxbox File Added: example.ts
2024-11-26 06:35 craftxbox File Added: package.json
2025-02-16 08:55 syzop Note Added: 0023431
2025-10-03 19:22 syzop Note Added: 0023518
2025-10-03 19:22 syzop Note Edited: 0023518
2025-10-03 19:24 syzop Note Edited: 0023518
2025-10-03 19:27 syzop Note Edited: 0023518
2025-10-03 19:42 syzop Note Edited: 0023518
2025-10-04 19:25 syzop Note Added: 0023519
2025-10-04 19:25 syzop Assigned To => syzop
2025-10-04 19:25 syzop Status new => confirmed
2025-10-05 15:48 syzop Status confirmed => resolved
2025-10-05 15:48 syzop Resolution open => fixed
2025-10-05 15:48 syzop Fixed in Version => 6.2.1
2025-10-05 15:48 syzop Note Added: 0023520
2025-10-05 16:24 syzop Note Added: 0023521