View Issue Details

ID: 0006484
Project: unrealircd
Category:
View Status: public
Last Update: 2024-11-26 06:35
Reporter: craftxbox
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Platform: x86_64
OS: Ubuntu
OS Version: Mixed
Product Version: 6.1.7.2
Summary: 0006484: Bad performance when handling thousands of users being synced or lost in netsplit.
Description: I made a pseudo-server script that introduces 10,000 users to a network, and joins them all to 10 separate channels.
When the pseudo-server links, the introduction of the users works fine with no performance issues.
When joining them to a single channel, there are likewise no noticeable performance issues.
At 10 channels, there is a roughly 20-second delay between the pseudo-server's end of sync and the sync acknowledgement from the remote server:

```
xmit: :999 EOS
xmit: NETINFO 0 1732596854 6100 * 0 0 0 :CRXB Industries
xmit: PING :test.dev.crxb.cc
[2024-11-26T04:54:25.550Z] Write buffer exhausted.
[2024-11-26T04:54:45.079Z] recv: :3W3 SLOG warn link LINK_UNRELIABLE :Warning, no response from par1.fr.crxb.cc for 15 seconds
[2024-11-26T04:54:45.079Z] recv:
[2024-11-26T04:54:46.010Z] recv: :3W3 SLOG info link SERVER_SYNCED :Link test.dev.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 31, recv: 3010525, sent: 18319]
[2024-11-26T04:54:46.010Z] recv: :hel1.fi.crxb.cc PONG hel1.fi.crxb.cc :test.dev.crxb.cc
```
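The delay can be read directly off the bracketed timestamps in the log above. A minimal sketch, assuming the log lines keep the `[ISO-8601 timestamp] ...` prefix shown there; `timestampOf` and `gapMs` are hypothetical helper names, not part of the attached script:

```typescript
// Parse the leading "[<ISO-8601 timestamp>]" prefix of a log line into epoch milliseconds.
function timestampOf(line: string): number {
  const end = line.indexOf("]");
  if (!line.startsWith("[") || end < 0) {
    throw new Error(`no bracketed timestamp in: ${line}`);
  }
  const ms = Date.parse(line.slice(1, end));
  if (Number.isNaN(ms)) {
    throw new Error(`unparseable timestamp in: ${line}`);
  }
  return ms;
}

// Milliseconds elapsed between two log lines.
function gapMs(earlier: string, later: string): number {
  return timestampOf(later) - timestampOf(earlier);
}

// The two lines of interest, copied from the log excerpt.
const bufferFlushed = "[2024-11-26T04:54:25.550Z] Write buffer exhausted.";
const linkSynced =
  "[2024-11-26T04:54:46.010Z] recv: :3W3 SLOG info link SERVER_SYNCED " +
  ":Link test.dev.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 31, recv: 3010525, sent: 18319]";

// Seconds between the last byte being written and the remote server reporting the link as synced.
console.log(gapMs(bufferFlushed, linkSynced) / 1000);
```

For the excerpt above this comes out to just over 20 seconds, matching the delay described.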

This slowdown can also propagate across the network, causing other servers to drop with a ping timeout:

```
[01:24:14] hel1.fi.crxb.cc link.SERVER_LINKED [info] Server linked: hel1.fi.crxb.cc -> test.dev.crxb.cc [secure: TLSv1.3-TLS_CHACHA20_POLY1305_SHA256]
[01:24:44] hel1.fi.crxb.cc link.LINK_UNRELIABLE [warn] Warning, no response from par1.fr.crxb.cc for 15 seconds
[01:24:45] hel1.fi.crxb.cc link.SERVER_SYNCED [info] Link test.dev.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 31, recv: 3010525, sent: 18319]
[01:25:29] hel1.fi.crxb.cc link.LINK_DISCONNECTED [error] Lost server link to par1.fr.crxb.cc [2001:bc8:710:3215:aaaa:dead:beef:cafe]: No response (Ping timeout)
[01:25:36] hel1.fi.crxb.cc link.LINK_RESOLVING [info] Resolving hostname par1.fr.crxb.cc...
[01:25:36] hel1.fi.crxb.cc link.LINK_CONNECTING [info] Trying to activate link with server par1.fr.crxb.cc (2001:bc8:710:3215:dc00:ff:fe3f:5a1:6900)...
[01:25:42] hel1.fi.crxb.cc link.SERVER_LINKED [info] Server linked: hel1.fi.crxb.cc -> par1.fr.crxb.cc [secure: TLSv1.3-TLS_CHACHA20_POLY1305_SHA256]
[01:25:42] hel1.fi.crxb.cc link.SERVER_SYNCED [info] Link par1.fr.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 0, recv: 14153, sent: 106354]
[01:25:42] hel1.fi.crxb.cc link.SERVER_LINKED_REMOTE [info] Server linked: tor1.ca.crxb.cc -> par1.fr.crxb.cc
[01:25:43] hel1.fi.crxb.cc link.SERVER_LINKED_REMOTE [info] Server linked: stj1.ca.crxb.cc -> tor1.ca.crxb.cc
[01:25:43] hel1.fi.crxb.cc link.SERVER_LINKED_REMOTE [info] Server linked: vrg1.us.crxb.cc -> tor1.ca.crxb.cc
[01:25:44] par1.fr.crxb.cc link.SERVER_SYNCED [info] Link tor1.ca.crxb.cc -> par1.fr.crxb.cc is now synced [secs: 1, recv: 16620, sent: 1856938]
[01:25:45] tor1.ca.crxb.cc link.SERVER_SYNCED [info] Link par1.fr.crxb.cc -> tor1.ca.crxb.cc is now synced [secs: 2, recv: 2103926, sent: 16852]
[01:26:59] hel1.fi.crxb.cc link.LINK_UNRELIABLE [warn] Warning, no response from par1.fr.crxb.cc for 15 seconds
[01:27:18] par1.fr.crxb.cc link.SERVER_SYNCED [info] Link hel1.fi.crxb.cc -> par1.fr.crxb.cc is now synced [secs: 96, recv: 3306511, sent: 24187]
```

The same effect also occurs when the pseudo-server is disconnected.
Steps To Reproduce: I have included the Node script I used to perform this testing. You will have to change the details to match your network, but you can change `authMethod` to `pass` and provide a `password` property if you do not feel like setting up an spkifp for it.
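For readers without the attachment, the shape of such a burst can be sketched roughly as follows. This is not the attached `example.ts`; it is an illustration under several assumptions: the `UID` and `SJOIN` field order follows the UnrealIRCd server protocol as I understand it and should be checked against the official documentation, and the nick pattern, `puppet.invalid` host, `#stress` channel names, and `membersPerLine` chunk size are all made up for the example. The SID `999` and the timestamp are taken from the log above.

```typescript
// Build a UID (6 uppercase base-36 characters appended to the 3-character SID).
function makeUid(sid: string, index: number): string {
  return sid + index.toString(36).toUpperCase().padStart(6, "0");
}

// One user introduction line. Assumed field order:
// nick hopcount timestamp username hostname uid servicestamp umodes vhost cloakedhost ip :gecos
function uidLine(sid: string, index: number, ts: number): string {
  const nick = `user${index}`; // hypothetical nick pattern
  return `:${sid} UID ${nick} 0 ${ts} ${nick} puppet.invalid ${makeUid(sid, index)} 0 +i * * * :Puppet ${index}`;
}

// SJOIN lines for one channel, split into chunks so no single line grows too long.
function sjoinLines(sid: string, channel: string, ts: number, uids: string[], membersPerLine: number): string[] {
  const lines: string[] = [];
  for (let i = 0; i < uids.length; i += membersPerLine) {
    const members = uids.slice(i, i + membersPerLine).join(" ");
    lines.push(`:${sid} SJOIN ${ts} ${channel} :${members}`);
  }
  return lines;
}

// Full burst: every user first, then every channel's membership.
function buildBurst(sid: string, userCount: number, channelCount: number, ts: number, membersPerLine: number): string[] {
  const lines: string[] = [];
  const uids: string[] = [];
  for (let u = 0; u < userCount; u++) {
    uids.push(makeUid(sid, u));
    lines.push(uidLine(sid, u, ts));
  }
  for (let c = 0; c < channelCount; c++) {
    lines.push(...sjoinLines(sid, `#stress${c}`, ts, uids, membersPerLine));
  }
  return lines;
}

// The scenario from the description: 10,000 users joined to 10 channels.
const burst = buildBurst("999", 10_000, 10, 1732596854, 20);
console.log(`${burst.length} lines in burst`);
```

With these parameters the burst carries 100,000 channel memberships, which is the part of the sync that appears to trigger the slowdown.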
Additional Information: I was not performing this kind of stress testing on purpose; the original reason for this user count was bridging a large (~6,000 member) Discord server with puppet users on IRC. While I can't really test this in a live environment, I expect this could also occur when a particularly large network gets netsplit.

During the sync/disconnect process, I can see unrealircd pinning the entire CPU core it is running on, on pretty much every server in the network.
My test network of 5 servers, running on relatively low-spec machines, took 11 minutes to fully stabilize after a test run of 10k users joined to 50 channels.
Tags: No tags attached.
Attached Files
example.ts (2,625 bytes)
package.json (192 bytes)
3rd party modules

Activities

Issue History

Date Modified Username Field Change
2024-11-26 06:35 craftxbox New Issue
2024-11-26 06:35 craftxbox File Added: example.ts
2024-11-26 06:35 craftxbox File Added: package.json