View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0006484 | unreal | ircd | public | 2024-11-26 06:35 | 2024-11-26 06:35 |
Reporter | craftxbox | Assigned To | |||
Priority | normal | Severity | minor | Reproducibility | always |
Status | new | Resolution | open | ||
Platform | x86_64 | OS | Ubuntu | OS Version | Mixed |
Product Version | 6.1.7.2 | ||||
Summary | 0006484: Bad performance when handling thousands of users being synced or lost in netsplit. | ||||
Description | I made a pseudo-server script that introduces 10,000 users to a network and joins them all to 10 separate channels. When the pseudo-server links, the introduction of the users works fine with no performance issues. Joining them all to a single channel likewise causes no noticeable performance issues. With 10 channels, there is a delay of about 20 seconds between the pseudo-server's end of sync and the sync acknowledgement from the remote server:

```
xmit: :999 EOS
xmit: NETINFO 0 1732596854 6100 * 0 0 0 :CRXB Industries
xmit: PING :test.dev.crxb.cc
[2024-11-26T04:54:25.550Z] Write buffer exhausted.
[2024-11-26T04:54:45.079Z] recv: :3W3 SLOG warn link LINK_UNRELIABLE :Warning, no response from par1.fr.crxb.cc for 15 seconds
[2024-11-26T04:54:45.079Z] recv:
[2024-11-26T04:54:46.010Z] recv: :3W3 SLOG info link SERVER_SYNCED :Link test.dev.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 31, recv: 3010525, sent: 18319]
[2024-11-26T04:54:46.010Z] recv: :hel1.fi.crxb.cc PONG hel1.fi.crxb.cc :test.dev.crxb.cc
```

This slowdown can also propagate across the network, causing servers to drop with a ping timeout:

```
[01:24:14] hel1.fi.crxb.cc link.SERVER_LINKED [info] Server linked: hel1.fi.crxb.cc -> test.dev.crxb.cc [secure: TLSv1.3-TLS_CHACHA20_POLY1305_SHA256]
[01:24:44] hel1.fi.crxb.cc link.LINK_UNRELIABLE [warn] Warning, no response from par1.fr.crxb.cc for 15 seconds
[01:24:45] hel1.fi.crxb.cc link.SERVER_SYNCED [info] Link test.dev.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 31, recv: 3010525, sent: 18319]
[01:25:29] hel1.fi.crxb.cc link.LINK_DISCONNECTED [error] Lost server link to par1.fr.crxb.cc [2001:bc8:710:3215:aaaa:dead:beef:cafe]: No response (Ping timeout)
[01:25:36] hel1.fi.crxb.cc link.LINK_RESOLVING [info] Resolving hostname par1.fr.crxb.cc...
[01:25:36] hel1.fi.crxb.cc link.LINK_CONNECTING [info] Trying to activate link with server par1.fr.crxb.cc (2001:bc8:710:3215:dc00:ff:fe3f:5a1:6900)...
[01:25:42] hel1.fi.crxb.cc link.SERVER_LINKED [info] Server linked: hel1.fi.crxb.cc -> par1.fr.crxb.cc [secure: TLSv1.3-TLS_CHACHA20_POLY1305_SHA256]
[01:25:42] hel1.fi.crxb.cc link.SERVER_SYNCED [info] Link par1.fr.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 0, recv: 14153, sent: 106354]
[01:25:42] hel1.fi.crxb.cc link.SERVER_LINKED_REMOTE [info] Server linked: tor1.ca.crxb.cc -> par1.fr.crxb.cc
[01:25:43] hel1.fi.crxb.cc link.SERVER_LINKED_REMOTE [info] Server linked: stj1.ca.crxb.cc -> tor1.ca.crxb.cc
[01:25:43] hel1.fi.crxb.cc link.SERVER_LINKED_REMOTE [info] Server linked: vrg1.us.crxb.cc -> tor1.ca.crxb.cc
[01:25:44] par1.fr.crxb.cc link.SERVER_SYNCED [info] Link tor1.ca.crxb.cc -> par1.fr.crxb.cc is now synced [secs: 1, recv: 16620, sent: 1856938]
[01:25:45] tor1.ca.crxb.cc link.SERVER_SYNCED [info] Link par1.fr.crxb.cc -> tor1.ca.crxb.cc is now synced [secs: 2, recv: 2103926, sent: 16852]
[01:26:59] hel1.fi.crxb.cc link.LINK_UNRELIABLE [warn] Warning, no response from par1.fr.crxb.cc for 15 seconds
[01:27:18] par1.fr.crxb.cc link.SERVER_SYNCED [info] Link hel1.fi.crxb.cc -> par1.fr.crxb.cc is now synced [secs: 96, recv: 3306511, sent: 24187]
```

The same effect also occurs when the pseudo-server is disconnected. | ||||
Steps To Reproduce | I have included the Node script I used to perform this testing. You will have to change the details, obviously, but you can change `authMethod` to `pass` and provide a `password` property if you do not feel like setting up spkifp for it. | ||||
Additional Information | I was not performing this kind of stress testing for its own sake; the original reason for this user count was bridging a large (~6,000 member) Discord server with puppet users on IRC. While I can't really test this in a live environment, I expect this could also occur when a particularly large network gets netsplit. During the sync/disconnect process I can see UnrealIRCd pinning the entire CPU core it is running on, on pretty much every server in the network. My test network of 5 servers, running on relatively low-spec machines, took 11 minutes to fully stabilize after a test run of 10k users in 50 channels. | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
3rd party modules | |||||
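
For anyone comparing runs, the sync timings quoted in the description can be pulled out of the server logs mechanically. The following is a minimal Node sketch, not part of the attached script: `parseSynced` and `slowSyncs` are hypothetical helper names, the regular expression assumes the `SERVER_SYNCED` line format exactly as it appears in the logs above, and the 15-second threshold is an arbitrary value chosen for illustration.

```javascript
// Hypothetical helpers (not from the attached script): extract the stats
// from SERVER_SYNCED log lines in the format quoted in the description.
const SYNCED_RE =
  /Link (\S+) -> (\S+) is now synced \[secs: (\d+), recv: (\d+), sent: (\d+)\]/;

// Returns the parsed fields, or null if the line is not a SERVER_SYNCED line.
function parseSynced(line) {
  const m = SYNCED_RE.exec(line);
  if (!m) return null;
  return {
    from: m[1],
    to: m[2],
    secs: Number(m[3]),
    recv: Number(m[4]),
    sent: Number(m[5]),
  };
}

// Keeps only syncs that took at least thresholdSecs seconds.
function slowSyncs(lines, thresholdSecs) {
  return lines
    .map(parseSynced)
    .filter((s) => s !== null && s.secs >= thresholdSecs);
}

// Sample lines copied from the logs in the description.
const sample = [
  "[01:24:45] hel1.fi.crxb.cc link.SERVER_SYNCED [info] Link test.dev.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 31, recv: 3010525, sent: 18319]",
  "[01:25:42] hel1.fi.crxb.cc link.SERVER_SYNCED [info] Link par1.fr.crxb.cc -> hel1.fi.crxb.cc is now synced [secs: 0, recv: 14153, sent: 106354]",
  "[01:27:18] par1.fr.crxb.cc link.SERVER_SYNCED [info] Link hel1.fi.crxb.cc -> par1.fr.crxb.cc is now synced [secs: 96, recv: 3306511, sent: 24187]",
];

// The threshold of 15 is an assumption for illustration only.
for (const s of slowSyncs(sample, 15)) {
  console.log(`${s.from} -> ${s.to}: ${s.secs}s (recv: ${s.recv}, sent: ${s.sent})`);
}
```

With the sample lines above, only the 31-second and 96-second syncs pass the filter; the near-instant relink of par1.fr.crxb.cc is skipped.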