View Issue Details

IDProjectCategoryView StatusLast Update
0004948unrealircdpublic2017-05-10 17:05
ReporterEman Assigned Tosyzop  
PrioritynormalSeverityblockReproducibilityhave not tried
Status resolvedResolutionopen 
Product Version4.0.9 
Summary0004948: large number of connections causes port to stop working
Descriptionafter a large number of server connection attempts (every 30 seconds for at least 48 hours), unreal gives the following errors at every subsequent server connect attempt:

[2017/05/10 - 9:48:10AM] -coruscant.r-type.ca- *** No more connections allowed (yavin.r-type.us)
[2017/05/10 - 9:48:10AM] -coruscant.r-type.ca- Connect to host yavin.r-type.us[172.16.2.14] failed: Success
[2017/05/10 - 9:48:31AM] -coruscant.r-type.ca- *** All connections in use. ([@*/6697])
[2017/05/10 - 9:48:40AM] -coruscant.r-type.ca- *** No more connections allowed (yavin.r-type.us)
[2017/05/10 - 9:48:40AM] -coruscant.r-type.ca- Connect to host yavin.r-type.us[172.16.2.14] failed: Success
[2017/05/10 - 9:49:01AM] -coruscant.r-type.ca- *** All connections in use. ([@*/6697])

this causes the port to be entirely unresponsive for even user connections. the only fix appears to be restarting the ircd.
Additional Informationfor reference, this is the server output when it 'switched' to spitting out the no more connections:

[2017/05/09 - 10:12:10PM] -coruscant.r-type.ca- Exiting ssl client yavin.r-type.us[@172.16.2.14.0]: SSL_connect(): Connection refused
[2017/05/09 - 10:12:10PM] -coruscant.r-type.ca- Lost connection to yavin.r-type.us[172.16.2.14]: SSL_connect(): Connection refused
[2017/05/09 - 10:12:40PM] -coruscant.r-type.ca- *** Connection to yavin.r-type.us[172.16.2.14] activated.
[2017/05/09 - 10:12:40PM] -coruscant.r-type.ca- Exiting ssl client yavin.r-type.us[@172.16.2.14.0]: SSL_connect(): Connection refused
[2017/05/09 - 10:12:40PM] -coruscant.r-type.ca- Lost connection to yavin.r-type.us[172.16.2.14]: SSL_connect(): Connection refused
[2017/05/09 - 10:13:10PM] -coruscant.r-type.ca- *** No more connections allowed (yavin.r-type.us)
[2017/05/09 - 10:13:11PM] -coruscant.r-type.ca- Connect to host yavin.r-type.us[172.16.2.14] failed: Success
[2017/05/09 - 10:13:36PM] -coruscant.r-type.ca- *** All connections in use. ([@*/6697])
[2017/05/09 - 10:13:40PM] -coruscant.r-type.ca- *** No more connections allowed (yavin.r-type.us)
[2017/05/09 - 10:13:40PM] -coruscant.r-type.ca- Connect to host yavin.r-type.us[172.16.2.14] failed: Success
[2017/05/09 - 10:14:10PM] -coruscant.r-type.ca- *** No more connections allowed (yavin.r-type.us)
[2017/05/09 - 10:14:10PM] -coruscant.r-type.ca- Connect to host yavin.r-type.us[172.16.2.14] failed: Success
[2017/05/09 - 10:14:40PM] -coruscant.r-type.ca- *** No more connections allowed (yavin.r-type.us)
[2017/05/09 - 10:14:40PM] -coruscant.r-type.ca- Connect to host yavin.r-type.us[172.16.2.14] failed: Success
[2017/05/09 - 10:15:10PM] -coruscant.r-type.ca- *** No more connections allowed (yavin.r-type.us)
TagsNo tags attached.
3rd party modules

Activities

Eman

2017-05-10 15:56

reporter   ~0019759

running /close does nothing to help this:

[2017/05/10 - 9:55:55AM] -> Server: close
[2017/05/10 - 9:55:56AM] 0: Connections Closed
[2017/05/10 - 9:55:56AM] -coruscant.r-type.ca- *** [email protected] closed 0 unknown connections

syzop

2017-05-10 16:03

administrator   ~0019760

Which Operating System? (Strange, I thought that was a mandatory field :D)

Eman

2017-05-10 16:04

reporter   ~0019762

centos 6.9

syzop

2017-05-10 16:46

administrator   ~0019763

If it happens again, could you run an: lsof -n -p PID-OF-IRCD-PROCESS
Or if you don't have lsof, you can do: ls -al /proc/PID-OF-IRCD-PROCESS/fd

Naturally, replace PID-OF-IRCd-PROCESS with an actual number :)

What you describe ("all connections in use") is that all file descriptors are in use. If you think this is incorrect, for example because you see 800 clients and you have compiled with a MAXCONNECTIONS of 1024 then it is probably "leaking" file descriptors. These will show up in the lsof / ls from above.

syzop

2017-05-10 16:50

administrator   ~0019764

Last edited: 2017-05-10 16:52

Ah, just saw your strace (downloaded and deleted the attachment, for some reason attachments cannot be private so it was public).

Strange, it sends "All connections in use" to FD 13. Such a low FD number during such an error is a bit unexpected, though theoretically possible. In fact all the FD numbers I saw in your trace are low, as in <40.

You probably restarted by now, but.. the lsof or ls from above would still be useful when you hit the problem OR you can run it before that if it starts leaking fd's gradually.. then they will show up already after a while and no need to wait until you hit 1024.

**

I could try to reproduce it by the method you describe - server connection attempts - it may or may not be related.

**

Any 3rd party modules loaded ?

syzop

2017-05-10 16:58

administrator   ~0019765

Nevermind tracing. I can reproduce the problem here.

It's not a FD leak, but an incorrect counter.

syzop

2017-05-10 17:05

administrator   ~0019767

** public **

https://github.com/unrealircd/unrealircd/commit/a6f5460ad819902920146c37a04bfbc7051de9ed

commit a6f5460ad819902920146c37a04bfbc7051de9ed
Author: Bram Matthys <[email protected]>
Date: Wed May 10 17:03:45 2017 +0200

    Update OpenFiles upon failed SSL connect to remote server. Reported by Eman (0004948).

Issue History

Date Modified Username Field Change
2017-05-10 15:55 Eman New Issue
2017-05-10 15:56 Eman Note Added: 0019759
2017-05-10 16:03 syzop Note Added: 0019760
2017-05-10 16:04 Eman File Added: strace.txt
2017-05-10 16:04 Eman Note Added: 0019762
2017-05-10 16:46 syzop Note Added: 0019763
2017-05-10 16:47 syzop File Deleted: strace.txt
2017-05-10 16:50 syzop Note Added: 0019764
2017-05-10 16:52 syzop Note Edited: 0019764
2017-05-10 16:58 syzop Assigned To => syzop
2017-05-10 16:58 syzop Status new => confirmed
2017-05-10 16:58 syzop Note Added: 0019765
2017-05-10 16:59 syzop Severity minor => block
2017-05-10 16:59 syzop View Status public => private
2017-05-10 17:05 syzop Status confirmed => resolved
2017-05-10 17:05 syzop View Status private => public
2017-05-10 17:05 syzop Note Added: 0019767