View Issue Details

IDProjectCategoryView StatusLast Update
0003972unrealircdpublic2019-08-05 06:37
ReporterMr_SmokeAssigned Tosyzop 
Status feedbackResolutionreopened 
Platformx86OSLinuxOS Version2.6.22
Product Version3.2.8 
Target VersionFixed in Version3.4-beta3 
Summary0003972: Autoconnect link does not reconnect after split
DescriptionSometimes (not always but nearly), when a connection between two hubs is lost, the link does not come back up even though it is set to autoconnect in the config file.

The ircd "sees" the split (ie "Lost connection to ..." notice and SQUIT in the logs), but remains on its own indefinitely (10 hours in the last instance until someone reconnected manually).

It seems that the problem builds up over time, ie. the first couple of disconnections after a restart/rehash are usually handled correctly (autotonnect kicks in), but later on, autoconnect just does not happen any longer.
Additional InformationThe only clue so far is:
[Wed Dec 1 00:19:13 2010] - WARNING: Time jumped 23 seconds ahead! (1291159130
-> 1291159153)
[Wed Dec 1 00:19:13 2010] - [TimeShift] Resetting some timers!

The server has ntpd running and has *not* skipped ahead itself.
TagsNo tags attached.
3rd party modulesdelaylist, regexcept, quarantine, nocodes



2010-12-10 20:55

administrator   ~0016447

I've found the mantis bug, it was caused by ~ followed by certain numbers.
I already reported it straight away on when I heard about this (better email next time ;p), but as you can see I had to trace it myself...

Anyway, the rest will be on-topic.
So far haven't looked at your issue.


2010-12-10 21:04

administrator   ~0016449

The jumps in time are really quite odd.. and big.

Could it be that your server clock is running too slow and ntp is adjusting it every once in a while in a huge leap? Actually ntpd shouldn't do that (except 1st tiem on boot sometimes), only ntpdate does things like that...


2010-12-10 21:06

administrator   ~0016450

also there was a bug resolved which involved autoconnect & time offsets, but that one probably does not apply to you:
- Fixed issue where a negative time offset (either caused by ircd.tune or
  timesynch) made autoconnect not work for the duration of the offset
  (eg: -60 would make autoconnect wait 60 seconds after boot, instead of
  autoconnecting almost immediately). Reported by aragon (0003853).

I mean.. it shouldn't apply, unless you have a negative offset of several days... or something ;)
See ircd.tune, first number, for the TS offset. Or use /TSCTL ALLTIME or something.


2010-12-10 21:54

reporter   ~0016451

I'm running a test ircd alongside one of the production ones so as to mimic the setup as closely as possible. The test daemon is a debug+Debug compile, and it has been patched according to issue 3853, as it was suggested that it might solve my issue. I'll let it run for a while and simulate splits, see what happens.

Also, I forgot to mention that the kernel has several GrSec features enabled, especially some of the /proc restrictions. Any chance it might be related ?


2010-12-11 07:30

reporter   ~0016453

@sysop : ntp resets time by about -0.15s every hour or so, which doesn't seem so bad but still might be improved by re-allowing /proc access and using adjtimex maybe. It never uses bigger leaps. ircd.tune has 0 916 in it.
TSCTL ALLTIME reports that all servers are in sync, except one at the moment running at -3. Never seen a TS offset in the days order of magnitude.


2010-12-11 12:26

administrator   ~0016454

Then I don't think 0003853 is related to this.
I also don't see how grsecurity or any /proc restrictions could have any affect.

The tech info regarding detecting time jumps is:
1) it saves the current time
2) the ircd does what it needs to.. I mean.. accepting connections, reading, writing, etc... the main loop
3) it compares the current time with the saved time
If, like in this case, 23 seconds have passed between 1 and 3, then we assume the clock was adjusted.
It MAY have a different cause, like if you have anything else stalling the ircd for 23 seconds, like a very very heavy spamfilter or you are attaching a debugger and doing things and then 'c' continuing. But.. it is very unlikely that you wouldn't have noticed that if you & others use that server ;)


2010-12-11 12:28

administrator   ~0016455

Is it relatively easy to reproduce? Like if I have your config, modify the server names of course, then /squit or tcpkill the connection a few times and then it stops autoconnecting?
If you can give a test case - like something I can reproduce here - that would be one way to trace it.


2010-12-11 12:29

administrator   ~0016456

Bleh, should have put this all in one comment ;)

Another important question to find out is: is this problem specific to this server?


2010-12-11 17:44

reporter   ~0016457

It's not easy to reproduce. I've set up the test/debug server for that reason. It lives on the same physical host where the failing ircd is, so I hope that it will suffer the same symptom sooner or later, but there's no guarantee. It doesn't seem to do it after a few disconnects, but only after a certain amount of time.

Also, it seems to happen more frequently if the split is caused by external factors rather than SQUIT.

The issue is not specific to this server, we have seen this behaviour on two different hubs.

Regarding the 23 seconds time shift, I have no idea what might have happened :S


2010-12-26 20:52

reporter   ~0016513

same problem here.

My server is connected in irc network using ssl, it's running Debian Squeeze and time is synchronized with level 2 ntp server.

There is many timeshifts every day (from 20s to 50s). And after netsplit, I have to make rehash, otherwise it wont connect.


2010-12-26 23:46

reporter   ~0016514

This may seem simple but what is the value for connfreq in your server's class block?


2010-12-27 02:39

reporter   ~0016515

Last edited: 2010-12-27 02:42

View 3 revisions

Hello katsklaw,
my connfreq is set to 30, but it doesn't want to connect even after an hour :). But only in the case of netsplit - after restart, etc it's ok.

And the timeshift problem it kinda unpleasant too, because all clients are freezed during the timejump :(.


2010-12-27 11:09

administrator   ~0016516

Last edited: 2012-08-23 19:23

View 2 revisions

Regarding time jumps, you could run a bash script for a day (or at least to catch one or two jumps) like:

while true
date >>date.log
sleep 1

And see if UnrealIRCd is lying or if there are really jumps ;)
Simply look back at date.log whenever unreal says the time jumped.
date.log should show 1 (and sometimes 2) second increments of time, anything big like >5 means the box itself has some kind of clock problem (or major performance issues).
IIRC I coded it that during a jump all client timers are reset so clients shouldn't notice any 'lag', however the freeze might be caused by the problem with the machine itself (I mean, it being a primary symptom caused by the machine itself, rather than a symptom caused by unreal).

Well, if you can, run the script, and see if you get any interesting results ;)


2010-12-30 21:03

administrator   ~0016545

I've experienced the same issue I think, but I only have log files, I was not online so cannot see if it did any (failed) linking attempts for sure.

Anyway, log:
[Thu Dec 30 10:46:10 2010] - TSCTL: U:line security.xxxx set time to be 1293727573 (timediff: 392, was 391)
[Thu Dec 30 11:03:02 2010] - Lost connection to huba.xxxx[74.xxxx]:Operation timed out
[Thu Dec 30 11:03:02 2010] - SQUIT huba.xxxxx (Operation timed out)
[Thu Dec 30 12:39:28 2010] - OPER (Syzop) by (Syzop!~none@xxxxx)
[Thu Dec 30 12:39:37 2010] - SERVER hubb.xxxx

This server has 2 server classes, one with connfreq 27 and one with connfreq 90.
huba is in the 90s one, and hubb in the 27s one.

It seems neither autoconnect was triggered. When I went online this server was not linked to any server (/MAP).
As soon as I opered up and did a '/connect hubb*' it linked instantly.

Seems we got a real issue here.

Strange thing is, I don't recall having seen it before.

Unreal version:

I guess I'll have to go over the autoconnect code (haven't really looked or touched that ever before), and play a bit with tcpkill and such.


2011-01-23 20:33

reporter   ~0016587

Just to add to the already existing weirdness ...

We've had maybe 4 or 5 splits recently. The last one displayed the bug, but the 3 or 4 before that went well, ie. autoconnect kicked in. So it's really quite random ... I'll check the dates, see if anything comes up from that


2015-07-24 12:00

administrator   ~0018562

Quite some code changed in this area, and I also did this (which in itself does not fix this issue, but would fix other cases I guess):

commit 8bea34626398cb10217e5f097456f113d22dae3b
Author: Bram Matthys <>
Date: Fri Jul 24 11:57:09 2015 +0200

    In fix_timers(), called for time jumps, also deal with aconf->hold (outgoing autoconnects). Change throttling flush to a more conservative approach.


2018-05-04 11:19

reporter   ~0020129

I hate to be the bearer of bad news, but this issue might still be current.

We've got a network (different from the one that caused the initial bug report btw) running 4.0.17 (upgrade in progress). This morning we suffered a couple of splits. The leaf running 4.0.17 reconnected in about 10 seconds. Another running 4.0.14 stayed disconnected until I manually issued a CONNECT.

Aside from the obvious move to upgrade to 4.0.17, what other tools could be used to help diagnose this issue? Certainly 0003973 would help, but I'm open to other suggestions as well.


2018-05-04 11:24

reporter   ~0020130

I also need to add that another link between two 4.0.14 hubs had no trouble coming back up automatically, so it's still rather random… which makes it all the more difficult to diagnose without some insight as to whether reconnecting is even attempted.


2018-07-14 16:44

administrator   ~0020202

Yes, must be rather annoying, especially with the random aspect (which is also annoying for me :D). I'll implement 0003973 in 4.0.19 and hopefully it will give you more information that allows you to help this bug report as well.


2018-09-09 17:06

administrator   ~0020303

As promised, 0003973 has been implemented in 4.0.19. Should help a lot to trace this issue.


2019-02-01 14:34

administrator   ~0020474

Have you seen this bug happen since 4.0.19 (or later)?


2019-08-05 06:37

reporter   ~0020797

I confirm the issue.

* *** Connection to (hidden)[(hidden)] activated.
* Lost connection to (hidden)[(hidden)]: SSL_connect(): Internal OpenSSL error or protocol error: ssl_undefined_function: called a function you should not call. Please verify that listen::options::ssl is enabled on port 7029 in (hidden)'s configuration file.

But immediately after rehash (no config change)...

* *** Loading IRCd configuration..
* *** Configuration loaded without any problems.
* *** Connection to (hidden)[(hidden)] activated.
* *** (link) Secure link (hidden) -> (hidden)[@(hidden)] established (TLSv1.2-ECDHE-RSA-AES256-GCM-SHA384)

Issue History

Date Modified Username Field Change
2010-12-01 16:25 Mr_Smoke New Issue
2010-12-02 04:02 ohnobinki Additional Information Updated View Revisions
2010-12-02 04:03 ohnobinki Additional Information Updated View Revisions
2010-12-10 20:48 syzop Additional Information Updated View Revisions
2010-12-10 20:48 syzop Additional Information Updated View Revisions
2010-12-10 20:48 syzop Additional Information Updated View Revisions
2010-12-10 20:55 syzop Note Added: 0016447
2010-12-10 21:03 syzop Additional Information Updated View Revisions
2010-12-10 21:04 syzop Note Added: 0016449
2010-12-10 21:06 syzop Note Added: 0016450
2010-12-10 21:54 Mr_Smoke Note Added: 0016451
2010-12-11 07:30 Mr_Smoke Note Added: 0016453
2010-12-11 12:26 syzop Note Added: 0016454
2010-12-11 12:28 syzop Note Added: 0016455
2010-12-11 12:29 syzop Note Added: 0016456
2010-12-11 17:44 Mr_Smoke Note Added: 0016457
2010-12-26 20:52 karel Note Added: 0016513
2010-12-26 23:46 katsklaw Note Added: 0016514
2010-12-27 02:39 karel Note Added: 0016515
2010-12-27 02:39 karel Note Edited: 0016515 View Revisions
2010-12-27 02:42 karel Note Edited: 0016515 View Revisions
2010-12-27 11:09 syzop Note Added: 0016516
2010-12-30 21:03 syzop Note Added: 0016545
2011-01-23 20:33 Mr_Smoke Note Added: 0016587
2011-06-05 21:37 syzop Relationship added child of 0003776
2011-07-17 23:24 syzop Relationship deleted child of 0003776
2011-07-17 23:24 syzop Relationship added child of 0003915
2012-08-23 19:23 syzop Note Edited: 0016516 View Revisions
2012-10-06 12:22 syzop Relationship deleted child of 0003915
2015-07-24 12:00 syzop Note Added: 0018562
2015-07-24 12:00 syzop Status new => resolved
2015-07-24 12:00 syzop Fixed in Version => 3.4-beta3
2015-07-24 12:00 syzop Resolution open => fixed
2015-07-24 12:00 syzop Assigned To => syzop
2018-05-04 11:19 Mr_Smoke Status resolved => feedback
2018-05-04 11:19 Mr_Smoke Resolution fixed => reopened
2018-05-04 11:19 Mr_Smoke Note Added: 0020129
2018-05-04 11:24 Mr_Smoke Note Added: 0020130
2018-07-14 16:44 syzop Note Added: 0020202
2018-09-09 17:06 syzop Note Added: 0020303
2019-02-01 14:34 syzop Note Added: 0020474
2019-08-05 06:37 k4be Note Added: 0020797