View Issue Details

ID: 0002836
Project: unrealircd
Category:
View Status: public
Last Update: 2006-06-05 13:12
Reporter: mixx941
Assigned To: syzop
Priority: normal
Severity: crash
Reproducibility: random
Status: resolved
Resolution: fixed
Platform: i386
OS: FreeBSD
OS Version: 5.3-RELEASE
Product Version: 3.2.4
Fixed in Version: 3.2.5
Summary: 0002836: Not Relinking Properly, Then a Crash
Description:

I was relinking all our servers tonight because scheduled maintenance for one of the hubs was coming up. My procedure was:

1) Edit all the configs to comment out autoconnect from the hub that was having the maintenance and uncomment autoconnect to the backup hub.

2) /rehash all the IRCD's to make sure I didn't make a mistake in the config

3) On the leaf, /squit the hub that was going down

4) /rehash again to initiate the autoconnect sequence to the backup hub with the autoconnect on.

For some servers, this went as planned. For a few, even though I commented out autoconnect for the hub that I'm trying to move them from, it kept autoconnecting to that commented out hub. I tried it again, and it autoconnected to the commented out hub AGAIN. All the while it said it reloaded the configuration file fine when I /rehashed, so my changes should have been there. I then tried squitting and manually /connecting it to where I wanted it, but it reconnected to the commented out hub once again.

At this point, one of the ircds actually crashed with a core file, which I have included below.

So now it's been almost 10 minutes of constantly restarting/splitting and I just lost users due to that crash. I commented out the entire link block for that hub, rehash, squit, and then it seemed to work.

The above behavior of not linking to the right hub despite things being commented out properly and autoconnect enabled on the hub I wanted it to go to happened on 3 servers. Only one of them crashed though.

I had no problems doing the same steps on 3.2.3 when maintenance came up before.

Thanks in advance.

-Mark
Additional Information:

BACKTRACE:

(gdb) bt
#0 0x08062f90 in match (mask=0x6f6c6720 <Address 0x6f6c6720 out of bounds>, name=0x6f6c6720 <Address 0x6f6c6720 out of bounds>) at match.c:411
#1 0x283286a1 in ?? ()
#2 0x6f6c6720 in ?? ()
#3 0x0818f8ea in ?? ()
#4 0x0000006f in ?? ()
#5 0x00000007 in ?? ()
#6 0x00006001 in ?? ()
#7 0x0000000d in ?? ()
#8 0xbfbfe788 in ?? ()
#9 0x0808192d in sendto_one (to=0x818f800, pattern=0x81ecc00 "") at send.c:237
#10 0x28329c2a in ?? ()
#11 0x0818f800 in ?? ()
#12 0x081ecc00 in ?? ()
#13 0x00000005 in ?? ()
#14 0x08118ea0 in nsprefix ()
#15 0x4400046d in ?? ()
#16 0x00000001 in ?? ()
#17 0x0817a200 in ?? ()
#18 0x4400046d in ?? ()
#19 0x00000006 in ?? ()
#20 0x00000000 in ?? ()
#21 0xbfbfe878 in ?? ()
#22 0x28275c2f in ldexp () from /lib/libc.so.5
#23 0x080663ed in parse (cptr=0x818f800, buffer=0x818f8e4 "@2j '", bufend=0x6f6c6720 <Address 0x6f6c6720 out of bounds>) at parse.c:440
#24 0x0806549a in dopacket (cptr=0x818f800, buffer=0x80aaee0 "\004)\uffffI\036C\004\035\uffff\210\222j\uffff\0254\uffffmisss\022F\uffff\034\214\uffff\uffff\uffff\uffff\215IkK\003\004", length=1)
    at packet.c:138
#25 0x08056f88 in read_message (delay=1, listp=0x8149900) at s_bsd.c:1477
#26 0x08061cfb in main (argc=135549216, argv=0xbfbfeca8) at ircd.c:1564
Tags: No tags attached.
3rd party modules: m_ircops, nocolorumode, m_privdeaf

Activities

syzop

2006-02-25 05:55

administrator   ~0011296

Seems some important stuff is missing from the backtrace.
Could you run './unreal backtrace' on the commandline and paste results back to us?
Thanks.

mixx941

2006-02-25 13:59

reporter   ~0011297

Last edited: 2006-02-25 14:04

Sure:

=================== START HERE ======================
BACKTRACE:
Core was generated by `ircd'.
Program terminated with signal 11, Segmentation fault.
Error while mapping shared library sections:
tmp/48AF02B6.commands.so: No such file or directory.
Error while mapping shared library sections:
tmp/41967233.cloak.so: No such file or directory.
Error while mapping shared library sections:
tmp/4D662653.m_privdeaf.so: No such file or directory.
Error while mapping shared library sections:
tmp/7DDB9159.nocolorumode.so: No such file or directory.
Error while mapping shared library sections:
tmp/24371780.m_ircops.so: No such file or directory.
Error while reading shared library symbols:
tmp/48AF02B6.commands.so: No such file or directory.
Error while reading shared library symbols:
tmp/41967233.cloak.so: No such file or directory.
Error while reading shared library symbols:
tmp/4D662653.m_privdeaf.so: No such file or directory.
Error while reading shared library symbols:
tmp/7DDB9159.nocolorumode.so: No such file or directory.
Error while reading shared library symbols:
tmp/24371780.m_ircops.so: No such file or directory.
#0 0x08062f90 in match (mask=0x6f6c6720 <Address 0x6f6c6720 out of bounds>, name=0x6f6c6720 <Address 0x6f6c6720 out of bounds>) at match.c:411
411 if (mask[0] == '*' && mask[1] == '!') {
#0 0x08062f90 in match (mask=0x6f6c6720 <Address 0x6f6c6720 out of bounds>, name=0x6f6c6720 <Address 0x6f6c6720 out of bounds>) at match.c:411
#1 0x283286a1 in ?? ()
#2 0x6f6c6720 in ?? ()
#3 0x0818f8ea in ?? ()
#4 0x0000006f in ?? ()
#5 0x00000007 in ?? ()
#6 0x00006001 in ?? ()
#7 0x0000000d in ?? ()
#8 0xbfbfe788 in ?? ()
#9 0x0808192d in sendto_one (to=0x818f800, pattern=0x81ecc00 "") at send.c:237
#10 0x28329c2a in ?? ()
#11 0x0818f800 in ?? ()
#12 0x081ecc00 in ?? ()
#13 0x00000005 in ?? ()
#14 0x08118ea0 in nsprefix ()
#15 0x4400046d in ?? ()
#16 0x00000001 in ?? ()
#17 0x0817a200 in ?? ()
#18 0x4400046d in ?? ()
#19 0x00000006 in ?? ()
#20 0x00000000 in ?? ()
#21 0xbfbfe878 in ?? ()
#22 0x28275c2f in ldexp () from /lib/libc.so.5
#23 0x080663ed in parse (cptr=0x818f800, buffer=0x818f8e4 "@2j '", bufend=0x6f6c6720 <Address 0x6f6c6720 out of bounds>) at parse.c:440
#24 0x0806549a in dopacket (cptr=0x818f800, buffer=0x80aaee0 "\004)\uffffI\036C\004\035\uffff\210\222j\uffff\0254\uffffmisss\022F\uffff\034\214\uffff\uffff\uffff\uffff\215IkK\003\004", length=1)
    at packet.c:138
#25 0x08056f88 in read_message (delay=1, listp=0x8149900) at s_bsd.c:1477
#26 0x08061cfb in main (argc=135549216, argv=0xbfbfeca8) at ircd.c:1564

#0 0x08062f90 in match (mask=0x6f6c6720 <Address 0x6f6c6720 out of bounds>, name=0x6f6c6720 <Address 0x6f6c6720 out of bounds>) at match.c:411
411 if (mask[0] == '*' && mask[1] == '!') {

0x814b700 <backupbuf>: "@2j ' one.of.my.leaf.servers 3 168 :Mynet Norway - Hosted by website.org"

#0 0x08062f90 in match (mask=0x6f6c6720 <Address 0x6f6c6720 out of bounds>, name=0x6f6c6720 <Address 0x6f6c6720 out of bounds>) at match.c:411
No locals.
#1 0x283286a1 in ?? ()
No symbol table info available.
#2 0x6f6c6720 in ?? ()
No symbol table info available.
GCC: gcc version 3.4.2 [FreeBSD] 20040728
UNAME: FreeBSD mixxnet 5.3-RELEASE FreeBSD 5.3-RELEASE #0: Sat Dec 11 01:37:19 CET 2004 xacto@byggarebob:/usr/src/sys/i386/compile/MYKERNEL i386
UNREAL: Unreal3.2.4 build 1.1.1.1.2.22 2006/02/05 18:03:15
CORE: -rw------- 1 mark mark 2953216 Feb 25 08:17 ircd.core
=================== STOP HERE ======================

EDIT:

I should also note that last night after the maintenance, I changed all the config files to autoconnect back to the main Europe hub and tried the same method of relinking as pasted above. This time I used "#" instead of "//" for comments on autoconnect in the link block for the hub I didn't want to connect to. They all linked to the right place then.

I'm not sure what happened before though with 3 of them still autoconnecting to the hub that was commented out ("//") even after multiple successful rehashes.

Thanks

-Mark

mixx941

2006-02-26 17:31

reporter   ~0011308

Last edited: 2006-02-26 17:40

Here's an update with more weird behavior. The same server that crashed and that I pasted the backtrace on is now trying to autoconnect to a server again that has autoconnect disabled. Not only that, it's trying to connect to the wrong IP address. Here's an example:

--- *** Global -- from server.se.eu.mynet.net: ERROR from hub1.eu.mynet.net[3.4.5.6] -- Closing Link: [x.x.x.x] (Server Exists)

The "3.4.5.6" address it's trying to autoconnect to is actually the address of a different hub (hub2.eu.mynet.net), which is the hub that it's currently already linked to...which is why it's saying Server Exists.

The link block in this server for hub1.eu.mynet.net has a completely different IP address and no autoconnect enabled, yet it's still trying that incorrect IP and autoconnecting right now.

link hub1.eu.mynet.net
{
        username *;
        hostname 1.2.3.4;
        bind-ip *;
        port 0000;
        hub *;
        password-connect "x";
        password-receive "x";
        class servers;
        options {
                zip;
        };
};

At one point, I do believe there was a mistake in the IP addresses, but they're all correct now and I've rehashed many times, and every rehash has reported success.

Maybe the problem here is that the servers that this happens on are somehow just not rehashing the config file properly. That would account for why, even after I change autoconnect, rehash, and squit, it reconnects to the server that HAD autoconnect on, not the one that currently does. However, I did the same upgrade procedure on all servers from 3.2.3 to 3.2.4 and this only happens on three of them...so I am not sure what kind of user error this would be.

Let me know if I'm not making sense or if I can help with better information.

Thanks

-Mark

syzop

2006-04-23 19:46

administrator   ~0011599

Last edited: 2006-04-23 19:47

Are you still experiencing this problem? Or was it just for a few days...

PS: It sounded like memory corruption

mixx941

2006-04-24 16:37

reporter   ~0011604

It has not crashed again but the autoconnect issue is still present, even on a server that was recompiled and moved to a different box. Here's an example from a week or so ago that I posted on the forums:

Leaf A is connected to Hub A. Hub A goes down, and I am alerted to this by monitoring software. I edit unrealircd.conf on Leaf A to remove the autoconnect to Hub A and enable autoconnect to Hub B. I rehash and it immediately autoconnects to Hub B like it's supposed to. However, it still continues trying to connect to Hub A even though autoconnect is now commented out. I rehash several times at this point, try a hash style comment instead of the usual "//"...still attempting. I even go so far as to completely remove the commented out autoconnect line and rehash several times....but it keeps going.

--- *** Global -- from leaf.a: No response from hub.a[x.x.x.x], closing link
--- *** Global -- from leaf.a: No response from hub.a[x.x.x.x], closing link
--- *** Global -- from leaf.a: No response from hub.a[x.x.x.x], closing link
[repeat indefinitely]

This causes a problem because Hub B is always supposed to be linked to Hub A directly. A leaf should never be linked to Hub A and Hub B at the same time, because then traffic between the two hubs goes through that leaf instead of directly between the hubs. However, with both Leaf A and Hub B trying to autoconnect to Hub A, who knows which one will connect successfully when Hub A comes back online...it's just whichever one gets there first.

http://forums.unrealircd.com/viewtopic.php?p=17649#17649

-Mark

syzop

2006-04-24 16:51

administrator   ~0011605

ic.

if it happens next time, could you do a '/stats c' on the server to see what it thinks about the links block list?

mixx941

2006-06-05 11:24

reporter   ~0011857

The situation happened again today. Two leaves are still trying to autoconnect to the down hub even with autoconnect commented out. I tried "//" style comments as well as hash, and neither works. Completely removing the autoconnect line doesn't work either. Here are the relevant lines of /stats c from them both.

LEAF 1:

--- C *@ip.ip.ip.ip * hub2.us.mynet.net XXXX servers Sz
--- H * * hub2.us.mynet.net
--- C *@ip.ip.ip.ip * hub2.us.mynet.net XXXX servers aSzT
--- H * * hub2.us.mynet.net

LEAF 2:

--- C *@ip.ip.ip.ip * hub2.us.mynet.net XXXX servers Sz
--- H * * hub2.us.mynet.net
--- C *@ip.ip.ip.ip * hub2.us.mynet.net XXXX servers aSzT
--- H * * hub2.us.mynet.net

(Yes, it's listed twice on both servers. I see one has the "a" and one does not. However it's listed only once in the config file WITHOUT autoconnect, and I've rehashed tons of times)
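For anyone scripting a check for this, here is a minimal sketch (not UnrealIRCd code) that flags an entry like the second one above. It assumes the flags column is passed in as a plain string such as "aSzT", and that 'a' marks autoconnect and 'T' marks a temporary block, which is how those two letters are read later in this thread. The function name is_stale_autoconnect is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of UnrealIRCd.
 * Returns true when a "/stats c" flags column (e.g. "aSzT") contains
 * both 'a' (autoconnect) and 'T' (temporary), i.e. the entry looks like
 * an old block that is still set to autoconnect. */
static bool is_stale_autoconnect(const char *flags)
{
    assert(flags != NULL);
    return strchr(flags, 'a') != NULL && strchr(flags, 'T') != NULL;
}
```

With the output above, the "Sz" entry would not be flagged and the "aSzT" entry would.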

Just keeps trying over and over though:

--- *** Global -- from leafa.mynet.net: No response from hub2.us.mynet.net[ip.ip.ip.ip], closing link
--- *** Global -- from leafb.mynet.net: No response from hub2.us.mynet.net[ip.ip.ip.ip], closing link

Thanks

-Mark

syzop

2006-06-05 11:26

administrator   ~0011858

Thanks.

Could you paste your 'servers' class block as well?

mixx941

2006-06-05 11:56

reporter   ~0011859

Sure. It's the same on both leaves except one has a 15 second connfreq and the other a 30 second. Now that I see the difference I will match them up:

class servers
{
        pingfreq 90;
        maxclients 10;
        sendq 1000000;
        connfreq 15;
};

Thanks

-Mark

syzop

2006-06-05 12:57

administrator   ~0011861

Maybe the low connfreq is the "interesting factor" here.

What you are seeing in /stats c is a new one without autoconnect, which is correct since you removed it/commented it out, and another one with autoconnect (a) and also with a 'T' (temporary). The 'T' one should be removed as soon as any link (attempt) using that configblock is no longer in use, but maybe it never has a chance to reach that situation with the low connfreq... That's my best guess at the moment at least :).
The thing is, that any new attempts should be using the new block (not the one with 'a' and 'T'), so even with a low connfreq it shouldn't have any problem, but maybe that's not working right for some reason... ;p

Ah well, got the feeling we are getting somewhere at least :P

syzop

2006-06-05 13:12

administrator   ~0011862

And that's it :P.

Fixed in CVS .526:
- Fixed problem with IRCd using old link block settings if using a low connfreq, this made it
  for example near-impossible to remove autoconnect for such a server. Reported by mixx941
  (0002836).

If you don't want to upgrade to current CVS, the patch is simply changing one line:
--- src/ircd.c 21 May 2006 00:35:45 -0000 1.1.1.1.6.1.2.190.2.26
+++ src/ircd.c 5 Jun 2006 18:10:41 -0000
@@ -450,7 +450,7 @@
                /*
                 * Also when already connecting! (update holdtimes) --SRB
                 */
-               if (!(aconf->options & CONNECT_AUTO))
+               if (!(aconf->options & CONNECT_AUTO) || (aconf->flag.temporary == 1))
                        continue;

                cltmp = aconf->class;

(and run 'make' again, and restart)

Issue History

Date Modified Username Field Change
2006-02-25 01:51 mixx941 New Issue
2006-02-25 01:51 mixx941 3rd party modules => m_ircops, nocolorumode, m_privdeaf
2006-02-25 05:55 syzop Note Added: 0011296
2006-02-25 13:59 mixx941 Note Added: 0011297
2006-02-25 14:04 mixx941 Note Edited: 0011297
2006-02-26 17:31 mixx941 Note Added: 0011308
2006-02-26 17:40 mixx941 Note Edited: 0011308
2006-04-23 19:46 syzop Note Added: 0011599
2006-04-23 19:47 syzop Note Edited: 0011599
2006-04-24 16:37 mixx941 Note Added: 0011604
2006-04-24 16:51 syzop Note Added: 0011605
2006-06-05 11:24 mixx941 Note Added: 0011857
2006-06-05 11:26 syzop Note Added: 0011858
2006-06-05 11:56 mixx941 Note Added: 0011859
2006-06-05 12:57 syzop Note Added: 0011861
2006-06-05 13:08 syzop Status new => confirmed
2006-06-05 13:12 syzop Status confirmed => resolved
2006-06-05 13:12 syzop Fixed in Version => 3.2.5
2006-06-05 13:12 syzop Resolution open => fixed
2006-06-05 13:12 syzop Assigned To => syzop
2006-06-05 13:12 syzop Note Added: 0011862