View Issue Details

ID: 0003430
Project: unrealircd
Category:
View Status: public
Last Update: 2007-09-02 15:36
Reporter: Hurga
Assigned To: syzop
Priority: normal
Severity: major
Reproducibility: random
Status: resolved
Resolution: fixed
Platform: i386
OS: Unix
OS Version: several
Product Version: 3.2.6
Fixed in Version: 3.2.8
Summary: 0003430: invalid banlist entries created by netsplits
Description: On our network (peak 1400 users, 6 servers running different Linux and FreeBSD versions) we experience problems with syntactically invalid bans which seem to be created by netsplits. Examples: "*!*" "lofi" "magicwindow*" "*!~KEL" "*!"

The bans can't be cleared by any means we tried except removing the channel. Since a ban matching everyone can make the channel unusable, this can be quite a problem.
Steps To Reproduce: Have many netsplits on a busy network. Works for us...

Seriously, since the problem is obviously not easy to reproduce or debug, I can get whatever data you need from the network where I have these broken bans.
Additional Information: Network is running Anope 1.6.5 with a few small modifications (retrieval of encrypted passwords and somesuch). Not sure if it matters.
Tags: No tags attached.
Attached Files
m_server.c (42,470 bytes)
3rd party modules

Relationships

child of 0003454 | resolved | syzop | Unreal3.2.8 TODO

Activities

Bock

2007-07-08 23:34

reporter   ~0014458

I saw something like that once on my network. Can't reproduce it.

Hurga

2007-07-09 07:08

reporter   ~0014460

Forgot to mention that the invalid bans are (usually?) fragments of previously existing correct bans.

stskeeps

2007-07-09 15:34

reporter   ~0014464

Hurga: Do new bans occur after the upgrades?
As in, new broken bans getting produced.

Hurga

2007-07-09 16:32

reporter   ~0014467

Yes, we've seen broken bans being created by splits even after updating all the servers to 3.2.6.

You asked how the bans looked previously. Example:
"*!~KEL" should have been "*!~KELEBEK*@*"
"magicwindow*" should have been "magicwindow*!*@*"

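The fragments quoted above all lack part of the nick!user@host shape. A minimal sketch of a check that would flag them follows; `is_full_ban_mask` is a hypothetical helper written for this illustration, not a function from the UnrealIRCd source, and it deliberately ignores any other ban formats the server may accept.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from UnrealIRCd): returns true only if the mask
 * has the shape nick!user@host, i.e. exactly one '!' followed by exactly
 * one '@', with all three parts non-empty. */
bool is_full_ban_mask(const char *mask)
{
    const char *bang = strchr(mask, '!');
    const char *at = strchr(mask, '@');

    if (bang == NULL || bang == mask)
        return false;               /* no '!' at all, or empty nick part */
    if (strchr(bang + 1, '!') != NULL)
        return false;               /* more than one '!' */
    if (at == NULL || at < bang)
        return false;               /* no '@', or '@' inside the nick part */
    if (at == bang + 1)
        return false;               /* empty user part */
    if (at[1] == '\0' || strchr(at + 1, '@') != NULL)
        return false;               /* empty host part, or a second '@' */
    return true;
}
```

With this check, "*!~KELEBEK*@*" and "magicwindow*!*@*" pass, while the truncated "*!~KEL" and "magicwindow*" are rejected.
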
argvx

2007-07-10 10:16

reporter   ~0014471

Same problem on my network. I think it's a bug in the SJ3 protocol: after some debugging I can see that some lines sent by a server are longer than 510 bytes, for example when a channel has more than 35 bans and ~140 users. After a server comes back from a split, it randomly sets copies of bans/invexes/excepts like the ones in this report.

I can give debug logs if needed to help fix this, but only private via email.
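
To make the 510-byte figure concrete: an IRC line is limited to 512 bytes including the trailing CR LF, which leaves 510 bytes for everything else. The sketch below counts how many lines a list of entries needs if no line may exceed that budget. The function name, the `header_len` parameter and the constant are assumptions made for this illustration and do not come from the UnrealIRCd code.

```c
#include <stddef.h>
#include <string.h>

#define IRC_PAYLOAD_MAX 510 /* 512-byte line limit minus the trailing CR LF */

/* Hypothetical helper: returns how many lines are needed to send `count`
 * entries when every line starts with a header of `header_len` bytes.
 * Returns -1 if a single entry cannot fit even on a line of its own.
 * Each entry is budgeted with one separating space, so the estimate is
 * slightly conservative for the last entry on a line. */
int lines_needed(size_t header_len, const char *const *entries, size_t count)
{
    int lines = 0;
    size_t used = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t need = strlen(entries[i]) + 1;

        if (header_len + need > IRC_PAYLOAD_MAX)
            return -1;              /* entry never fits next to the header */
        if (lines == 0 || used + need > IRC_PAYLOAD_MAX) {
            lines++;                /* start a new line with a fresh header */
            used = header_len;
        }
        used += need;
    }
    return lines;
}
```

Passing the length of the header *including* any sender prefix that may be added later is what keeps every resulting line within the limit.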

stskeeps

2007-07-10 10:23

reporter   ~0014472

Please apply this patch to the servers:

http://bsd.tspre.org/~stskeeps/truncate.patch

This will warn when it has to truncate when sending to servers and log the message to the ircd log (please activate this properly)

stskeeps

2007-07-11 05:52

reporter   ~0014474

http://bsd.tspre.org/~stskeeps/newtruncate.patch

new patch.

stskeeps

2007-07-15 19:28

reporter   ~0014499

http://bsd.tspre.org/~stskeeps/m_server.c <- Possible fix for this.

argvx

2007-07-16 05:28

reporter   ~0014502

I think it must be something like this, with the ":" prefix, in m_server.c:
        if (nomode && nopara)
        {
                ircsprintf(buf,
- (cptr->proto & PROTO_SJB64 ? "%s %B %s :" : "%s %ld %s :"),
+ (cptr->proto & PROTO_SJB64 ? ":%s %s %B %s :" : ":%s %s %ld %s :"), me.name,
                    (IsToken(cptr) ? TOK_SJOIN : MSG_SJOIN),
                    (long)chptr->creationtime, chptr->chname);
        }
        if (nopara && !nomode)
        {
                ircsprintf(buf,
- (cptr->proto & PROTO_SJB64 ? "%s %B %s %s :" : "%s %ld %s %s :"),
+ (cptr->proto & PROTO_SJB64 ? ":%s %s %B %s %s :" : ":%s %s %ld %s %s :"), me.name,
                    (IsToken(cptr) ? TOK_SJOIN : MSG_SJOIN),
                    (long)chptr->creationtime, chptr->chname, modebuf);
        }
        if (!nopara && !nomode)
        {
                ircsprintf(buf,
- (cptr->proto & PROTO_SJB64 ? "%s %B %s %s %s :" : "%s %ld %s %s %s :"),
+ (cptr->proto & PROTO_SJB64 ? ":%s %s %B %s %s %s :" : ":%s %s %ld %s %s %s :"), me.name,
                    (IsToken(cptr) ? TOK_SJOIN : MSG_SJOIN),
                    (long)chptr->creationtime, chptr->chname, modebuf, parabuf);
        }

AL

2007-07-17 03:52

reporter   ~0014506

More precisely, I think this bug is linked to sender prefixes.
Let us assume that server A is linking to server B. In the current implementation of sjoin3, server A creates the message without a sender prefix ("~ ..." instead of ":A ~ ...", where ~ is the message token). When server B receives this message, all is OK. But what if server C is already linked to server B? Server B must forward the message from server A to all servers in the network, including server C, and it must indicate which server is the original sender of the message. So server B attaches the sender prefix :A to the message and (possibly) overflows it! (I saw a situation like this in my debug logs.)
In short, send_channel_modes_sjoin3 doesn't reserve space for the prefix and uses almost all of BUFSIZE. My solution: add the sender prefix right away (see patch above).

Btw, stskeeps's solution with the limit (BUFSIZE - 80) may fix the problem in networks with server names shorter than 76 characters, but I think it's not correct to rely on this.

PS: How messages are relayed across the network with dynamic prefixes is not fully clear to me (and some of my tests to reproduce the bug failed), so my reasoning may contain some mistakes.
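
The overflow described here can be illustrated with a small sketch. It assumes a 510-byte payload limit per line and a relay that prepends ":<sender> " to the message; the names `prefix_cost`, `build_prefixed` and `IRC_PAYLOAD_MAX` are hypothetical and are not taken from the UnrealIRCd source.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define IRC_PAYLOAD_MAX 510 /* assumed per-line budget, excluding CR LF */

/* Bytes added when a relaying server prepends ":<sender> " to a message:
 * the colon, the sender name and one space. */
size_t prefix_cost(const char *sender)
{
    return strlen(sender) + 2;
}

/* Writes ":<sender> <body>" into out. Returns 0 on success, or -1 if the
 * result did not fit and was truncated by snprintf. */
int build_prefixed(char *out, size_t outsz, const char *sender, const char *body)
{
    int n = snprintf(out, outsz, ":%s %s", sender, body);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

A 505-byte body fits the limit on its own, but with a sender such as "irc.example.net" (17 bytes of prefix) it no longer fits into a buffer of IRC_PAYLOAD_MAX + 1 bytes. Either adding the prefix when the line is first built, or capping the body at IRC_PAYLOAD_MAX - prefix_cost(sender), avoids the truncation.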

Bock

2007-07-17 04:01

reporter   ~0014507

/me don't see patch :(

AL

2007-07-17 05:26

reporter   ~0014509

Last edited: 2007-07-17 05:28

(It was in the message from fbi.) I also uploaded the full file (see Attached Files).

syzop

2007-09-02 07:49

administrator   ~0014747

Last edited: 2007-09-02 08:45

I guess this is indeed caused by not reserving (enough) room for prefixes, odd it's only noticed now...
I'll put a fix in tomorrow, fbi/AL's is a good candidate. Stskeeps' fix works fine too, btw.
Can you (fbi/whoever) confirm this indeed fixed it?
--
edit: oh, just to avoid any misunderstandings, it will still be a couple of weeks till I get back. I'll just handle this one issue this week.

argvx

2007-09-02 10:59

reporter   ~0014748

With these patches it works fine.

syzop

2007-09-02 15:35

administrator   ~0014749

Last edited: 2007-09-02 15:36

Fixed in .679:
- Fixed bug in SJOIN, possibly causing things like odd bans showing up in
  some circumstances. Reported by Hurga, patch provided by fbi.

If you're still encountering any issues after this (or any other weird sjoin stuff), just let us know.

Oh, and if I got the credit for the patch incorrect, let me know of course ([email protected]), wasn't very clear.

Issue History

Date Modified Username Field Change
2007-07-08 21:53 Hurga New Issue
2007-07-08 23:34 Bock Note Added: 0014458
2007-07-09 07:08 Hurga Note Added: 0014460
2007-07-09 15:34 stskeeps Note Added: 0014464
2007-07-09 15:35 stskeeps Status new => acknowledged
2007-07-09 15:36 stskeeps Relationship added child of 0003111
2007-07-09 16:32 Hurga Note Added: 0014467
2007-07-10 10:16 argvx Note Added: 0014471
2007-07-10 10:23 stskeeps Note Added: 0014472
2007-07-11 05:52 stskeeps Note Added: 0014474
2007-07-15 19:28 stskeeps Note Added: 0014499
2007-07-16 05:28 argvx Note Added: 0014502
2007-07-17 03:52 AL Note Added: 0014506
2007-07-17 04:01 Bock Note Added: 0014507
2007-07-17 05:26 AL Note Added: 0014509
2007-07-17 05:27 AL File Added: m_server.c
2007-07-17 05:28 AL Note Edited: 0014509
2007-09-02 07:49 syzop Note Added: 0014747
2007-09-02 08:45 syzop Note Edited: 0014747
2007-09-02 10:59 argvx Note Added: 0014748
2007-09-02 15:25 syzop Relationship deleted child of 0003111
2007-09-02 15:25 syzop Relationship added child of 0003454
2007-09-02 15:35 syzop QA => Not touched yet by developer
2007-09-02 15:35 syzop U4: Need for upstream patch => No need for upstream InspIRCd patch
2007-09-02 15:35 syzop Status acknowledged => resolved
2007-09-02 15:35 syzop Fixed in Version => 3.2.8
2007-09-02 15:35 syzop Resolution open => fixed
2007-09-02 15:35 syzop Assigned To => syzop
2007-09-02 15:35 syzop Note Added: 0014749
2007-09-02 15:36 syzop Note Edited: 0014749