View Issue Details

ID: 0002229
Project: unrealircd
Category:
View Status: public
Last Update: 2005-02-27 22:22
Reporter: MagicalTux
Assigned To: syzop
Priority: normal
Severity: crash
Reproducibility: N/A
Status: resolved
Resolution: fixed
Platform: AMD / 512MB+ RAM
OS: Linux Debian
OS Version: 3.1
Product Version: 3.2.2
Target Version:
Fixed in Version: 3.2.3
Summary: 0002229: crash at res.c:1385
Description: Today two of my servers crashed with the same backtrace...

It happened without any visible reason...

Backtrace included; coredumps + binary available on request.

(unmodified source code, without any special modules)
Steps To Reproduce: -
Additional Information:
#0 0x080689d6 in find_cache_number (rptr=0x0, numb=0x8439038 "SÁ\033Í|\020") at res.c:1385
1385 for (i = 0; HE(cp)->h_addr_list[i]; i++)
(gdb) bt
#0 0x080689d6 in find_cache_number (rptr=0x0, numb=0x8439038 "SÁ\033Í|\020") at res.c:1385
#1 0x08067a78 in gethost_byaddr (addr=0x8439038 "SÁ\033Í|\020", lp=0xbfffefb0) at res.c:461
#2 0x0806c148 in start_of_normal_client_handshake (acptr=0x8438cf8) at s_bsd.c:1353
#3 0x0806bef1 in add_connection (cptr=0x819cba0, fd=238) at s_bsd.c:1337
#4 0x0806c87c in read_message (delay=2, listp=0x815db60) at s_bsd.c:1872
#5 0x08062f57 in main (argc=0, argv=0xbffff944) at ircd.c:1560
(gdb) print i
$1 = 135419424


#0 0x080862b7 in find_cache_number (rptr=0x0, numb=0x8ef8420 "SN\024³ù\r") at res.c:1385
1385 for (i = 0; HE(cp)->h_addr_list[i]; i++)
(gdb) bt
#0 0x080862b7 in find_cache_number (rptr=0x0, numb=0x8ef8420 "SN\024³ù\r") at res.c:1385
#1 0x08085360 in gethost_byaddr (addr=0x8ef8420 "SN\024³ù\r", lp=0xbfffee40) at res.c:461
#2 0x08089b17 in start_of_normal_client_handshake (acptr=0x8ef80e0) at s_bsd.c:1353
#3 0x0808989d in add_connection (cptr=0x82566e8, fd=563) at s_bsd.c:1337
#4 0x0808a2ed in read_message (delay=2, listp=0x821aae0) at s_bsd.c:1872
#5 0x08080903 in main (argc=2, argv=0x0) at ircd.c:1560
#6 0x400c2dc6 in __libc_start_main () from /lib/libc.so.6
(gdb) print i
$1 = 136180576
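
For readers unfamiliar with the faulting line: it walks the NULL-terminated `h_addr_list` of each entry on a hash chain. The following is only a minimal sketch of a lookup of that shape, not the res.c code; `cache_entry`, `find_by_addr` and the fixed 4-byte address length are hypothetical stand-ins. It also shows what NULL checks can and cannot do: they catch missing pointers, but they cannot detect an entry that points at memory which was already freed or overwritten, which is one possible way to end up crashing on a line like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a resolver cache entry; not the real struct. */
struct cache_entry {
    char **h_addr_list;              /* NULL-terminated list of 4-byte addresses */
    struct cache_entry *hnum_next;   /* next entry in the same hash bucket */
};

/* Walk one hash bucket and return the entry that holds addr, or NULL. */
static struct cache_entry *find_by_addr(struct cache_entry *bucket,
                                        const char *addr)
{
    struct cache_entry *cp;
    int i;

    assert(addr != NULL);

    for (cp = bucket; cp != NULL; cp = cp->hnum_next) {
        if (cp->h_addr_list == NULL)
            continue;   /* entry without addresses: skip, do not dereference */
        for (i = 0; cp->h_addr_list[i] != NULL; i++) {
            if (memcmp(cp->h_addr_list[i], addr, 4) == 0)
                return cp;
        }
    }
    return NULL;
}
```

If `cp` itself is stale, none of these checks help, which is why the discussion below turns to run-time memory checkers.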
Tags: No tags attached.
3rd party modules

Relationships

has duplicate 0002264 closed Unreal3.2.2 crash 

Activities

syzop

2004-12-08 12:38

administrator   ~0008553

So this is really 3.2.2? Interesting, since I thought I fixed this.
Could you mail the src/ircd binary, commands.so and the cores (zipped) to syzop@unrealircd.com (or: url, whatever ;p).

syzop

2004-12-08 23:40

administrator   ~0008560

Odd problem, cores received, will contact you within 1-2 days (probably posting mpatrol instructions).. In case I forget however, just post a note or mail me again :p.

syzop

2005-01-05 12:08

administrator   ~0008706

Last edited: 2005-01-05 12:09

status private->public.
Had no feedback yet, also HERZ reported the same problem in 0002264

If anyone else is experiencing this problem and wants to help out tracing this (by running the ircd in a "special manner") let me know at syzop@vulnscan.org and I'll mail step-by-step instructions

MagicalTux

2005-01-05 12:21

reporter   ~0008707

Well...

My servers are regularly crashing (up to 3 crashes in 24 hours).

I haven't been able to try the mpatrol thing yet, and I lost the mail. Anyway, I'm not sure I have enough RAM on a server to run it (maybe I'll have to use a new server x_x ).

Disabling the DNS cache on one of my servers made it "stable".

Also, could you make a page with the mpatrol instructions or mail them to me again? (Yeah, I lost it.. when you receive 70 mails per hour, these things can happen)

2005-01-05 12:24

 

unreal_mpatrol.txt (3,501 bytes)

syzop

2005-01-05 12:24

administrator   ~0008708

File attached, and just for reference & others, I'll paste my mail to you & bigi below:
[quote]Hi,

I heard you two had problems with UnrealIRCd crashing.
Unfortunately these issues are hard to trace down,
first they somehow rarely happen, second.. when the
server crashes the bug was already caused (and corrupted
memory) somewhere earlier: that could be just 1 second /
500 lines of code earlier, but also several HOURS.

The only way to trace these things down (besides looking
at the source, but that's not always working :|) is to
use an irc server with run-time memory protection
enabled.

Instructions attached (unreal_mpatrol.txt).
mpatrol-1.4.8-with-patch2.tar.gz should be fetched from:
www.vulnscan.org/tmp/mpatrol-1.4.8-with-patch2.tar.gz

Also a word of caution: DO NOT REHASH, since then the
ircd will crash. OR: to fix this you can delete all 'irc_dlclose'
lines in src/modules.c (which means you won't be able
to reload/unload any modules, but at least you can rehash,
try to rehash as few times as possible however, since it
introduces a memory leak).
This step is completely optional btw, if you never rehash
then just skip it.

Well.. hope that after all these warnings someone will
still install it and get the bug triggered, so we can
get it fixed :).

We get around 1 report on this every 2-3 months, I fixed
2 resolver bugs in 3.2.2, which caused this exact issue,
but apparently there's another one :p.

Thanks in advance!

    Bram.

12475cc822f71dddcc674a06b7901e9c mpatrol-1.4.8-with-patch2.tar.gz[/quote]

syzop

2005-01-05 12:26

administrator   ~0008709

On an unrelated sidenote, you could have just mailed me :P. If your servers are crashing every 24h (or even multiple times per 24h), I don't understand why you wouldn't do that ;)

MagicalTux

2005-01-05 12:53

reporter   ~0008710

Well...
I didn't have time to compile a special ircd with the mpatrol thing. I just had a look at the code and commented out the DNS cache system. I have a server which is not crashing anymore, with almost all of my network connected to it. I know it's *really* bad, but it's the best I could do since time is such a rare commodity.
By the way, I found a server with enough memory; I'll run mpatrol on it soon.

syzop

2005-01-05 14:18

administrator   ~0008711

Could you send me your modified files (src/res.c I suppose?) to syzop@vulnscan.org ? Just for reference... If it's true what you say, then it basically means the bug is in the cache system and not in the "deal with DNS queries/responses" system.

HERZ

2005-01-06 03:04

reporter   ~0008712

Hey ho, I already reported this to bugs @ unrealircd,
but we (insiderZ.DE Network) have had the same problem
since Dec 2004. Servers are crashing every day / week, but
they had uptimes of 100 / 170 days before Dec 2004.
First we thought it was a bug in Unreal3.2.1, so we
updated to Unreal3.2.2 on 2005-01-05.
But it's the same problem here: servers are crashing, and every time
with the same problem.

Today I installed mpatrol, but the ircd still does not start.
The error is:
insiderz@matrix:~/Unreal3.2$ ./unreal start
Starting UnrealIRCd
./unreal: line 39: 11298 Segmentation fault /home/insiderz/Unreal3.2/src/ircd
Possible error encountered (IRCd seemily not started)


Here is my backtrace. I am running Unreal3.2.2
on Linux Debian 3.1 with kernel 2.6.10 / AMD 2.4 / 512 MB

(gdb) bt
#0 0x0806ac1d in find_cache_number (rptr=0x827f7a8, numb=0x827f884 "Ù\030ÚQÙ \203ÈQ©¼K") at res.c:1301
#1 0x0806ad6f in make_cache (rptr=0x827f7a8) at res.c:1369
#2 0x0806a5ad in get_res (lp=0x81257f0 "h!G\b") at res.c:941
#3 0x08070286 in do_dns_async () at s_bsd.c:2577
#4 0x0806faca in read_message (delay=1, listp=0x8155de0) at s_bsd.c:1757
#5 0x08064000 in main (argc=0, argv=0x0) at ircd.c:1529
(gdb)

HERZ

2005-01-06 04:32

reporter   ~0008713

OK, one of our hubs has crashed. Hubs are NON-CLIENT servers (no open client ports).
So it is certain that the dns_cache crash is a system problem,
not caused by local users.

Chris

MagicalTux

2005-01-06 04:49

reporter   ~0008714

Well...

Here is my modification to res.c ...

It may look blind but it disables cache lookups in *some* cases (when rptr is null).

This server hasn't crashed again since the modification, but that may be unrelated.

Users graph for this server :
http://www.irc.ff.st/img/mrtg/stlouis2.us.irc.ff.st-month.png

diff -U3 -r Unreal3.2/src/res.c ../Unreal3.2/src/res.c
--- Unreal3.2/src/res.c 2004-10-27 18:45:28.000000000 +0000
+++ ../Unreal3.2/src/res.c 2004-12-24 09:25:19.000000000 +0000
@@ -1377,6 +1377,7 @@
            inetntoa(numb), ntohl(ip->s_addr), hashv));
 #endif
 #endif
+ if (rptr == NULL) return NULL;
        for (; cp; cp = cp->hnum_next)
        {
 #ifdef INET6

I'm still not sure it changes anything but well... it *may*
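
To make the scope of this one-line guard concrete, here is a toy model. It assumes the two call paths visible in the backtraces in this report: `gethost_byaddr` reaches `find_cache_number` with `rptr=0x0`, while `make_cache` reaches it with a non-NULL `rptr`. Every name below is a hypothetical stand-in and nothing here reproduces res.c. In this model the guard short-circuits only the first path; the second one still walks the hash chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real resolver structures are not shown here. */
struct request { int id; };
struct entry   { int id; };

static struct entry cached = { 1 };
static int chain_walks = 0;   /* counts how often the hash chain is walked */

/* Lookup with the guard from the patch applied: no request, no chain walk. */
static struct entry *find_cache_number_model(const struct request *rptr)
{
    if (rptr == NULL)
        return NULL;
    chain_walks++;            /* the crashing loop would run from here on */
    return &cached;
}

/* Caller that has no request yet (rptr=0x0 in the first backtraces). */
static struct entry *byaddr_path(void)
{
    return find_cache_number_model(NULL);
}

/* Caller that stores a fresh answer (rptr is non-NULL on this path). */
static struct entry *make_cache_path(const struct request *rptr)
{
    assert(rptr != NULL);
    return find_cache_number_model(rptr);
}
```

Under these assumptions the patch can hide a crash on the `gethost_byaddr` path but leaves the `make_cache` path untouched.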

HERZ

2005-01-06 05:53

reporter   ~0008715

Okay, I have added this modification on two
of my Unreal3.2.2 servers ... let's see
how it works.

syzop

2005-01-06 10:55

administrator   ~0008716

Thanks for the info, both of you..

->MagicalTux This could narrow it down if it's that.. then again, it reduces cache lookups by 50% so if the bug is still present it might take just a bit longer.. How long have you been running without a crash MagicalTux? :)

I also might have some mpatrol results from HERZ that could be useful, but have to look into that first.. Will keep this bugreport up to date in case I get any results (or lack of).

HERZ

2005-01-06 10:59

reporter   ~0008717

Syzop,

we can't get ANY server to start with mpatrol.
Every Unreal server with mpatrol dies at startup with
"Segmentation fault",
so you can wait a long, long time for an mpatrol.log :)

syzop

2005-01-06 11:09

administrator   ~0008718

Last edited: 2005-01-06 11:15

Yes, I know.. u^Hur servers suck :P
(I hate that kind of humor)

The fact that every server crashes with a segmentation fault doesn't necessarily mean anything is wrong. There's also a chance that it crashes so fast that it already crashes during startup; this has happened to me countless times. So it could well be that there is no core file, or that there are no useful results at all.. but I just don't want to "ignore" any test results :P.

Anyway, why are we doing duplicate conversations here, we are already mailing :P.

MagicalTux

2005-01-06 11:31

reporter   ~0008719

[quote]then again, it reduces cache lookups by 50% so if the bug is still present it might take just a bit longer..[/quote]

>> Hard to tell if the bug is still present and if it's really related to this function, but as you can see on the user graph of this server it didn't crash again for more than one week !

syzop

2005-01-06 11:41

administrator   ~0008720

I see.. But if I look at the graph (interesting way to get uptime;p) I see that there was (almost) like a week between the crashes and before that.. 2w no crash?? or :P. So.. should we stay cautiously optimistic, or? :).

HERZ

2005-01-06 12:48

reporter   ~0008721

Haha, this res.c modification is pointless:
my server had 120 users, then this happened (6 mins ago)

insiderz@matrix:~/Unreal3.2$ ls -la core
-rw------- 1 insiderz insiderz 4345856 2005-01-06 18:31 core
insiderz@matrix:~/Unreal3.2$

#0 0x0806b94d in find_cache_number (rptr=0x83245f0, numb=0x83246c4 "ÙP_¹") at res.c:1386
1386 for (i = 0; HE(cp)->h_addr_list[i]; i++)
(gdb) bt
#0 0x0806b94d in find_cache_number (rptr=0x83245f0, numb=0x83246c4 "ÙP_¹") at res.c:1386
#1 0x0806ba8f in make_cache (rptr=0x83245f0) at res.c:1454
#2 0x0806b29d in get_res (lp=0x81267d0 "p") at res.c:1025
#3 0x08070fa6 in do_dns_async () at s_bsd.c:2578
#4 0x080707ea in read_message (delay=1, listp=0x8156dc0) at s_bsd.c:1757
#5 0x08064dc0 in main (argc=0, argv=0x0) at ircd.c:1541

syzop

2005-01-06 14:35

administrator   ~0008722

calm.. we are just trying to help :p
Anyway, that could bring us back to where we were.. could you reply to my last mail HERZ? (no it isn't that urgent, but just don't think it's no longer needed ;p)

Also, I presume you guys all have different server setups? So nothing in common with those servers that crash & their nameserver configuration? Like all running non-BIND nameservers, all running remote (DNS server not on localhost), etc...

syzop

2005-01-06 15:01

administrator   ~0008723

Last edited: 2005-01-06 15:04

An alternative to mpatrol is to use valgrind: go to http://valgrind.kde.org/ for the source, or use your favorite dist/OS if it has a package of it (on debian/testing all that was needed was apt-get install valgrind)

And then run: valgrind --log-file=mylog src/ircd
(oh and this should be done on a normal ircd, not an ircd prepared for mpatrol :p)

I don't know how good it is.. it seems a lot faster/cleaner, I just don't know if it can catch all (heap) bugs in realtime, certainly worth a try however :).

Oh.. and of course if your ircd crashes, send mylog.pid<something> to me at syzop@vulnscan.org (yes, please the full log [g/zip'ed or not], not just the last 10 lines or something).

HERZ

2005-01-08 07:44

reporter   ~0008738

Crazy, huh? Without valgrind, the ircd starts successfully.
With valgrind it says "Fix MAXCONNECTIONS"



insiderz@matrix:~/Unreal3.2$ valgrind --log-file=mylog src/ircd
* Loading IRCd configuration ..
* Configuration loaded without any problems ..
* Loading tunefile..
* Initializing SSL.
* Dynamic configuration initialized .. booting IRCd.
---------------------------------------------------------------------
The OS enforces a limit on max open files
Hard Limit: 820 MAXCONNECTIONS: 1024
Fix MAXCONNECTIONS
insiderz@matrix:~/Unreal3.2$
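
The message above compares a compile-time connection count with the hard limit on open files that the OS reports. As a rough sketch of that kind of startup check (not UnrealIRCd's code; `fits_hard_limit` and `query_hard_nofile` are hypothetical names), the POSIX `getrlimit` call with `RLIMIT_NOFILE` can be used like this:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Returns 1 if `wanted` descriptors fit within the hard limit, 0 if not. */
static int fits_hard_limit(rlim_t hard, unsigned long wanted)
{
    return hard == RLIM_INFINITY || (rlim_t)wanted <= hard;
}

/* Query this process's hard limit on open files; returns 0 on success. */
static int query_hard_nofile(rlim_t *hard)
{
    struct rlimit rl;

    assert(hard != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *hard = rl.rlim_max;
    return 0;
}
```

With the numbers from the output above, `fits_hard_limit(820, 1024)` is false, which matches the refusal to start. Why the limit is 820 under valgrind is not established in this thread; possibly valgrind keeps some descriptors for itself.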

HERZ

2005-01-08 07:51

reporter   ~0008739

Okay...
We found something out.

Around the time the crashes started (Dec 2004), our DNS IP was changed
in the config from dns::nameserver: 213.131.254.5
to dns::nameserver: 213.131.230.143.
We have global configs, which means that by 7 am
the next day all servers had the new DNS IP rehashed.
Since then the ircd was crashing _every_ day.
For the last two days we have had the old IP, dns::nameserver: 213.131.254.5,
back in all servers.
And now... the servers are running fine.
Don't ask me which BINDs are running on 213.131.254.5/213.131.230.143;
this is an ISP and I don't have administrative access.

syzop

2005-01-08 11:25

administrator   ~0008741

The fun thing is that the set::dns is almost always ignored by unreal (don't ask me why ;p), it uses the info from /etc/resolv.conf. Also info from /etc/resolv.conf is only used on startup, if it's changed.. a rehash will not reread it.
You can doublecheck this by doing '/quote dns i' to get the current nameserver configuration.
Weird @ valgrind btw, I've no idea where that's coming from (don't have that problem here), but you could of course just recompile for 820 connections if that is not too much of a problem :)
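
Since the nameserver actually in use comes from /etc/resolv.conf, it can help to see which address that file yields. The following is a minimal sketch only, not UnrealIRCd's or libc's parser; `first_nameserver` is a hypothetical helper that takes resolv.conf-style text and assumes the keyword starts the line, as the resolv.conf format requires.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the first "nameserver" address found in resolv.conf-style text
   into out. Returns 1 if one was found, 0 otherwise. */
static int first_nameserver(const char *text, char *out, size_t outlen)
{
    assert(text != NULL && out != NULL && outlen > 0);

    while (*text != '\0') {
        char line[256];
        char addr[64];
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

        memcpy(line, text, copy);
        line[copy] = '\0';

        /* keyword at start of line, followed by a space or tab */
        if (strncmp(line, "nameserver", 10) == 0 &&
            (line[10] == ' ' || line[10] == '\t') &&
            sscanf(line + 10, "%63s", addr) == 1 &&
            strlen(addr) < outlen) {
            strcpy(out, addr);
            return 1;
        }

        text += len;          /* move to the end of this line */
        if (*text == '\n')
            text++;           /* and past the newline, if any */
    }
    return 0;
}
```

Comment lines and other keywords such as `search` are skipped because they do not start with the `nameserver` keyword.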

MagicalTux

2005-01-10 05:04

reporter   ~0008753

Well, I always use "127.0.0.1" in /etc/resolv.conf (and install named on all my servers, as it usually gives better results).

However in my Unreal config file I use another IP. I'll try to forge some DNS replies and see what happens.

HERZ

2005-01-10 10:45

reporter   ~0008754

Just a Tip
Start Unreal with valgrind:
/usr/bin/valgrind.bin --error-limit=no --verbose --time-stamp=yes --log-file=unrealdebug.log src/ircd

I have now set up an UnrealIRCd with valgrind.

HERZ

2005-01-19 11:23

reporter   ~0008860

Today one server crashed while running under valgrind;
the core and debug log were sent to Syzop.

Regards
HERZ

syzop

2005-02-27 22:22

administrator   ~0009345

Hm, forgot to close this bug :p.

Issue History

Date Modified Username Field Change
2004-12-08 12:11 MagicalTux New Issue
2004-12-08 12:38 syzop Note Added: 0008553
2004-12-08 12:38 syzop View Status public => private
2004-12-08 23:40 syzop Note Added: 0008560
2004-12-08 23:40 syzop Status new => acknowledged
2005-01-05 12:06 syzop Relationship added has duplicate 0002264
2005-01-05 12:08 syzop Note Added: 0008706
2005-01-05 12:08 syzop View Status private => public
2005-01-05 12:09 syzop Note Edited: 0008706
2005-01-05 12:21 MagicalTux Note Added: 0008707
2005-01-05 12:24 syzop File Added: unreal_mpatrol.txt
2005-01-05 12:24 syzop Note Added: 0008708
2005-01-05 12:26 syzop Note Added: 0008709
2005-01-05 12:53 MagicalTux Note Added: 0008710
2005-01-05 14:18 syzop Note Added: 0008711
2005-01-06 03:04 HERZ Note Added: 0008712
2005-01-06 04:32 HERZ Note Added: 0008713
2005-01-06 04:49 MagicalTux Note Added: 0008714
2005-01-06 05:53 HERZ Note Added: 0008715
2005-01-06 10:55 syzop Note Added: 0008716
2005-01-06 10:59 HERZ Note Added: 0008717
2005-01-06 11:09 syzop Note Added: 0008718
2005-01-06 11:15 syzop Note Edited: 0008718
2005-01-06 11:31 MagicalTux Note Added: 0008719
2005-01-06 11:41 syzop Note Added: 0008720
2005-01-06 12:48 HERZ Note Added: 0008721
2005-01-06 14:35 syzop Note Added: 0008722
2005-01-06 15:01 syzop Note Added: 0008723
2005-01-06 15:02 syzop Note Edited: 0008723
2005-01-06 15:04 syzop Note Edited: 0008723
2005-01-08 07:44 HERZ Note Added: 0008738
2005-01-08 07:51 HERZ Note Added: 0008739
2005-01-08 11:25 syzop Note Added: 0008741
2005-01-10 05:04 MagicalTux Note Added: 0008753
2005-01-10 10:45 HERZ Note Added: 0008754
2005-01-19 11:23 HERZ Note Added: 0008860
2005-02-27 22:22 syzop Status acknowledged => resolved
2005-02-27 22:22 syzop Fixed in Version => 3.2.3
2005-02-27 22:22 syzop Resolution open => fixed
2005-02-27 22:22 syzop Assigned To => syzop
2005-02-27 22:22 syzop Note Added: 0009345