View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002229 | unreal | ircd | public | 2004-12-08 12:11 | 2005-02-27 22:22 |
Reporter | MagicalTux | Assigned To | syzop | ||
Priority | normal | Severity | crash | Reproducibility | N/A |
Status | resolved | Resolution | fixed | ||
Platform | AMD / 512MB+ RAM | OS | Linux Debian | OS Version | 3.1 |
Product Version | 3.2.2 | ||||
Fixed in Version | 3.2.3 | ||||
Summary | 0002229: crash at res.c:1385 | ||||
Description | Today two of my servers crashed with the same backtrace. It happened without any visible reason. Backtrace included; coredumps and binary available on request. (Unmodified source code, without any special modules.) | ||||
Steps To Reproduce | - | ||||
Additional Information | #0 0x080689d6 in find_cache_number (rptr=0x0, numb=0x8439038 "SÃ\033Ã|\020") at res.c:1385 1385 for (i = 0; HE(cp)->h_addr_list[i]; i++) (gdb) bt #0 0x080689d6 in find_cache_number (rptr=0x0, numb=0x8439038 "SÃ\033Ã|\020") at res.c:1385 #1 0x08067a78 in gethost_byaddr (addr=0x8439038 "SÃ\033Ã|\020", lp=0xbfffefb0) at res.c:461 #2 0x0806c148 in start_of_normal_client_handshake (acptr=0x8438cf8) at s_bsd.c:1353 #3 0x0806bef1 in add_connection (cptr=0x819cba0, fd=238) at s_bsd.c:1337 #4 0x0806c87c in read_message (delay=2, listp=0x815db60) at s_bsd.c:1872 #5 0x08062f57 in main (argc=0, argv=0xbffff944) at ircd.c:1560 (gdb) print i $1 = 135419424 #0 0x080862b7 in find_cache_number (rptr=0x0, numb=0x8ef8420 "SN\024³ù\r") at res.c:1385 1385 for (i = 0; HE(cp)->h_addr_list[i]; i++) (gdb) bt #0 0x080862b7 in find_cache_number (rptr=0x0, numb=0x8ef8420 "SN\024³ù\r") at res.c:1385 #1 0x08085360 in gethost_byaddr (addr=0x8ef8420 "SN\024³ù\r", lp=0xbfffee40) at res.c:461 #2 0x08089b17 in start_of_normal_client_handshake (acptr=0x8ef80e0) at s_bsd.c:1353 #3 0x0808989d in add_connection (cptr=0x82566e8, fd=563) at s_bsd.c:1337 #4 0x0808a2ed in read_message (delay=2, listp=0x821aae0) at s_bsd.c:1872 #5 0x08080903 in main (argc=2, argv=0x0) at ircd.c:1560 #6 0x400c2dc6 in __libc_start_main () from /lib/libc.so.6 (gdb) print i $1 = 136180576 | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
3rd party modules | |||||
has duplicate | 0002264 | closed | Unreal3.2.2 crash |
|
So this is really 3.2.2? Interesting, since I thought I fixed this. Could you mail the src/ircd binary, commands.so and the cores (zipped) to [email protected] (or: url, whatever ;p). |
|
Odd problem, cores received, will contact you within 1-2 days (probably posting mpatrol instructions).. In case I forget however, just post a note or mail me again :p. |
|
status private->public. Had no feedback yet; HERZ also reported the same problem in 0002264. If anyone else is experiencing this problem and wants to help trace it (by running the ircd in a "special manner"), let me know at [email protected] and I'll mail step-by-step instructions. |
|
Well... My servers are regularly crashing (up to 3 crashes in 24 hours). I have not yet been able to try the mpatrol thing, and I lost the mail. Anyway, I'm not sure I have enough RAM on a server to run it (maybe I'll have to use a new server x_x ). Disabling the DNS cache on one of my servers made it "stable". Also, could you make a page with the mpatrol instructions or mail them to me again? (Yeah, I lost it... when you receive 70 mails per hour, these things can happen.) |
|
File attached, and just for reference & others, I'll paste my mail to you & bigi below: [quote]Hi, I heard you two had problems with UnrealIRCd crashing. Unfortunately these issues are hard to trace down: first, they somehow rarely happen; second.. when the server crashes, the bug that corrupted memory was already triggered somewhere earlier: that could be just 1 second / 500 lines of code earlier, but also several HOURS. The only way to trace these things down (besides looking at the source, but that's not always working :|) is to use an irc server with run-time memory protection enabled. Instructions attached (unreal_mpatrol.txt). mpatrol-1.4.8-with-patch2.tar.gz should be fetched from: www.vulnscan.org/tmp/mpatrol-1.4.8-with-patch2.tar.gz Also a word of caution: DO NOT REHASH, since then the ircd will crash. OR: to fix this you can delete all 'irc_dlclose' lines in src/modules.c (which means you won't be able to reload/unload any modules, but at least you can rehash; try to rehash as few times as possible however, since it introduces a memory leak). This step is completely optional btw, if you never rehash then just skip it. Well.. hope that after all these warnings someone will still install it and get the bug triggered, so we can get it fixed :). We get around 1 report on this every 2-3 months. I fixed 2 resolver bugs in 3.2.2, which caused this exact issue, but apparently there's another one :p. Thanks in advance! Bram. 12475cc822f71dddcc674a06b7901e9c mpatrol-1.4.8-with-patch2.tar.gz[/quote] |
|
On an unrelated sidenote, you could have just mailed me :P. If your servers are crashing every 24h (or even multiple times per 24h), I don't understand why you didn't do that ;) |
|
Well... I didn't have time to compile a special ircd with the mpatrol thing. I just had a look at the code and commented out the DNS cache system. I now have a server which is not crashing anymore, and almost all of my network is connected to it. I know it's *really* bad, but it's the best I could do since time is so scarce. By the way, I found a server with enough memory; I'll run mpatrol on it soon. |
|
Could you send me your modified files (src/res.c I suppose?) to [email protected] ? Just for reference... If what you say is true, then it basically means the bug is in the cache system and not in the "deal with DNS queries/responses" system. |
|
Hey ho, I already reported this to bugs @ unrealircd, but we (insiderZ.DE Network) have had the same problem since Dec/2004. Servers are crashing every day or week, whereas they ran for 100 to 170 days before Dec/2004. At first we thought it was a bug in Unreal3.2.1, so we updated to Unreal3.2.2 on 05/01/2005. But the problem is the same: the servers crash, every time in the same way. Today I installed mpatrol, but the ircd does not start. The error is: insiderz@matrix:~/Unreal3.2$ ./unreal start Starting UnrealIRCd ./unreal: line 39: 11298 Segmentation fault /home/insiderz/Unreal3.2/src/ircd Possible error encountered (IRCd seemily not started) Here is my backtrace; I am running Unreal3.2.2 on Linux Debian 3.1 with kernel 2.6.10 / AMD 2.4 / 512 MB (gdb) bt #0 0x0806ac1d in find_cache_number (rptr=0x827f7a8, numb=0x827f884 "Ù\030ÚQÙ \203ÈQ©¼K") at res.c:1301 #1 0x0806ad6f in make_cache (rptr=0x827f7a8) at res.c:1369 #2 0x0806a5ad in get_res (lp=0x81257f0 "h!G\b") at res.c:941 #3 0x08070286 in do_dns_async () at s_bsd.c:2577 #4 0x0806faca in read_message (delay=1, listp=0x8155de0) at s_bsd.c:1757 #5 0x08064000 in main (argc=0, argv=0x0) at ircd.c:1529 (gdb) |
|
Ok, one of our hubs has crashed. Hubs are non-client servers (no open client ports), so it is certain that the dns_cache crash is a system problem and is not caused by local users. Chris |
|
Well... Here is my modification to res.c ... It may look blind, but it disables cache lookups in *some* cases (when rptr is null). This server hasn't crashed again since the modification, but that may be unrelated. Users graph for this server : http://www.irc.ff.st/img/mrtg/stlouis2.us.irc.ff.st-month.png diff -U3 -r Unreal3.2/src/res.c ../Unreal3.2/src/res.c --- Unreal3.2/src/res.c 2004-10-27 18:45:28.000000000 +0000 +++ ../Unreal3.2/src/res.c 2004-12-24 09:25:19.000000000 +0000 @@ -1377,6 +1377,7 @@ inetntoa(numb), ntohl(ip->s_addr), hashv)); #endif #endif + if (rptr == NULL) return NULL; for (; cp; cp = cp->hnum_next) { #ifdef INET6 I'm still not sure it changes anything but well... it *may* |
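For readers following along: in the first backtraces gdb prints `i = 135419424` at res.c:1385, so the inner loop over `h_addr_list` had run far past any plausible list length. A minimal sketch of a more defensive lookup is below. It is not the real res.c code and not the fix that went into 3.2.3; `struct cache_entry`, `MAX_ADDRS` and `find_by_number` are hypothetical names, and IPv4-only 4-byte addresses are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ADDRS 8 /* hypothetical cap on addresses per cache entry */

/* Hypothetical, simplified cache entry; NOT the real res.c structure. */
struct cache_entry {
    const unsigned char *addr_list[MAX_ADDRS + 1]; /* NULL-terminated list of 4-byte IPv4 addresses */
    struct cache_entry *hnum_next;                 /* next entry in the same hash bucket */
};

/* Return the first entry in the bucket that holds addr, or NULL if none does. */
struct cache_entry *find_by_number(struct cache_entry *bucket,
                                   const unsigned char *addr)
{
    struct cache_entry *cp;
    int i;

    if (addr == NULL)
        return NULL;

    for (cp = bucket; cp != NULL; cp = cp->hnum_next) {
        /* Bound the walk so a missing NULL terminator cannot run off the array. */
        for (i = 0; i < MAX_ADDRS && cp->addr_list[i] != NULL; i++) {
            if (memcmp(cp->addr_list[i], addr, 4) == 0)
                return cp;
        }
    }
    return NULL;
}
```

A bound like this only limits the damage from a list that lost its terminator. If the entry pointer itself is stale or the heap is already corrupted, the lookup can still crash, which is why a run-time memory checker is needed to find the original corruption.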
|
Okay, I have added this modification to two of my Unreal3.2.2 servers... let's see how it works. |
|
Thanks for the info of both of you.. ->MagicalTux This could narrow it down if it's that.. then again, it reduces cache lookups by 50% so if the bug is still present it might take just a bit longer.. How long have you been running without a crash MagicalTux? :) I also might have some mpatrol results from HERZ that could be useful, but have to look into that first.. Will keep this bugreport up to date in case I get any results (or lack of). |
|
Syzop, we cannot get ANY server to start with mpatrol. Every Unreal server with mpatrol dies at startup with "Segmentation fault", so you can wait a long, long time for a mpatrol.log :) |
|
Yes, I know.. u^Hur servers suck :P (I hate that kind of humor) That every server crashes with a segmentation fault doesn't necessarily mean anything is wrong with the setup. There's also a chance that the bug triggers so fast that the ircd already crashes during startup; this has happened to me countless times. So it could well be that there is no core file, or that there are no useful results at all.. but I just don't want to "ignore" any test results :P. Anyway, why are we having duplicate conversations here, we are already mailing :P. |
|
then again, it reduces cache lookups by 50% so if the bug is still present it might take just a bit longer.. >> Hard to tell if the bug is still present and if it's really related to this function, but as you can see on the user graph of this server it didn't crash again for more than one week ! |
|
I see.. But if I look at the graph (interesting way to get uptime;p) I see that there was (almost) like a week between the crashes and before that.. 2w no crash?? or :P. So.. should we stay cautiously optimistic, or? :). |
|
Haha, this res.c modification is pointless. My server had 120 users, and then this happened (6 mins ago): insiderz@matrix:~/Unreal3.2$ ls -la core -rw------- 1 insiderz insiderz 4345856 2005-01-06 18:31 core insiderz@matrix:~/Unreal3.2$ #0 0x0806b94d in find_cache_number (rptr=0x83245f0, numb=0x83246c4 "ÙP_¹") at res.c:1386 1386 for (i = 0; HE(cp)->h_addr_list[i]; i++) (gdb) bt #0 0x0806b94d in find_cache_number (rptr=0x83245f0, numb=0x83246c4 "ÙP_¹") at res.c:1386 #1 0x0806ba8f in make_cache (rptr=0x83245f0) at res.c:1454 #2 0x0806b29d in get_res (lp=0x81267d0 "p") at res.c:1025 #3 0x08070fa6 in do_dns_async () at s_bsd.c:2578 #4 0x080707ea in read_message (delay=1, listp=0x8156dc0) at s_bsd.c:1757 #5 0x08064dc0 in main (argc=0, argv=0x0) at ircd.c:1541 |
|
calm.. we are just trying to help :p Anyway, that could bring us back to where we were.. could you reply to my last mail, HERZ? (no, it isn't that urgent, but just don't think it's no longer needed ;p) Also, I presume you guys all have different server setups? So the servers that crash have nothing in common in their nameserver configuration? Like all running non-BIND nameservers, all running remote (DNS server not on localhost), etc... |
|
An alternative to mpatrol is to use valgrind: go to http://valgrind.kde.org/ for the source, or use your favorite dist/OS if it has a package of it (on debian/testing all that was needed was apt-get install valgrind) And then run: valgrind --log-file=mylog src/ircd (oh and this should be done on a normal ircd, not an ircd prepared for mpatrol :p) I don't know how good it is.. it seems a lot faster/cleaner, I just don't know if it can catch all (heap) bugs in realtime, certainly worth a try however :). Oh.. and of course if your ircd crashes, send mylog.pid<something> to me at [email protected] (yes, please the full log [g/zip'ed or not], not just the last 10 lines or something). |
|
Crazy, huh? Without valgrind, the ircd starts successfully. With valgrind it says "Fix MAXCONNECTIONS": insiderz@matrix:~/Unreal3.2$ valgrind --log-file=mylog src/ircd * Loading IRCd configuration .. * Configuration loaded without any problems .. * Loading tunefile.. * Initializing SSL. * Dynamic configuration initialized .. booting IRCd. --------------------------------------------------------------------- The OS enforces a limit on max open files Hard Limit: 820 MAXCONNECTIONS: 1024 Fix MAXCONNECTIONS insiderz@matrix:~/Unreal3.2$ |
|
Okay... We found something out. Around the time the crashes started (Dec/2004), our DNS IP was changed in the config from dns::nameserver: 213.131.254.5 to dns::nameserver: 213.131.230.143. We have global configs, which means that by 7 am the next day all servers had rehashed with the new DNS IP. Since then the ircd was crashing _every_ day. For the last two days we have had the old IP, dns::nameserver: 213.131.254.5, back in the config on all servers. And now... the servers are running fine. Don't ask me which BINDs are running on 213.131.254.5/213.131.230.143; this is an ISP and I don't have administrative access. |
|
The fun thing is that the set::dns is almost always ignored by unreal (don't ask me why ;p), it uses the info from /etc/resolv.conf. Also info from /etc/resolv.conf is only used on startup, if it's changed.. a rehash will not reread it. You can doublecheck this by doing '/quote dns i' to get the current nameserver configuration. Weird @ valgrind btw, I've no idea where that's coming from (don't have that problem here), but you could of course just recompile for 820 connections if that is not too much of a problem :) |
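To illustrate the point about /etc/resolv.conf: the resolver picks up its nameserver list from `nameserver` lines in that file, once, at startup. The sketch below is NOT UnrealIRCd's parser; `parse_nameservers` is a hypothetical helper that extracts those lines from resolv.conf-style text, so you can see which addresses a startup-time read would yield.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy up to max nameserver addresses from resolv.conf-style text into out.
 * Returns the number of addresses found. Lines that do not start with the
 * "nameserver" keyword (comments, "search", blank lines) are ignored.
 */
int parse_nameservers(const char *text, char out[][64], int max)
{
    const char *p = text;
    int count = 0;

    assert(text != NULL && out != NULL);

    while (*p != '\0' && count < max) {
        const char *eol = strchr(p, '\n');
        size_t full = eol ? (size_t)(eol - p) : strlen(p);
        size_t len = full;
        char line[256];
        char addr[64];

        /* Truncate overlong lines for parsing, but still skip the whole line. */
        if (len >= sizeof(line))
            len = sizeof(line) - 1;
        memcpy(line, p, len);
        line[len] = '\0';

        if (sscanf(line, " nameserver %63s", addr) == 1) {
            strcpy(out[count], addr);
            count++;
        }
        p += full + (eol ? 1 : 0);
    }
    return count;
}
```

Because such a read happens only at startup, editing the file (or the ircd config) and rehashing would not change the list; '/quote dns i' shows what the running server actually uses.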
|
Well, I always use "127.0.0.1" in /etc/resolv.conf (and install named on all my servers, as it usually gives better results). However, in my Unreal config file I use another IP. I'll try to forge some DNS replies and see what happens. |
|
Just a tip: start Unreal with valgrind like this: /usr/bin/valgrind.bin --error-limit=no --verbose --time-stamp=yes --log-file=unrealdebug.log src/ircd I have now set up an UnrealIRCd with valgrind. |
|
Today one server crashed while running under valgrind; the core and the debug log were sent to Syzop. Regards HERZ |
|
Hm, forgot to close this bug :p. |
Date Modified | Username | Field | Change |
---|---|---|---|
2004-12-08 12:11 | MagicalTux | New Issue | |
2004-12-08 12:38 | syzop | Note Added: 0008553 | |
2004-12-08 12:38 | syzop | View Status | public => private |
2004-12-08 23:40 | syzop | Note Added: 0008560 | |
2004-12-08 23:40 | syzop | Status | new => acknowledged |
2005-01-05 12:06 | syzop | Relationship added | has duplicate 0002264 |
2005-01-05 12:08 | syzop | Note Added: 0008706 | |
2005-01-05 12:08 | syzop | View Status | private => public |
2005-01-05 12:09 | syzop | Note Edited: 0008706 | |
2005-01-05 12:21 | MagicalTux | Note Added: 0008707 | |
2005-01-05 12:24 | syzop | File Added: unreal_mpatrol.txt | |
2005-01-05 12:24 | syzop | Note Added: 0008708 | |
2005-01-05 12:26 | syzop | Note Added: 0008709 | |
2005-01-05 12:53 | MagicalTux | Note Added: 0008710 | |
2005-01-05 14:18 | syzop | Note Added: 0008711 | |
2005-01-06 03:04 | HERZ | Note Added: 0008712 | |
2005-01-06 04:32 | HERZ | Note Added: 0008713 | |
2005-01-06 04:49 | MagicalTux | Note Added: 0008714 | |
2005-01-06 05:53 | HERZ | Note Added: 0008715 | |
2005-01-06 10:55 | syzop | Note Added: 0008716 | |
2005-01-06 10:59 | HERZ | Note Added: 0008717 | |
2005-01-06 11:09 | syzop | Note Added: 0008718 | |
2005-01-06 11:15 | syzop | Note Edited: 0008718 | |
2005-01-06 11:31 | MagicalTux | Note Added: 0008719 | |
2005-01-06 11:41 | syzop | Note Added: 0008720 | |
2005-01-06 12:48 | HERZ | Note Added: 0008721 | |
2005-01-06 14:35 | syzop | Note Added: 0008722 | |
2005-01-06 15:01 | syzop | Note Added: 0008723 | |
2005-01-06 15:02 | syzop | Note Edited: 0008723 | |
2005-01-06 15:04 | syzop | Note Edited: 0008723 | |
2005-01-08 07:44 | HERZ | Note Added: 0008738 | |
2005-01-08 07:51 | HERZ | Note Added: 0008739 | |
2005-01-08 11:25 | syzop | Note Added: 0008741 | |
2005-01-10 05:04 | MagicalTux | Note Added: 0008753 | |
2005-01-10 10:45 | HERZ | Note Added: 0008754 | |
2005-01-19 11:23 | HERZ | Note Added: 0008860 | |
2005-02-27 22:22 | syzop | Status | acknowledged => resolved |
2005-02-27 22:22 | syzop | Fixed in Version | => 3.2.3 |
2005-02-27 22:22 | syzop | Resolution | open => fixed |
2005-02-27 22:22 | syzop | Assigned To | => syzop |
2005-02-27 22:22 | syzop | Note Added: 0009345 |