View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0003650 | unreal | ircd | public | 2008-02-25 18:48 | 2008-08-08 09:30 |
| Reporter | Monk | Assigned To | |||
| Priority | normal | Severity | crash | Reproducibility | random |
| Status | closed | Resolution | no change required | ||
| Product Version | 3.2.7 | ||||
| Summary | 0003650: Strange crash | ||||
| Description | We have 3 servers running at sh3lls.net on FreeBSD boxes. From time to time the ircd just terminates without any apparent reason. The pid file is just left behind but no core file is written. To analyse the problem I attached gdb with: $ gdb /path/to/the/ircd PID Shortly after attaching gdb two servers crashed, one is still running. The backtrace of both follows: FreeBSD 4.11-STABLE FreeBSD 4.11-STABLE #0: Thu Apr 13 09:29:20 CDT 2006 #0 0x2824b370 in select () from /usr/lib/libc.so.4 No symbol table info available. #1 0x8057488 in read_message (delay=1, listp=0x8147540) at s_bsd.c:1763 cptr = (aClient *) 0x81454e0 nfds = 1 wait = {tv_sec = 1, tv_usec = 0} read_set = {fds_bits = {1756103198, 1677881474, 1092159788, 2151682177, 2172649480, 2248212800, 0, 514, 36896, 2097152, 135266368, 2147483648, 8388738, 1073741826, 0, 1140852736, 3, 16384, 0, 41943040, 0, 83886336, 144, 66560, 1048576, 17827841, 12, 303039488, 268455936, 536870912, 8522752, 2097152, 0, 169869440, 0, 2, 2147483649, 0, 0, 4096, 2097168, 1048576, 256, 1073745920, 0, 524800, 0}} write_set = {fds_bits = {0 <repeats 47 times>}} j = 113 k = -1077937816 delay2 = 1 res = 0 length = -1077937816 fd = -1077937628 i = 93 sockerr = 93 #2 0x8060010 in main (argc=1, argv=0xbfbffb98) at ircd.c:1597 oldtimeofday = 1203963888 argc = 113 argv = (char **) 0x1 uid = 1129 euid = 1129 gid = 1129 egid = 1129 delay = 1 portarg = 93 nextfdlistcheck = 1203963889 (gdb) ========================================================================================================================================== FreeBSD 6.2-RC1 FreeBSD 6.2-RC1 #0: Sat Dec 16 01:29:54 CST 2006 (gdb) bt full #0 0x0a2e02c3 in select () from /lib/libc.so.6 No symbol table info available. 
#1 0x08057059 in read_message (delay=1, listp=0x8146a60) at s_bsd.c:1763 s = 4 cptr = (aClient *) 0xbfbfeab0 nfds = 135547392 wait = {tv_sec = 1, tv_usec = 0} read_set = {__fds_bits = {2168487966, 830996737, 1359478816, 1048708, 134744072, 2048, 268435608, 3229679872, 1744830464, 262208, 34112640, 2359296, 0, 1073743008, 0, 8388992, 6818048, 268435504, 285212672, 1073774852, 268501121, 4194304, 327680, 134512672, 8, 1243611136, 4098, 1074807297, 1073741824, 134324224, 4, 0, 2097152, 2684370948, 2151677953, 16844805, 738336768, 151003136, 0, 1082163200, 0, 8912896, 20, 34734112, 135528448, 1212220040, 16448, 8388608, 8388608, 1111492736, 3145728, 33563648, 65552, 22020113, 49184, 0, 1276125504, 64, 536870913, 134481920, 134217744, 0, 0}} write_set = {__fds_bits = {0 <repeats 63 times>}} j = 181 k = -1077941840 delay2 = 1 res = 0 length = -1077941840 fd = 135547392 i = 4 sockerr = 4 #2 0x080602ab in main (argc=135547392, argv=0xbfbfec90) at ircd.c:1597 uid = 1799 euid = 1799 gid = 1799 egid = 1799 delay = 1 portarg = 4 nextfdlistcheck = 1203963814 (gdb) | ||||
| 3rd party modules | |||||
|
|
On a side note: is there a way to start Unreal directly in gdb? Whenever I tried, it just seems to fork and gdb tells me "Program exited normally." I wasn't able to find something like Anope's "run -debug -nofork". |
|
|
I think you're referring to Unreal's -backtrace trigger (./unreal -backtrace). Use that trigger and get its output from one of the core dump files that Unreal should have generated on the crash. Personally, I'd like to think up front that this most likely isn't an UnrealIRCd problem: I was with sh3lls for a little while and ran into a multitude of issues with their servers as well as their services/support/admin, but that might be a bit biased of me :P Really though, try to post a paste of the backtrace that Unreal produces with its -backtrace feature. |
|
|
nate, thanks for the pointer, but as I wrote above, no core file is written in this strange case, so this is of no use. Regarding the sh3lls admins, I cannot say anything bad about them so far. They are helpful and polite. |
|
|
You said one was still running after this 'crash', though? O_o I didn't quite get that. Or are you talking about after trying to attach gdb to it? |
|
|
Yeah, I was not very clear here: I attached gdb to our 3 different sh3lls and like 5 minutes later two of them crashed with the bt above. Meanwhile the third one also crashed with a similar bt. |
|
|
So, after attaching, you did 'c' (continue), and then it crashed, right? (So not just running gdb without the continue? The reason I ask is that it can look identical to this ;p) Hm. I see the backtrace, but what was the message it crashed with: segmentation fault? broken pipe? some signal error? As for your question: run 'gdb src/ircd' (or whatever your 'ircd' BINARY is), then at the (gdb) prompt type 'r -F'. That runs it in foreground mode. |
|
|
After the command: gdb /path/to/the/ircd PID gdb issues some lines with loading symbols ... and then it went to the prompt. There I did nothing more, thinking it has attached and was done with it. After like 5 mins the ircd suddenly disconnected from our network. I waited a while and then typed the "bt full" command. All output is copied above. Thanks for the foreground argument. As the problem still persists, I will try the direct gdb run. |
|
|
Ok, thanks. Then I'm afraid the backtrace wasn't a backtrace of the crash. I'll explain. When you attach with 'gdb /path/to/the/ircd PID', you get a (gdb) prompt, and the ircd (or any program in gdb, really) hangs until you give it the continue command ('c'). So yes, after a couple of minutes it would have disconnected, probably with a ping timeout, because it didn't respond. So next time, run 'gdb /path/to/the/ircd PID', wait for the symbols to load, and at the (gdb) prompt type 'c'. Then it should continue, until it crashes that is ;) Even better would be two commands: 'handle SIGPIPE nostop' followed by 'c'. The 'handle SIGPIPE nostop' tells gdb not to bother you with SIGPIPE, which sometimes happens (or happened, I forgot). Without it you may get a (gdb) prompt again a couple of minutes or hours later, which will stall everything again for a stupid reason (no crash). The same is probably a good idea for the foreground run as well; then it is 'handle SIGPIPE nostop' followed by 'r -F'. Hope it helps :) |
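The attach procedure described above can also be prepared as a gdb command file, so the two commands are not forgotten at the prompt. This is only a sketch: the file name ircd-attach.gdb is made up for illustration, the ircd path and PID are placeholders, and the script prints the gdb invocation instead of running it. It assumes a gdb that accepts a command file through its -x option.

```shell
#!/bin/sh
# Write a gdb command file that mirrors the advice in the note above.
# "ircd-attach.gdb" is a hypothetical file name chosen for this sketch.
cmdfile="ircd-attach.gdb"
cat > "$cmdfile" <<'EOF'
handle SIGPIPE nostop
continue
EOF

# Print (rather than run) the attach command; the path and PID are placeholders.
echo "gdb -x $cmdfile /path/to/the/ircd PID"
```

Once the process stops with a real signal, 'bt full' at the (gdb) prompt gives the backtrace of the actual stop rather than of the paused select() call.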
|
|
Many thanks for the explanation, syzop. Following your detailed question about how I did it with gdb, I expected something like this ;) After running the ircd in gdb with the arguments you suggested, I now have another reason why the ircd just terminates, leaving the pid file behind: ===================== Program received signal SIGKILL, Killed. 0x08056f7b in read_message (delay=1, listp=0x814a3e0) at s_bsd.c:1695 1695 if (IsLog(cptr)) ===================== It just received a SIGKILL. This can possibly have two explanations: 1) The folks at sh3lls are nuts. Unlikely, as I have experienced them as friendly and helpful, and they explicitly allowed me to run the ircd in gdb. 2) The number of file descriptors is limited, probably with a hard limit in limits.conf. The shell I rented allows me 1500 file descriptors, so in the config file the number of clients is limited to 1495 and the server connects to one hub. No other services/bncs/whatsoever are connected to the server. In total this should give no more than 1496 open files. Is there a way to see how many files were open when it received the SIGKILL? Edit: The command (gdb) shell lsof -p 36309 | wc -l 1245 (gdb) would indicate that it was not a hard security limits kill, though I don't know if this way of counting open files is valid in this context or if it would count leaked files as well. |
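A note on the lsof count above: 'lsof -p PID | wc -l' also counts the header line and entries that are not file descriptors (current directory, program text, mapped libraries), so it overstates the number of open descriptors. A rough sketch of a stricter count follows. It assumes lsof's default column layout, with the descriptor in the fourth column, and it reports the limit of the shell it runs in, which is not necessarily the limit the ircd was started with. With no argument it inspects the shell itself; pass the ircd PID as the first argument in real use.

```shell
#!/bin/sh
# Compare a process's open descriptors with this shell's soft limit.
# Defaults to the current shell ($$); pass the ircd PID as $1 in real use.
pid="${1:-$$}"
limit=$(ulimit -n)

if command -v lsof >/dev/null 2>&1; then
    # Skip the header line and keep only rows whose FD column starts with a
    # digit, which drops cwd, txt, mem and similar non-descriptor entries.
    open=$(lsof -p "$pid" 2>/dev/null |
        awk 'NR > 1 && $4 ~ /^[0-9]/ { n++ } END { print n + 0 }')
else
    open="unknown (lsof not installed)"
fi

echo "pid $pid: open descriptors=$open, soft limit=$limit"
```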
|
|
At least we know it isn't a crash :). As to why it receives SIGKILL from somewhere, I have no idea. If it hits an fd limit, the ircd would just send error messages and such, not get a SIGKILL; that's my experience. Perhaps you could bother the provider: various ones kill processes for various reasons, including CPU usage, memory usage, or whatever. |
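One cause that can be checked from the shell account itself is a CPU-time resource limit: a process that exceeds the soft limit is sent SIGXCPU, and one that reaches the hard limit is killed with SIGKILL. The sketch below only prints the limits of the current shell, which an ircd started from that shell would inherit. A watchdog script run by the provider would not show up here, so an "unlimited" result does not rule out a provider-side kill.

```shell
#!/bin/sh
# Print the soft and hard CPU-time limits (in seconds) for this shell.
# An ircd started from this shell inherits them.
soft_cpu=$(ulimit -S -t)
hard_cpu=$(ulimit -H -t)
echo "CPU time limit: soft=$soft_cpu hard=$hard_cpu"
```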
|
|
I'm closing this one, Monk, because I don't think it's a fault in Unreal. Hope you solved things. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2008-02-25 18:48 | Monk | New Issue | |
| 2008-02-25 18:59 | Monk | Note Added: 0015176 | |
| 2008-02-25 19:13 | nate | Note Added: 0015179 | |
| 2008-02-25 19:45 | Monk | Note Added: 0015182 | |
| 2008-02-25 21:51 | nate | Note Added: 0015185 | |
| 2008-02-26 06:02 | Monk | Note Added: 0015188 | |
| 2008-02-29 13:44 | syzop | Note Added: 0015200 | |
| 2008-02-29 18:05 | Monk | Note Added: 0015206 | |
| 2008-03-06 12:51 | syzop | Note Added: 0015209 | |
| 2008-03-07 18:28 | Monk | Note Added: 0015218 | |
| 2008-03-07 18:45 | Monk | Note Edited: 0015218 | |
| 2008-03-07 18:46 | Monk | Note Edited: 0015218 | |
| 2008-03-29 20:56 | syzop | Note Added: 0015239 | |
| 2008-08-08 09:29 | syzop | QA | => Not touched yet by developer |
| 2008-08-08 09:29 | syzop | U4: Need for upstream patch | => No need for upstream InspIRCd patch |
| 2008-08-08 09:29 | syzop | Status | new => closed |
| 2008-08-08 09:30 | syzop | Note Added: 0015344 | |
| 2008-08-08 09:30 | syzop | Resolution | open => no change required |