View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001245 | unreal | ircd | public | 2003-09-13 11:13 | 2012-02-26 18:43 |
Reporter | syzop | Assigned To | syzop | ||
Priority | normal | Severity | feature | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Fixed in Version | 3.2.10-rc1 | ||||
Summary | 0001245: I/O engine / poll() support | ||||
Description | I'd really like to see the poll() system call (or the new system calls introduced in linux 2.6) to be put to good use. Because, as far as I have understood this whole matter and the way it is pointed out in the documentation, the main problem is that only fd_sets are supported, and the maximum fd_set size supported and hard-compiled into the linux kernel by default is 1024. This limit of 1024 file descriptors is not being imposed on poll() system calls, or the newer calls introduced into kernel 2.6. Furthermore - the newer system calls are way more performant than select() or poll() itself. This would make it possible to accept way more connections than just 1024, and in these times this limit is really pretty low. | ||||
Additional Information | original report / text @ description is by thilo, not by Syzop [!]. I just changed the 'reporter' to get rid of annoying comments. | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
3rd party modules | |||||
|
[about fd limits] Well, most of the time the limit is enforced bu ulimits... Type "ulimit -n" at a random (default) *NIX installation, that will usually say '1024' (Linux, FreeBSD, ..). However, it's true that at Linux you won't have to do another step (besides increasing the ulimit): the editting of the header files. [about performance] They say poll is more CPU friendly, I'm sure that's true in theory (I know how it works), I'm not so sure if it matters that much on an average server with X00 clients however.. [in short] poll() support would be nice, I certainly agree.. an "IO engine" rewrite is on TODO, I assume support for poll will be added then, I'm not so sure if it will be added in the meantime :/. |
|
#### Well, most of the time the limit is enforced bu ulimits... Type "ulimit -n" at a random (default) *NIX installation, that will usually say '1024' (Linux, FreeBSD, ..). #### This fact is known to me. On machines with root access I can modify this limit. Thus, ulimits are of no concern in this discussion. #### However, it's true that at Linux you won't have to do another step (besides increasing the ulimit): the editting of the header files. #### Quite on the contrary: as far as goes to my knowledge, you have to recompile the whole kernel with the new FD_SETSIZE value in order to get the changes into effect. That this cannot be a solution, should be clear in my opinion. #### They say poll is more CPU friendly, I'm sure that's true in theory (I know how it works), I'm not so sure if it matters that much on an average server with X00 clients however.. #### Well, I really think it does. Before each select, you have to add new file descriptors to your set, whereas in poll you can use the old array. Building up the FD_SET once is not really resource intensive, but imagine if you have more than 1000 messages per seconds, and you have to build the new FD_SET up at least 400 times a second ... it will get noticable! |
|
Well your 2nd and 3rd point are both incorrect, but anyway I think it's clear that we both would like poll() support. |
|
you're right ;) I'll shut up already .. :P |
|
=> 3.3*, so not within a year or so. |
|
I've been trying to conceive how difficult it would be to write a new I/O engine for Unreal. After looking at the code and trying to see what would be necessary, I had a nice big headache. If we were to do it from scratch, with just 2 people working on it, it would probably take us a few years to complete and work out most of the bugs. Instead, I think the best method would be to adapt the new I/O engine from Hybrid. It's nice, it's semi-module friendly, and it would mean a bit less work. We'd just have to adapt it for Unreal, and add our own little features (e.g., SSL support). I haven't looked at the Hybrid I/O system too much, but I can't imagine it being too difficult to implement in Unreal. I'm thinking the I/O engine should be the major feature of 3.3. To users, it won't seem like much of a change, but on the inside, Unreal will be vastly different, and much cleaner. Syzop, any comments on this? |
|
I totally agree. Just as you say.. I'm sure too it would take ages to code, and if finished it would contain tons of bugs (=most worrying ;p)... And these are NOT easy to trace bugs :p. |
|
Anyone interested in this, should also check http://dev.unrealircd.com/wiki/io_engine In short it comes down to porting the hybrid I/O to Unreal. Darkelf did some early work on this, which was committed to 3.3*, but nothing substantial. So, there's still a lot of work left (pretty much all work). Of course, this job is huge and even though it's copying existing code (and porting it), it requires extensive thought and knowledge of I/O subsystems (theirs and ours). I certainly hope someone will come forward to do this ;) |
|
I'm willing to rethink about this for another release (I mean seriously reconsider it), but it's very very dangerous to change any part of the I/O engine, so it requires EXTREME care. One subtle error, or incorrect ordering of things and you get very weird bugs which might only appear during stress and certain specific types of stress or sequence of things. It will not be done for 3.2.9. Normally it should be for 3.3*, but I'm willing to reconsider it for 3.2.10 or whatever it will be. We'll see. We're talking about 2011+ then anyway. And yes you can post another thousand comments here but I will not read them, the point is already clear :P EDIT: hah, I only saw just now that the view status was 'private', probably for the same reason as mentioned above (comments), it's now 'public' again. |
|
I have attached unreal-poll.diff which I made a few weeks back that appears to work fine. I have had it running on various testnets and I can confirm it builds and works on Windows. |
|
I think visiting several solutions is required here. If I use FreeBSD and can have a far superior I/O socket engine (kqueue), I shouldn't have to suffer because of windows (select). It shouldn't be hard to control windows gets select() or string_and_2cans() as far as I'm concerned. Perhaps via defines (config.h) or default to select so those that build their own win32 build or uses the precompiled version automagiclly work, then those of us that compile our own IRCd can select our favorite engine. |
|
Little update, I've been running Adam's poll patch on 3.2.8.1 on a very low traffic testnet and so far no problems. |
|
hi, for the number of clients on a typical unreal network, poll has less overhead than kqueue/epoll. yes, kqueue and epoll are O(1), but the iteration time is actually larger so you don't start to see the benefits unless you are dealing with 1000-2000 local connections. i'm not convinced that there are any networks out there with 2000 users on their servers running unreal except for swiftirc and they don't really count because their stuff is modified anyway. Adam's patch seems sane enough for inclusion into Unreal 3.2.10, we can revisit the I/O engine problem later, but using compiler-assisted intrinsic profiling (e.g. IBM Purify) shows that the socket code is actually one of the least of our problems as far as CPU cycles being gobbled up goes. katsklaw: could you test this patch more thoroughly, like using clones-x or some crap on the testnet to flood it? |
|
nenotopia, sure I'll look into it. I do work a very busy schedule and may take me a bit to get back with you on it. RE I/O engine "problem". I personally don't see it as a problem per se, just would like more options. Scalability and stuff. It doesn't make sense to me to stick with 1 socket engine, even though 99%+ of the networks won't need more than select. I also agree that I/O resource usage isn't very high on the list either. You should run Unreal through valgrind some time ;P |
|
So far I've loaded 490 clones, all joined # then parted and quit. then I loaded 220 clones, joined #, and text flooded the channel with PRIVMSG and NOTICE from all of them. I can test more later. |
|
katslaw: there is another argument in favor of not offering several choices, which is that if you choose one good one which pretty much all *NIX support, like poll, you know that one is well tested (or gets well tested). When you have multiple options, especially ones (which is suggested) are only sometimes useful, you risk that they contain some [serious] bugs which go undetected for a long time. Not saying that is a argument which defeats all others, just saying it is one. There are more networks running 2K or 4K servers... or at least there used to be. To me, the whole point of changing the I/O engine is making it easier to run such servers, without having to adjust system headers and silly things like that. I'm less convinced about the big advantage wrt CPU which some people used to advocate (doesn't mean there isn't any). |
|
well sure, that's true. nenotopia has a valid point when they said there aren't any large nets that run unreal to start with. Given IRC's current population issue or lack there of, it may be folly to even worry about it. I personally can debate both sides of this topic. I can see the advantages of having kqueue on freebsd for example for all of it's benefits, but on the other hand. How many nets really need more than select? The same can be said for other technologies we use too such as ziplinks, higher expense at low volume where gains aren't apparent until large amounts of data is sent. I can take that full circle, if a net is large enough to need ziplinks, they can likely benefit form poll/kqueue as well. ;) |
|
Like I said, I disagree. I've several customers that run a few K of clients per server. They just aren't (fully) public networks. Also, if you don't care about such setups, why would you support is? I'm also a bit fed up with those 'IRC is teh dieing people'. Sorry ;P Anything else I'd say, I would be repeating my previous post. Oh wait, why wouldn't I: To me, the whole point of changing the I/O engine is making it easier to run such servers, without having to adjust system headers and silly things like that. I'm less convinced about the big advantage wrt CPU which some people used to advocate (doesn't mean there isn't any). (Oh and let's not deviate the topic to ziplinks...) EOF |
|
"i'm not convinced that there are any networks out there with 2000 users on their servers running unreal except for swiftirc and they don't really count because their stuff is modified anyway." chat02.ustream.tv (4050) |-chat01.ustream.tv (505) | |-chat07.ustream.tv (3649) | |-chat03.ustream.tv (4027) | |-chat06.ustream.tv (3976) | |-chat05.ustream.tv (3878) | |-chat08.ustream.tv (3855) | |-chat04.ustream.tv (4027) | `-chat09.ustream.tv (3822) `-chat10.ustream.tv (0) Unreal3.2.8.1. chat02.ustream.tv FhiXOoEMRs3 [*=2309] There are 31447 users and 416 invisible on 10 servers I wouldn't say they're completely nonexistant... |
|
There are 41 users and 1282 invisible on 41 servers Current Local Users: 310 Max: 678 Current Global Users: 1323 Max: 71985 |
|
I did some testing with the patch created by Adam- using 3.2.9-RC1 (with an additional modification in src/s_bsd.c to be able to start the ircd with more than 1024 connections): #ifndef _WIN32 +# ifndef USE_POLL if (MAXCONNECTIONS > FD_SETSIZE) { fprintf(stderr, "MAXCONNECTIONS (%d) is higher than FD_SETSIZE (%d)\n", MAXCONNECTIONS, FD_SETSIZE); fprintf(stderr, "You might need to recompile the IRCd, or if you're running Linux, read the release notes\n"); exit(-1); } +# endif #endif It's starts up okay, accepts connections, works fine until 1118 connects is reached then it crashes. I had it setup to accept upto 5000 connections. The crash dump in gdb reports the following: Core was generated by `/home/srvim/Unreal3.2.9-rc1/src/ircd'. Program terminated with signal 11, Segmentation fault. #0 0x0806339b in main (argc=Cannot access memory at address 0xbfc1ff30 ) at ircd.c:1893 1893 if (timeofday > nextfdlistcheck) (gdb) bt full #0 0x0806339b in main (argc=Cannot access memory at address 0xbfc1ff30 ) at ircd.c:1893 uid = Cannot access memory at address 0xbfc1ff34 (gdb) quit Hope this is useful to someone, if more information is required let me know. |
|
woofwood, I've attached a second patch with your change and some others, including preventing it from crashing if somehow there are more fd's added to the poll() engine than is supposed to be available (though I'm not sure if that's possible currently?). I also connected over 4000 bots to Unreal patched with patch #1 and I had no problems with it. |
|
I'll take another look at this in the near future. If it looks ok, then I'll put it in the repository so it can receive a lot more testing. |
|
Thanks a lot for the patch, Adam! - Added patch from Adam for poll() support (0001245). - Various changes/fixes/enhancements to poll patch - UnrealIRCd now supports poll() instead of select(). There are some minor speed benefits if you have more than 1K or 2K clients, however the main noticeable difference is that on Linux you can now easily enter a higher maximum connection count than 1024 in ./Config, without having to edit system header files. Of course, you still need to be allowed to use the # of sockets (type 'ulimit -n' on the shell). Support for this is experimental at this stage, but enabled by default so it can receive all the testing it deserves. If all goes well, it will be the default for 3.2.10. Stress testing is very much welcomed! http://hg.unrealircd.com/hg/unreal/rev/869790963c0f http://hg.unrealircd.com/hg/unreal/rev/eb03f48654b8 http://hg.unrealircd.com/hg/unreal/rev/413d6273818c http://hg.unrealircd.com/hg/unreal/rev/699cb7949b7a http://hg.unrealircd.com/hg/unreal/rev/f1f6c992c228 http://hg.unrealircd.com/hg/unreal/rev/a41fdf31a5da EDIT: Oh, perhaps I should mention this: now we have poll I think we're done with this bug id. With regards to other enhancements like kqueue, epoll, and so on: I *really* don't feel they are worth the effort involved. |
Date Modified | Username | Field | Change |
---|---|---|---|
2003-09-13 11:13 | thilo | New Issue | |
2003-09-13 18:48 | syzop | Note Added: 0003643 | |
2003-09-13 19:35 | thilo | Note Added: 0003645 | |
2003-09-13 20:19 | syzop | Note Added: 0003646 | |
2003-09-13 23:38 | thilo | Note Added: 0003647 | |
2004-06-29 10:55 | syzop | Note Added: 0006808 | |
2004-06-29 10:55 | syzop | View Status | public => private |
2004-06-29 10:55 | syzop | Description Updated | |
2004-06-29 16:39 | syzop | Reporter | thilo => syzop |
2004-06-29 16:39 | syzop | Additional Information Updated | |
2004-06-29 18:29 | syzop | Summary | Make better use of poll() => [3.3] I/O engine / poll() support |
2004-06-30 23:53 |
|
Note Added: 0006820 | |
2004-07-01 10:10 | syzop | Note Added: 0006821 | |
2007-04-27 06:21 |
|
Status | new => feedback |
2007-05-08 08:46 | syzop | Note Added: 0014046 | |
2010-06-21 18:46 | syzop | Relationship added | has duplicate 0003361 |
2010-06-21 18:52 | syzop | Note Added: 0016128 | |
2010-06-21 18:52 | syzop | QA | => Not touched yet by developer |
2010-06-21 18:52 | syzop | U4: Need for upstream patch | => No need for upstream InspIRCd patch |
2010-06-21 18:52 | syzop | U4: Upstream notification of bug | => Not decided |
2010-06-21 18:52 | syzop | U4: Contributor working on this | => None |
2010-06-21 18:52 | syzop | Status | feedback => confirmed |
2010-06-21 18:53 | syzop | View Status | private => public |
2010-06-21 18:54 | syzop | Note Edited: 0016128 | |
2010-07-02 10:09 | syzop | Relationship added | child of 0003915 |
2010-12-15 18:27 | Adam- | File Added: unreal-poll.diff | |
2010-12-15 18:29 | Adam- | Note Added: 0016461 | |
2010-12-15 20:30 | katsklaw | Note Added: 0016462 | |
2010-12-16 17:06 | katsklaw | Note Edited: 0016462 | |
2010-12-18 05:29 | katsklaw | Note Added: 0016467 | |
2010-12-18 09:09 |
|
Note Added: 0016471 | |
2010-12-19 02:22 | katsklaw | Note Added: 0016472 | |
2010-12-19 03:57 | katsklaw | Note Added: 0016474 | |
2010-12-19 13:36 | syzop | Note Added: 0016476 | |
2010-12-19 14:57 | katsklaw | Note Added: 0016477 | |
2010-12-19 15:09 | syzop | Note Added: 0016478 | |
2010-12-20 05:26 | ohnobinki | File Added: unreal-1245-poll-configure.ac.patch | |
2011-03-11 02:32 | CoreDuo | Note Added: 0016618 | |
2011-03-11 15:07 | Bock | Note Added: 0016619 | |
2011-05-18 23:11 | wolfwood | Note Added: 0016654 | |
2011-05-27 19:40 | Adam- | File Added: unreal-poll-2.diff | |
2011-05-27 19:43 | Adam- | Note Added: 0016655 | |
2012-02-08 11:26 | syzop | Note Added: 0016904 | |
2012-02-08 11:28 | syzop | Summary | [3.3] I/O engine / poll() support => I/O engine / poll() support |
2012-02-26 18:42 | syzop | Note Added: 0016918 | |
2012-02-26 18:42 | syzop | Status | confirmed => resolved |
2012-02-26 18:42 | syzop | Fixed in Version | => 3.2.10-rc1 |
2012-02-26 18:42 | syzop | Resolution | open => fixed |
2012-02-26 18:42 | syzop | Assigned To | => syzop |
2012-02-26 18:47 | syzop | Note Edited: 0016918 | |
2012-02-26 18:49 | syzop | Note Edited: 0016918 |