View Issue Details

IDProjectCategoryView StatusLast Update
0001245unrealircdpublic2012-02-26 18:43
Reportersyzop Assigned Tosyzop  
PrioritynormalSeverityfeatureReproducibilityalways
Status resolvedResolutionfixed 
Fixed in Version3.2.10-rc1 
Summary0001245: I/O engine / poll() support
DescriptionI'd really like to see the poll() system call (or the new system calls introduced in linux 2.6) to be put to good use. Because, as far as I have understood this whole matter and the way it is pointed out in the documentation, the main problem is that only fd_sets are supported, and the maximum fd_set size supported and hard-compiled into the linux kernel by default is 1024. This limit of 1024 file descriptors is not being imposed on poll() system calls, or the newer calls introduced into kernel 2.6.
Furthermore - the newer system calls are way more performant than select() or poll() itself.
This would make it possible to accept way more connections than just 1024, and in these times this limit is really pretty low.
Additional Informationoriginal report / text @ description is by thilo, not by Syzop [!]. I just changed the 'reporter' to get rid of annoying comments.
TagsNo tags attached.
Attached Files
unreal-poll.diff (11,446 bytes)
unreal-poll-2.diff (11,899 bytes)
3rd party modules

Relationships

has duplicate 0003361 closed Better socket handling for *nix 
child of 0003915 resolvedsyzop Unreal3.2.10 TODO 
child of 0004301 resolvedsyzop Unreal3.2.10 TODO 

Activities

syzop

2003-09-13 18:48

administrator   ~0003643

[about fd limits]
Well, most of the time the limit is enforced bu ulimits... Type "ulimit -n" at a random (default) *NIX installation, that will usually say '1024' (Linux, FreeBSD, ..). However, it's true that at Linux you won't have to do another step (besides increasing the ulimit): the editting of the header files.

[about performance]
They say poll is more CPU friendly, I'm sure that's true in theory (I know how it works), I'm not so sure if it matters that much on an average server with X00 clients however..

[in short]
poll() support would be nice, I certainly agree.. an "IO engine" rewrite is on TODO, I assume support for poll will be added then, I'm not so sure if it will be added in the meantime :/.

thilo

2003-09-13 19:35

reporter   ~0003645

####
Well, most of the time the limit is enforced bu ulimits... Type "ulimit -n" at a random (default) *NIX installation, that will usually say '1024' (Linux, FreeBSD, ..).
####

This fact is known to me. On machines with root access I can modify this limit. Thus, ulimits are of no concern in this discussion.

####
However, it's true that at Linux you won't have to do another step (besides increasing the ulimit): the editting of the header files.
####

Quite on the contrary: as far as goes to my knowledge, you have to recompile the whole kernel with the new FD_SETSIZE value in order to get the changes into effect. That this cannot be a solution, should be clear in my opinion.

####
They say poll is more CPU friendly, I'm sure that's true in theory (I know how it works), I'm not so sure if it matters that much on an average server with X00 clients however..
####

Well, I really think it does. Before each select, you have to add new file descriptors to your set, whereas in poll you can use the old array. Building up the FD_SET once is not really resource intensive, but imagine if you have more than 1000 messages per seconds, and you have to build the new FD_SET up at least 400 times a second ... it will get noticable!

syzop

2003-09-13 20:19

administrator   ~0003646

Well your 2nd and 3rd point are both incorrect, but anyway I think it's clear that we both would like poll() support.

thilo

2003-09-13 23:38

reporter   ~0003647

you're right ;)
I'll shut up already .. :P

syzop

2004-06-29 10:55

administrator   ~0006808

=> 3.3*, so not within a year or so.

codemastr

2004-06-30 23:53

reporter   ~0006820

I've been trying to conceive how difficult it would be to write a new I/O engine for Unreal. After looking at the code and trying to see what would be necessary, I had a nice big headache. If we were to do it from scratch, with just 2 people working on it, it would probably take us a few years to complete and work out most of the bugs.

Instead, I think the best method would be to adapt the new I/O engine from Hybrid. It's nice, it's semi-module friendly, and it would mean a bit less work. We'd just have to adapt it for Unreal, and add our own little features (e.g., SSL support). I haven't looked at the Hybrid I/O system too much, but I can't imagine it being too difficult to implement in Unreal.

I'm thinking the I/O engine should be the major feature of 3.3. To users, it won't seem like much of a change, but on the inside, Unreal will be vastly different, and much cleaner.

Syzop, any comments on this?

syzop

2004-07-01 10:10

administrator   ~0006821

I totally agree.
Just as you say.. I'm sure too it would take ages to code, and if finished it would contain tons of bugs (=most worrying ;p)... And these are NOT easy to trace bugs :p.

syzop

2007-05-08 08:46

administrator   ~0014046

Anyone interested in this, should also check http://dev.unrealircd.com/wiki/io_engine
In short it comes down to porting the hybrid I/O to Unreal.

Darkelf did some early work on this, which was committed to 3.3*, but nothing substantial. So, there's still a lot of work left (pretty much all work).

Of course, this job is huge and even though it's copying existing code (and porting it), it requires extensive thought and knowledge of I/O subsystems (theirs and ours).

I certainly hope someone will come forward to do this ;)

syzop

2010-06-21 18:52

administrator   ~0016128

Last edited: 2010-06-21 18:54

I'm willing to rethink about this for another release (I mean seriously reconsider it), but it's very very dangerous to change any part of the I/O engine, so it requires EXTREME care.
One subtle error, or incorrect ordering of things and you get very weird bugs which might only appear during stress and certain specific types of stress or sequence of things.

It will not be done for 3.2.9.

Normally it should be for 3.3*, but I'm willing to reconsider it for 3.2.10 or whatever it will be. We'll see. We're talking about 2011+ then anyway.

And yes you can post another thousand comments here but I will not read them, the point is already clear :P

EDIT: hah, I only saw just now that the view status was 'private', probably for the same reason as mentioned above (comments), it's now 'public' again.

Adam-

2010-12-15 18:29

reporter   ~0016461

I have attached unreal-poll.diff which I made a few weeks back that appears to work fine. I have had it running on various testnets and I can confirm it builds and works on Windows.

katsklaw

2010-12-15 20:30

reporter   ~0016462

Last edited: 2010-12-16 17:06

I think visiting several solutions is required here. If I use FreeBSD and can have a far superior I/O socket engine (kqueue), I shouldn't have to suffer because of windows (select).

It shouldn't be hard to control windows gets select() or string_and_2cans() as far as I'm concerned. Perhaps via defines (config.h) or default to select so those that build their own win32 build or uses the precompiled version automagiclly work, then those of us that compile our own IRCd can select our favorite engine.

katsklaw

2010-12-18 05:29

reporter   ~0016467

Little update, I've been running Adam's poll patch on 3.2.8.1 on a very low traffic testnet and so far no problems.

nenotopia

2010-12-18 09:09

reporter   ~0016471

hi,

for the number of clients on a typical unreal network, poll has less overhead than kqueue/epoll. yes, kqueue and epoll are O(1), but the iteration time is actually larger so you don't start to see the benefits unless you are dealing with 1000-2000 local connections.

i'm not convinced that there are any networks out there with 2000 users on their servers running unreal except for swiftirc and they don't really count because their stuff is modified anyway.

Adam's patch seems sane enough for inclusion into Unreal 3.2.10, we can revisit the I/O engine problem later, but using compiler-assisted intrinsic profiling (e.g. IBM Purify) shows that the socket code is actually one of the least of our problems as far as CPU cycles being gobbled up goes.

katsklaw: could you test this patch more thoroughly, like using clones-x or some crap on the testnet to flood it?

katsklaw

2010-12-19 02:22

reporter   ~0016472

nenotopia, sure I'll look into it. I do work a very busy schedule and may take me a bit to get back with you on it.

RE I/O engine "problem". I personally don't see it as a problem per se, just would like more options. Scalability and stuff. It doesn't make sense to me to stick with 1 socket engine, even though 99%+ of the networks won't need more than select. I also agree that I/O resource usage isn't very high on the list either. You should run Unreal through valgrind some time ;P

katsklaw

2010-12-19 03:57

reporter   ~0016474

So far I've loaded 490 clones, all joined # then parted and quit. then I loaded 220 clones, joined #, and text flooded the channel with PRIVMSG and NOTICE from all of them.

I can test more later.

syzop

2010-12-19 13:36

administrator   ~0016476

katslaw: there is another argument in favor of not offering several choices, which is that if you choose one good one which pretty much all *NIX support, like poll, you know that one is well tested (or gets well tested). When you have multiple options, especially ones (which is suggested) are only sometimes useful, you risk that they contain some [serious] bugs which go undetected for a long time.
Not saying that is a argument which defeats all others, just saying it is one.

There are more networks running 2K or 4K servers... or at least there used to be. To me, the whole point of changing the I/O engine is making it easier to run such servers, without having to adjust system headers and silly things like that. I'm less convinced about the big advantage wrt CPU which some people used to advocate (doesn't mean there isn't any).

katsklaw

2010-12-19 14:57

reporter   ~0016477

well sure, that's true. nenotopia has a valid point when they said there aren't any large nets that run unreal to start with. Given IRC's current population issue or lack there of, it may be folly to even worry about it. I personally can debate both sides of this topic. I can see the advantages of having kqueue on freebsd for example for all of it's benefits, but on the other hand. How many nets really need more than select? The same can be said for other technologies we use too such as ziplinks, higher expense at low volume where gains aren't apparent until large amounts of data is sent.

I can take that full circle, if a net is large enough to need ziplinks, they can likely benefit form poll/kqueue as well. ;)

syzop

2010-12-19 15:09

administrator   ~0016478

Like I said, I disagree. I've several customers that run a few K of clients per server. They just aren't (fully) public networks.
Also, if you don't care about such setups, why would you support is?
I'm also a bit fed up with those 'IRC is teh dieing people'.
Sorry ;P

Anything else I'd say, I would be repeating my previous post. Oh wait, why wouldn't I:
To me, the whole point of changing the I/O engine is making it easier to run such servers, without having to adjust system headers and silly things like that. I'm less convinced about the big advantage wrt CPU which some people used to advocate (doesn't mean there isn't any).

(Oh and let's not deviate the topic to ziplinks...)

EOF

CoreDuo

2011-03-11 02:32

reporter   ~0016618

"i'm not convinced that there are any networks out there with 2000 users on their servers running unreal except for swiftirc and they don't really count because their stuff is modified anyway."

chat02.ustream.tv (4050)
|-chat01.ustream.tv (505)
| |-chat07.ustream.tv (3649)
| |-chat03.ustream.tv (4027)
| |-chat06.ustream.tv (3976)
| |-chat05.ustream.tv (3878)
| |-chat08.ustream.tv (3855)
| |-chat04.ustream.tv (4027)
| `-chat09.ustream.tv (3822)
`-chat10.ustream.tv (0)

Unreal3.2.8.1. chat02.ustream.tv FhiXOoEMRs3 [*=2309]

There are 31447 users and 416 invisible on 10 servers

I wouldn't say they're completely nonexistant...

Bock

2011-03-11 15:07

reporter   ~0016619

There are 41 users and 1282 invisible on 41 servers

 Current Local Users: 310 Max: 678
 Current Global Users: 1323 Max: 71985

wolfwood

2011-05-18 23:11

reporter   ~0016654

I did some testing with the patch created by Adam- using 3.2.9-RC1 (with an additional modification in src/s_bsd.c to be able to start the ircd with more than 1024 connections):

 #ifndef _WIN32
+# ifndef USE_POLL
        if (MAXCONNECTIONS > FD_SETSIZE)
        {
                fprintf(stderr, "MAXCONNECTIONS (%d) is higher than FD_SETSIZE (%d)\n", MAXCONNECTIONS, FD_SETSIZE);
                fprintf(stderr, "You might need to recompile the IRCd, or if you're running Linux, read the release notes\n");
                 exit(-1);
        }
+# endif
 #endif

It's starts up okay, accepts connections, works fine until 1118 connects is reached then it crashes. I had it setup to accept upto 5000 connections. The crash dump in gdb reports the following:

Core was generated by `/home/srvim/Unreal3.2.9-rc1/src/ircd'.
Program terminated with signal 11, Segmentation fault.
#0 0x0806339b in main (argc=Cannot access memory at address 0xbfc1ff30
) at ircd.c:1893
1893 if (timeofday > nextfdlistcheck)
(gdb) bt full
#0 0x0806339b in main (argc=Cannot access memory at address 0xbfc1ff30
) at ircd.c:1893
        uid = Cannot access memory at address 0xbfc1ff34
(gdb) quit

Hope this is useful to someone, if more information is required let me know.

Adam-

2011-05-27 19:43

reporter   ~0016655

woofwood, I've attached a second patch with your change and some others, including preventing it from crashing if somehow there are more fd's added to the poll() engine than is supposed to be available (though I'm not sure if that's possible currently?).

I also connected over 4000 bots to Unreal patched with patch #1 and I had no problems with it.

syzop

2012-02-08 11:26

administrator   ~0016904

I'll take another look at this in the near future. If it looks ok, then I'll put it in the repository so it can receive a lot more testing.

syzop

2012-02-26 18:42

administrator   ~0016918

Last edited: 2012-02-26 18:49

Thanks a lot for the patch, Adam!

- Added patch from Adam for poll() support (0001245).
- Various changes/fixes/enhancements to poll patch
- UnrealIRCd now supports poll() instead of select().
  There are some minor speed benefits if you have more than 1K or 2K
  clients, however the main noticeable difference is that on Linux you can
  now easily enter a higher maximum connection count than 1024 in ./Config,
  without having to edit system header files.
  Of course, you still need to be allowed to use the # of sockets (type
  'ulimit -n' on the shell).
  Support for this is experimental at this stage, but enabled by default
  so it can receive all the testing it deserves. If all goes well, it will
  be the default for 3.2.10.
  Stress testing is very much welcomed!

http://hg.unrealircd.com/hg/unreal/rev/869790963c0f
http://hg.unrealircd.com/hg/unreal/rev/eb03f48654b8
http://hg.unrealircd.com/hg/unreal/rev/413d6273818c
http://hg.unrealircd.com/hg/unreal/rev/699cb7949b7a
http://hg.unrealircd.com/hg/unreal/rev/f1f6c992c228
http://hg.unrealircd.com/hg/unreal/rev/a41fdf31a5da

EDIT: Oh, perhaps I should mention this: now we have poll I think we're done with this bug id.
With regards to other enhancements like kqueue, epoll, and so on: I *really* don't feel they are worth the effort involved.

Issue History

Date Modified Username Field Change
2003-09-13 11:13 thilo New Issue
2003-09-13 18:48 syzop Note Added: 0003643
2003-09-13 19:35 thilo Note Added: 0003645
2003-09-13 20:19 syzop Note Added: 0003646
2003-09-13 23:38 thilo Note Added: 0003647
2004-06-29 10:55 syzop Note Added: 0006808
2004-06-29 10:55 syzop View Status public => private
2004-06-29 10:55 syzop Description Updated
2004-06-29 16:39 syzop Reporter thilo => syzop
2004-06-29 16:39 syzop Additional Information Updated
2004-06-29 18:29 syzop Summary Make better use of poll() => [3.3] I/O engine / poll() support
2004-06-30 23:53 codemastr Note Added: 0006820
2004-07-01 10:10 syzop Note Added: 0006821
2007-04-27 06:21 stskeeps Status new => feedback
2007-05-08 08:46 syzop Note Added: 0014046
2010-06-21 18:46 syzop Relationship added has duplicate 0003361
2010-06-21 18:52 syzop Note Added: 0016128
2010-06-21 18:52 syzop QA => Not touched yet by developer
2010-06-21 18:52 syzop U4: Need for upstream patch => No need for upstream InspIRCd patch
2010-06-21 18:52 syzop U4: Upstream notification of bug => Not decided
2010-06-21 18:52 syzop U4: Contributor working on this => None
2010-06-21 18:52 syzop Status feedback => confirmed
2010-06-21 18:53 syzop View Status private => public
2010-06-21 18:54 syzop Note Edited: 0016128
2010-07-02 10:09 syzop Relationship added child of 0003915
2010-12-15 18:27 Adam- File Added: unreal-poll.diff
2010-12-15 18:29 Adam- Note Added: 0016461
2010-12-15 20:30 katsklaw Note Added: 0016462
2010-12-16 17:06 katsklaw Note Edited: 0016462
2010-12-18 05:29 katsklaw Note Added: 0016467
2010-12-18 09:09 nenotopia Note Added: 0016471
2010-12-19 02:22 katsklaw Note Added: 0016472
2010-12-19 03:57 katsklaw Note Added: 0016474
2010-12-19 13:36 syzop Note Added: 0016476
2010-12-19 14:57 katsklaw Note Added: 0016477
2010-12-19 15:09 syzop Note Added: 0016478
2010-12-20 05:26 ohnobinki File Added: unreal-1245-poll-configure.ac.patch
2011-03-11 02:32 CoreDuo Note Added: 0016618
2011-03-11 15:07 Bock Note Added: 0016619
2011-05-18 23:11 wolfwood Note Added: 0016654
2011-05-27 19:40 Adam- File Added: unreal-poll-2.diff
2011-05-27 19:43 Adam- Note Added: 0016655
2012-02-08 11:26 syzop Note Added: 0016904
2012-02-08 11:28 syzop Summary [3.3] I/O engine / poll() support => I/O engine / poll() support
2012-02-26 18:42 syzop Note Added: 0016918
2012-02-26 18:42 syzop Status confirmed => resolved
2012-02-26 18:42 syzop Fixed in Version => 3.2.10-rc1
2012-02-26 18:42 syzop Resolution open => fixed
2012-02-26 18:42 syzop Assigned To => syzop
2012-02-26 18:47 syzop Note Edited: 0016918
2012-02-26 18:49 syzop Note Edited: 0016918