View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002259 | unreal | ircd | public | 2004-12-28 16:49 | 2013-01-09 10:18 |
Reporter | Assigned To | syzop | |||
Priority | normal | Severity | feature | Reproducibility | N/A |
Status | closed | Resolution | no change required | ||
Summary | 0002259: TRE extension for mIRC code stripping | ||||
Description | I've been playing around with TRE (I'm submitting a bunch of patches for it), mainly to get a little familiar with the code. My idea is basically one that will solve the problem of badword stripping making color codes disappear. How will it do this? Well, we won't strip the color codes. Rather, we will "ignore" them in the regex engine. So, for example, when we encounter a \2 (bold), we just ++ past it (in a sense, it always matches). Of course, there might be instances where we do want to match these characters, for example a drone that uses a realname of "foo\2bar". So it will be dynamically controllable: with "(?-C)foo\2bar" the code ignoring is disabled, so it only matches if the \2 is there. By default we'd use a new flag, REG_CODEIGNORE, and the (?-C) construct would turn it off. That way it is backward compatible, and it also gives the user more control. I don't think this will be too hard to do, but I'm not yet 100% sure I'm able to do it. I do understand how to add new modifiers (things like (?i)), since one of the patches I'm submitting adds one. Unfortunately, all the patches I'm making only deal with regcomp(), not regexec(), so I will need to do more learning before I'm sure this is possible. | ||||
Tags | No tags attached. | ||||
3rd party modules | |||||
|
Just out of curiosity, does this mean it just skips the color and pretends it wasn't there (but possibly includes it in things like captures), or that a color always matches regardless of the character class? I guess in other words, with this option, would 'e' be treated as 'e[\1\2\3\4\17\37\16\33]*' or '[e\1\2\3\4\17\37\16\33]'? Reason I ask is because what you said "(in a sense, it always matches)" could suggest it goes either way (though mentioning ++ would suggest the former). Oh, and would CTCP characters be affected by this at all (even though CTCP isn't a color/format code... it's still in the nonprintable ASCII range)? *edit* Oh and I know TRE doesn't support the \### octal character notation (unless it does and no one told me ;p ). Also, forgot the ESC character is considered by +c to be a "color code". */edit* |
|
[quote]Reason I ask is because what you said "(in a sense, it always matches)" could suggest it goes either way (though mentioning ++ would suggest the former).[/quote] Well, I'm thinking 100% ignored. Like, the color characters become "zero-width." [quote]Oh, and would CTCP characters be affected by this at all[/quote] No. It will only find and skip color and formatting codes. [quote]*edit* Oh and I know TRE doesn't support the \### octal character notation[/quote] Well, it supports \x## where ## is hex. *Edit: Better example, regex: "([a-z]+)" text: "\2a\2b\2c" matches \1 = "a\2b\2c" |
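To make the "zero-width" idea concrete, here is a rough sketch using a plain literal matcher rather than TRE itself. Everything in it is an assumption made for illustration: `SKETCH_CODEIGNORE` merely stands in for the proposed flag, `literal_search` is a made-up name, and only the single-byte codes listed in this thread are handled (no \3 or \4 color arguments).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the proposed flag; not a real TRE constant. */
#define SKETCH_CODEIGNORE 1

/* Single-byte formatting codes as listed in this thread
 * (0x02, 0x0F, 0x1F, 0x0E, 0x1B), written in octal. */
static int is_simple_code(char c)
{
    /* strchr() would also "find" the terminating NUL, so exclude it. */
    return c != '\0' && strchr("\002\017\037\016\033", c) != NULL;
}

/* Return 1 if the literal pat matches at the start of text.
 * With SKETCH_CODEIGNORE set, codes in the text are stepped over
 * before each comparison, so they behave as zero-width. */
static int match_here(const char *pat, const char *text, int flags)
{
    while (*pat) {
        if (flags & SKETCH_CODEIGNORE)
            while (is_simple_code(*text))
                text++;
        if (*text != *pat)   /* also fails when text has run out */
            return 0;
        pat++;
        text++;
    }
    return 1;
}

/* Return 1 if pat occurs anywhere in text, 0 otherwise. */
int literal_search(const char *pat, const char *text, int flags)
{
    for (;; text++) {
        if (match_here(pat, text, flags))
            return 1;
        if (*text == '\0')
            return 0;
    }
}
```

With the flag set, "badword" is found in "b\2ad\37word" even though codes sit between the letters. With the flag cleared (the (?-C) case), a pattern such as "foo\2bar" only matches text that really contains the \2.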
|
So basically it's almost as if every character class had a [\x02\x03\x04\x0F\x1F\x0E\x1B]* after it (as far as making the regular expression goes, anyway)? Except you don't have to type out that whole mess every time ;-) . Nice. Yeah yeah, I know it's more accurate to say it pretends they don't even exist but :) . (And actually, it'd be more like: ([\x02\x0F\x1F\x0E\x1B]|\x03([0-9][0-9]?(,[0-9][0-9]?)?)?|\x04[0-9A-Fa-f]{6}(,[0-9A-Fa-f]{6})?)* - if you strip the codes, you need to strip the args for mIRC/RGB color too :) ) |
|
[quote](And actually, it'd be more like: ([\x02\x0F\x1F\x0E\x1B]|\x03([0-9][0-9]?(,[0-9][0-9]?)?)?|\x04[0-9A-Fa-f]{6}(,[0-9A-Fa-f]{6})?)* - if you strip the codes, you need to strip the args for mIRC/RGB color too :) )[/quote] Yeah, pretty much, though I'd probably just use [:xdigit:] ;). But it should be more efficient. I don't intend to actually make it "expand" to that. I intend to hardcode it into the parser. Like: if ((cflags & REG_CODEIGNORE) && *curchar == '\2') curchar++; So the size of the regex won't grow as a result of this. |
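For reference, the "++ past the code + args" step could look roughly like the sketch below. It is not TRE code: `skip_codes` and its helpers are hypothetical names, and the set of codes is simply the one from the pattern quoted above. One deliberate difference, flagged in the comments, is that a bare \x04 without six hex digits is also skipped here, whereas the quoted pattern would not match it.

```c
#include <ctype.h>

/* Consume up to two decimal digits (an mIRC color number). */
static const char *skip_color_number(const char *p)
{
    if (isdigit((unsigned char)*p)) {
        p++;
        if (isdigit((unsigned char)*p))
            p++;
    }
    return p;
}

/* If exactly six hex digits follow, advance *pp past them and return 1.
 * The loop stops at the first non-hex byte, so it never reads past NUL. */
static int skip_rgb(const char **pp)
{
    const char *p = *pp;
    int i;

    for (i = 0; i < 6; i++)
        if (!isxdigit((unsigned char)p[i]))
            return 0;
    *pp = p + 6;
    return 1;
}

/* Hypothetical helper: return a pointer just past any run of
 * formatting codes, including their arguments, starting at p. */
const char *skip_codes(const char *p)
{
    for (;;) {
        switch (*p) {
        case '\x02': case '\x0F': case '\x1F': case '\x0E': case '\x1B':
            p++;                          /* codes without arguments */
            break;
        case '\x03': {                    /* \x03[fg[,bg]] */
            const char *q;

            p++;
            q = skip_color_number(p);
            /* A comma only belongs to the code if a digit follows it. */
            if (q != p && *q == ',' && isdigit((unsigned char)q[1]))
                q = skip_color_number(q + 1);
            p = q;
            break;
        }
        case '\x04':                      /* \x04RRGGBB[,RRGGBB] */
            p++;                          /* assumption: bare \x04 is skipped too */
            if (skip_rgb(&p) && *p == ',') {
                const char *q = p + 1;

                if (skip_rgb(&q))
                    p = q;
            }
            break;
        default:
            return p;                     /* ordinary character or end of string */
        }
    }
}
```

The matcher would call something like this wherever it currently advances over one input character, so the compiled regex stays the same size and only the input pointer moves.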
|
Of course not :) . I was mostly thinking appearance, not actual implementation. Of course it would be easier to just ++ past the code + args :P . (On the other hand, it might take a bit of work off on your part... :P) |
|
As long as it's fast (and doesn't crash) ;). Obviously, it can only be slower than the current implementation, since stripping (color) codes once is bound to be cheaper than skipping them again for every regex. That said, since it's (very) simple, there shouldn't be any noticeable slowdown[1], and if implemented properly, I would in fact be happy with this feature: it's clean, and it's useful (or even required) for some (spamfilter) cases :). [1] Comparing bytes that are in the L1 cache (or will end up there anyway) and incrementing a counter (pointer) are very fast operations ;p |
|
*bump* Is anyone going to work on this? |
|
in my opinion, TRE sucks bigtime. i think unrealircd would be MUCH better suited to use PCRE, it's much faster, and can do much more powerful regular expressions |
|
I agree with djGrrr. Also as i can see from this one http://bugs.unrealircd.org/view.php?id=2887 the TRE author is (almost) not working on it anymore (or not?). |
|
We have included PCRE in 3.3 now. |
|
scratched (the TRE extension for mIRC color stripping). |
Date Modified | Username | Field | Change |
---|---|---|---|
2004-12-28 16:49 | | New Issue | |
2004-12-29 20:44 | aquanight | Note Added: 0008688 | |
2004-12-29 20:45 | aquanight | Note Edited: 0008688 | |
2004-12-29 20:49 | | Note Added: 0008689 | |
2004-12-29 20:52 | | Note Edited: 0008689 | |
2004-12-29 20:53 | | Note Edited: 0008689 | |
2004-12-29 20:56 | aquanight | Note Added: 0008690 | |
2004-12-29 21:02 | | Note Added: 0008691 | |
2005-01-03 00:35 | aquanight | Note Added: 0008695 | |
2005-01-03 11:35 | syzop | Note Added: 0008696 | |
2007-04-18 12:46 | Stealth | Note Added: 0013521 | |
2007-04-18 17:09 | djGrrr | Note Added: 0013537 | |
2007-04-18 18:12 | vonitsanet | Note Added: 0013545 | |
2007-06-11 13:11 | | Assigned To | codemastr => |
2007-06-21 14:22 | | Note Added: 0014397 | |
2013-01-09 10:17 | syzop | Note Added: 0017318 | |
2013-01-09 10:17 | syzop | Status | assigned => closed |
2013-01-09 10:18 | syzop | Assigned To | => syzop |
2013-01-09 10:18 | syzop | Resolution | open => fixed |
2013-01-09 10:18 | syzop | Resolution | fixed => no change required |