View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000920 | unreal | ircd | public | 2003-04-26 20:35 | 2004-12-26 15:30 |
Reporter | Rocko | Assigned To | |||
Priority | normal | Severity | feature | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
OS | Debian Woody | OS Version | 3.0 | ||
Product Version | 3.2-beta17 | ||||
Fixed in Version | 3.2.3 | ||||
Summary | 0000920: Regex documentation | ||||
Description | I don't know whether wildcards are allowed, but the beta15 release contains the entry "badword channel { word "*fuck*"; };", so I think they are. (By the way, there is also an entry with only "fuck", so it exists twice!) When I use an entry like badword channel { word "m*se"; }; and then say "sei", it becomes "<censored>i", and that's not right, because there isn't an "m" in front of "sei". | ||||
Tags | No tags attached. | ||||
3rd party modules | |||||
|
Just for the record: this bug is unrelated to fast badwords replace, since the entry is recognized as a regex (c R m*se <censored>). I'm a regex-n00b so I don't know if this is good or bad... Someone reported a similar issue (on IRC) about "irc.*.*", which caused "hi irc.blah.com is nice" to be replaced with "hi irc.repl.aced" (the part after it got dropped)... |
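The "irc.*.*" report above is consistent with greedy matching. The following is a minimal sketch using Python's `re` module, which is an assumption made for illustration only (it is not the regex library Unreal used); the bounded pattern `irc\.\S+` is likewise a hypothetical alternative, not something proposed in this thread.

```python
import re

line = "hi irc.blah.com is nice"

# Greedy: each ".*" takes as many characters as it can, so the match
# runs from "irc" all the way to the end of the line.
greedy = re.sub(r"irc.*.*", "irc.repl.aced", line)
print(greedy)

# Bounded (hypothetical alternative): a literal dot followed by
# non-space characters only, so the match stops after the hostname.
bounded = re.sub(r"irc\.\S+", "irc.repl.aced", line)
print(bounded)
```

With the greedy pattern the words after the hostname are swallowed by the match, which is the "dropped" text described in the report.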
|
Well, if I set badword channel { word "lame"; replace "leet";}; and someone writes "lame", users will see "leet". That's okay. If I set badword channel { word "*l*a*m*e*"; replace "leet";}; and someone says "lame", everyone sees "Lame", so it doesn't work. BUT "e" is then a bad character, because when I type "Unreal" the output is "Unrleetal". Another problem is when * is a bad character. When you set badword channel { word "l*a*m*e"; replace "leet";}; badword channel { word "*t*e*s*t*"; replace "<censored word>";}; and someone types just a star, like "*", the output will be "<cleetnsorleetd word>". I think Rocko described the same thing ;) but I just wanted to show some more examples ;) The new badword configuration is very interesting but very difficult to handle. With wildcards you can make a lot of mistakes. Without wildcards the badword list will lose its worth. |
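The "Unrleetal" and "<cleetnsorleetd word>" outputs above can be reproduced by treating "l*a*m*e" as a regex rather than a wildcard. This is only a sketch in Python's `re` module (an assumption; Unreal used a different regex library), and the helper name `censor` is made up for the example.

```python
import re

# As a regex, "l*a*m*e" means: zero or more l, zero or more a,
# zero or more m, then exactly one e. A lone "e" therefore matches.
LAME = re.compile(r"l*a*m*e")

def censor(text: str) -> str:
    """Replace every regex match with the configured replacement."""
    return LAME.sub("leet", text)

print(censor("lame"))
print(censor("Unreal"))
print(censor("<censored word>"))
```

The last call shows what happens when the replacement text of one badword entry is itself run through this pattern: every "e" in it is rewritten.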
|
This is partly your problem, partly Unreal's problem. In regex, the * operator does NOT mean the same thing as it does in a wildcard or glob expression. It does NOT match "any characters"; it matches any number of repetitions of the previous character. For example, "t*st" says "match 0 or more t's followed by st". What you probably want is ".*": the "." says any character, therefore ".*" says "match 0 or more of any character". But this still produces some odd results in some rare circumstances. After beta16 I will be making Unreal use a new regex library that supports "non-greedy repeat-operations", which will make it function 100% as expected, assuming you know the correct syntax. Additionally, the new library is much faster than the current one, so with more features and better speed it is obviously a good choice. Perhaps I'll consider writing up a simple document on some of the basic features of regex... |
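The difference between "*" and ".*", and between greedy and non-greedy repetition, can be sketched as follows. Python's `re` module is used here as an assumption for illustration; it is neither the library Unreal used at the time nor TRE, and the helper name `censor_with` is hypothetical.

```python
import re

CENSOR = "<censored>"

def censor_with(pattern: str, text: str) -> str:
    """Replace every match of `pattern` in `text` with the censor string."""
    return re.sub(pattern, CENSOR, text)

# "m*se": zero or more m's followed by "se", so plain "se" matches.
print(censor_with(r"m*se", "sei"))

# "m.*se": a literal m, then any characters, then "se". No m, no match.
print(censor_with(r"m.*se", "sei"))

# Greedy ".*" extends to the LAST "se" on the line.
print(censor_with(r"m.*se", "mouse house"))

# Non-greedy ".*?" stops at the FIRST "se" after the m.
print(censor_with(r"m.*?se", "mouse house"))

# "t*st" does not match the whole word "test": the "e" is never allowed.
print(re.fullmatch(r"t*st", "test"))
```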
|
I don't think it is Unreal's problem. Fast badword replace was designed for "blah", "*blah", "blah*" and "*blah*", where blah is a string of alphabetic characters. Rocko would be better off using "m[[:alnum:]]+se", because "m*se" matches any number of occurrences of "m" followed by "se". In addition, I don't understand why Syzop's fast badword replace system is not documented in unreal32docs.html. Instead, badword::word is only described as "a simple word" (or a regex), implying that no wildcards are accepted, which is not true. By the way, what about, for example, http://www.pcre.org/ ? In my opinion, Perl-style regular expressions are easy to use. (Just an idea.) Or, to satisfy users, how about a not-as-fast-as-fast-badword-replace system which supports *b*l*a*h*? :-) [Corrected a mistake: s/alphanumeric/alphabetic/, sorry.] edited on: 07-26-03 13:11 |
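The suggested "m[[:alnum:]]+se" can be tried out roughly as below. Python's `re` module does not support POSIX bracket classes such as [[:alnum:]], so this sketch substitutes the ASCII range [A-Za-z0-9] as a stand-in; both that substitution and the helper name `censor` are assumptions made for the example.

```python
import re

# Stand-in for the POSIX pattern "m[[:alnum:]]+se": a literal m,
# one or more ASCII letters or digits, then "se".
PATTERN = re.compile(r"m[A-Za-z0-9]+se")

def censor(text: str) -> str:
    """Replace every match with the censor string."""
    return PATTERN.sub("<censored>", text)

print(censor("sei"))    # no leading m, left alone
print(censor("mouse"))  # m + "ou" + se, replaced
print(censor("mse"))    # "+" needs at least one character in between
```

Unlike "m*se", this pattern requires the leading "m", so "sei" from the original report is no longer touched.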
|
I don't like PCRE, it is slow. I'm going to be using TRE which is lightning fast and supports some rather advanced features. One (that no other regex lib supports) is approximate matching. That is a very nice feature for badwords. |
|
If you still intend to write that tutorial, could you please illustrate the features of regexes with clear examples? The official documentation, available at http://kouli.iki.fi/~vlaurika/tre/syntax.html, is not easy to understand, particularly for newbies. Something like the description at http://www.zytrax.com/tech/web/regex.htm, but specialized for TRE, would probably be enough to also understand how approximate matching works. |
|
Can I ask a question? If the fast badword replace system is designed to accept all alphabetical characters, but nothing more, why does it allow an opening curly bracket? (character: "{", code: 123) |
|
Done in .211 |
Date Modified | Username | Field | Change |
---|---|---|---|
2003-04-26 20:35 | Rocko | New Issue | |
2003-04-26 22:10 | syzop | Note Added: 0002507 | |
2003-04-27 23:43 | Schutzgeist | Note Added: 0002524 | |
2003-04-28 04:18 | | Note Added: 0002528 | |
2003-06-29 19:37 | AngryWolf | Note Added: 0003131 | |
2003-06-29 19:39 | AngryWolf | Note Edited: 0003131 | |
2003-06-30 13:24 | syzop | Severity | minor => feature |
2003-06-30 13:24 | syzop | Category | => ircd |
2003-06-30 13:24 | syzop | Product Version | 3.2-beta15 => 3.2-beta17 |
2003-06-30 13:24 | syzop | Summary | Bugs with matching badwords when wildcards are used. => badwords wildcard confusion / TRE |
2003-06-30 20:30 | | Note Added: 0003139 | |
2003-07-06 06:18 | AngryWolf | Note Added: 0003173 | |
2003-07-26 13:10 | AngryWolf | Note Added: 0003338 | |
2003-07-26 13:11 | AngryWolf | Note Edited: 0003131 | |
2004-01-18 01:51 | | Status | new => assigned |
2004-01-18 01:51 | | Assigned To | => codemastr |
2004-12-26 15:28 | | Summary | badwords wildcard confusion / TRE => Regex documentation |
2004-12-26 15:30 | | Status | assigned => resolved |
2004-12-26 15:30 | | Fixed in Version | => 3.2.3 |
2004-12-26 15:30 | | Resolution | open => fixed |
2004-12-26 15:30 | | Note Added: 0008673 | |