View Issue Details

ID: 0000920
Project: unreal
Category: ircd
View Status: public
Last Update: 2004-12-26 15:30
Reporter: Rocko
Assigned To: codemastr
Priority: normal
Severity: feature
Reproducibility: always
Status: resolved
Resolution: fixed
Platform:
OS: Debian Woody
OS Version: 3.0
Product Version: 3.2-beta17
Target Version:
Fixed in Version: 3.2.3
Summary: 0000920: Regex documentation
Description: I don't know if wildcards are allowed, but in the beta15 release there is an entry "badword channel { word "*fuck*"; };", so I assume they are. (By the way, there is also an entry with just "fuck", so the word is listed twice!)

And when I use an entry like:

badword channel { word "m*se"; };

and then say "sei",
it becomes "<censored>i".
That is not right, because there is no "m" in front of "sei".
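Read as a regular expression rather than a wildcard, "m*se" explains the report. A minimal sketch using Python's `re` module, assuming (not confirmed here) that Unreal's regex library treats this simple pattern the same way; the `censor` helper is hypothetical:

```python
import re

# "m*se" is a regex: zero or more "m" characters followed by "se".
# It is NOT the wildcard "m, then anything, then se".
pattern = re.compile(r"m*se")

def censor(text: str) -> str:
    # Replace every match with the default replacement text.
    return pattern.sub("<censored>", text)

print(censor("sei"))    # <censored>i   (m* matched zero m's)
print(censor("mouse"))  # mou<censored> (the "m" is not directly before "se")
print(censor("msei"))   # <censored>i
```

Because "m*" may match zero characters, any "se" anywhere gets censored, which is exactly the "<censored>i" result described above.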
Tags: No tags attached.
3rd party modules:

Activities

syzop

2003-04-26 22:10

administrator   ~0002507

Just for the record: this bug is unrelated to fast badwords replace, since the entry is recognized as a regex (c R m*se <censored>). I'm a regex n00b, so I don't know if this is good or bad... Someone reported a similar issue on IRC about "irc.*.*", which caused "hi irc.blah.com is nice" to be replaced with "hi irc.repl.aced" (everything after the match got dropped)...
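The "irc.*.*" case is a greedy-matching effect: ".*" matches as much as it can, so the match runs from "irc" to the end of the line and the whole tail is replaced. A sketch in Python's `re` (not the library Unreal used at the time); the narrower pattern `irc\.[^ ]*` is only an illustrative alternative, not something proposed in this thread:

```python
import re

text = "hi irc.blah.com is nice"

# Greedy: ".*" grabs as much as possible, so the match extends to the end of the line.
greedy = re.search(r"irc.*.*", text)
print(greedy.group())   # irc.blah.com is nice

# Illustrative alternative: after "irc." accept only non-space characters.
bounded = re.search(r"irc\.[^ ]*", text)
print(bounded.group())  # irc.blah.com

print(re.sub(r"irc\.[^ ]*", "<censored>", text))  # hi <censored> is nice
```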

Schutzgeist

2003-04-27 23:43

reporter   ~0002524

Well, if I set

badword channel { word "lame"; replace "leet";};
and someone writes "lame", users see "leet". That's okay.

If I set
badword channel { word "*l*a*m*e*"; replace "leet";};
and someone says "lame", everyone sees "Lame", so
it doesn't work.

BUT then "e" becomes a bad character, because when I type
"Unreal", the output is:
Unrleetal

Another problem appears when * is a bad character.
When you set
badword channel { word "l*a*m*e"; replace "leet";};
badword channel { word "*t*e*s*t*"; replace "<censored word>";};

and someone types just a _star_ ("*"),
the output will be:
<cleetnsorleetd word>

I think Rocko reported the same thing ;) but I just wanted to show some more examples ;)
The new badword config is very interesting but very difficult to handle.
With wildcards you can easily make mistakes.
Without wildcards, the badword list loses its value.
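Both outputs above can be reproduced by treating "l*a*m*e" as a regex: every element before the final "e" is optional, so a lone "e" already matches. A sketch with Python's `re`, which is an assumption (Unreal used a different library); note that Python refuses a pattern with a leading "*", such as "*t*e*s*t*", whereas the library in use then evidently accepted it. The `compiles` helper is hypothetical:

```python
import re

# "l*a*m*e" = zero or more l, zero or more a, zero or more m, then one "e".
lame = re.compile(r"l*a*m*e")

for text in ["lame", "Unreal", "<censored word>"]:
    print(lame.sub("leet", text))
# leet
# Unrleetal
# <cleetnsorleetd word>

def compiles(pattern: str) -> bool:
    # True if Python's re accepts the pattern, False if it raises re.error.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"l*a*m*e"))    # True
print(compiles(r"*t*e*s*t*"))  # False: a leading "*" has nothing to repeat
```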

codemastr

2003-04-28 04:18

reporter   ~0002528

This is partly your problem and partly Unreal's. In regex, the * operator does NOT mean the same thing as it does in a wildcard or glob expression. It does NOT match "any characters"; it matches zero or more of the previous character. For example, "t*st" says "match 0 or more t's followed by st". What you probably want is ".*": the "." means any character, so ".*" says "match 0 or more of any character". But even this produces odd results in some rare circumstances. After beta16 I will switch Unreal to a new regex library that supports "non-greedy repeat operations", which will make it function 100% as expected, assuming you know the correct syntax. The new library is also much faster than the current one, so with more features and more speed it is obviously a good choice. Perhaps I'll write up a simple document on some of the basic features of regex...
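The distinction can be checked with a few patterns. This sketch uses Python's `re`, which is Perl-style rather than TRE; it assumes TRE agrees on these basic operators, including non-greedy `*?` (not confirmed in this thread). The sample text "bad bed" is illustrative only:

```python
import re

# "t*st": zero or more "t" followed by "st" (fullmatch anchors the whole word).
for word in ["st", "tst", "ttst", "test"]:
    print(word, bool(re.fullmatch(r"t*st", word)))
# st True, tst True, ttst True, test False

# ".*" (any character, repeated) is the closer analogue of a wildcard "*".
print(bool(re.fullmatch(r"t.*st", "test")))  # True

# Greedy vs non-greedy: "*?" stops at the first point where the rest can match.
text = "bad bed"
print(re.search(r"b.*d", text).group())   # bad bed
print(re.search(r"b.*?d", text).group())  # bad
```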

AngryWolf

2003-06-29 19:37

reporter   ~0003131

Last edited: 2003-07-26 13:11

I don't think it is Unreal's problem. Fast badword replace was designed for "blah", "*blah", "blah*" and "*blah*", where blah is a string of alphabetic characters. Rocko had better use "m[[:alnum:]]+se", because "m*se" matches any number of occurrences of "m" followed by "se".
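The suggested pattern can be compared with a looser variant. Python's `re` has no POSIX `[[:alnum:]]` class, so this sketch spells it out as `[A-Za-z0-9]` (its C-locale meaning); `looser` is a hypothetical variant added only for comparison:

```python
import re

# Equivalent of "m[[:alnum:]]+se": "m", at least one letter/digit, then "se".
suggested = re.compile(r"m[A-Za-z0-9]+se")
# Hypothetical variant with "*": zero or more characters between "m" and "se".
looser = re.compile(r"m[A-Za-z0-9]*se")

for text in ["sei", "mouse", "mse"]:
    print(text, "->", suggested.sub("<censored>", text), "|", looser.sub("<censored>", text))
# sei -> sei | sei
# mouse -> <censored> | <censored>
# mse -> mse | <censored>
```

Note that `+` requires at least one character between "m" and "se", so "mse" itself is not censored by the suggested pattern.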

In addition, I don't understand why Syzop's fast badword replace system is not documented in unreal32docs.html. Instead, badword::word is only described as "a simple word" (or a regex), implying that no wildcards are accepted, which is not true.

By the way, what about, for example, http://www.pcre.org/ ? In my opinion, Perl-style regular expressions are easy to use. (Just an idea.) Or, to satisfy users, how about a replace system that is not as fast as fast badword replace but supports *b*l*a*h*? :-)
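A minimal sketch of the kind of slower, wildcard-aware system suggested above, where "*" means "any run of characters" and patterns are matched against whole words. Python's `fnmatch` stands in for a real implementation; the word definition, the lowercasing, the sample list and the `censor_words` name are assumptions for illustration, not Unreal behavior:

```python
import fnmatch
import re

# Hypothetical badword list using glob wildcards, not regex.
BADWORDS = ("lame", "*blah", "blah*", "*b*l*a*h*")

def censor_words(text: str, badwords=BADWORDS, replacement="<censored>") -> str:
    def repl(m):
        word = m.group()
        # Case-insensitive comparison by lowercasing (an assumption).
        if any(fnmatch.fnmatchcase(word.lower(), p) for p in badwords):
            return replacement
        return word
    # Only runs of ASCII letters/digits count as words; everything else is kept.
    return re.sub(r"[A-Za-z0-9]+", repl, text)

print(censor_words("so lame"))     # so <censored>
print(censor_words("Unreal"))      # Unreal
print(censor_words("bxlxaxh ok"))  # <censored> ok
print(censor_words("Blah!"))       # <censored>!
```

Unlike the regex "l*a*m*e", a glob such as "*b*l*a*h*" only hits words containing b, l, a, h in that order, so "Unreal" is left alone.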

[Corrected a mistake: s/alphanumeric/alphabetic/, sorry.]


codemastr

2003-06-30 20:30

reporter   ~0003139

I don't like PCRE; it is slow. I'm going to use TRE, which is lightning fast and supports some rather advanced features. One of them, which no other regex library supports, is approximate matching. That is a very nice feature for badwords.
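Approximate matching tolerates a limited number of errors (insertions, deletions, substitutions) in a match. The sketch below shows only the underlying idea, applied to whole words with plain Levenshtein distance in Python; it does not use or imitate TRE's actual API, and `is_near_badword` with its one-error threshold is hypothetical:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

def is_near_badword(word: str, badword: str, max_errors: int = 1) -> bool:
    # Hypothetical policy: a word within max_errors edits counts as a hit.
    return edit_distance(word.lower(), badword.lower()) <= max_errors

print(is_near_badword("1ame", "lame"))   # True (one substitution)
print(is_near_badword("laame", "lame"))  # True (one insertion)
print(is_near_badword("game", "lame"))   # True (one substitution)
print(is_near_badword("leet", "lame"))   # False
```

The "game" result shows the trade-off: a tolerance that catches "1ame" also flags innocent words, so thresholds would need care.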

AngryWolf

2003-07-06 06:18

reporter   ~0003173

If you still intend to write that tutorial, could you please illustrate the features of regexes with clear examples? The official documentation, available at http://kouli.iki.fi/~vlaurika/tre/syntax.html, is surely not easy to follow, particularly for newbies. Something like the description at http://www.zytrax.com/tech/web/regex.htm, but specialized for TRE, would probably be enough to also explain how approximate matching works.

AngryWolf

2003-07-26 13:10

reporter   ~0003338

Can I ask a question? If the fast badword replace system is designed to accept alphabetic characters and nothing more, why does it allow an opening curly bracket? (character: "{", code: 123)
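In ASCII, "{" (123) comes directly after "z" (122). The sketch below shows a letters-only check with an optional leading/trailing "*", following AngryWolf's description of fast badword patterns, and how raising the lowercase upper bound by one admits "{". Whether Unreal's actual check had this off-by-one is not established in this thread; `is_fast_badword` and `lower_max` are hypothetical:

```python
def is_fast_badword(pattern: str, lower_max: str = "z") -> bool:
    """Hypothetical check: ASCII letters only, with an optional leading/trailing '*'."""
    core = pattern[1:] if pattern.startswith("*") else pattern
    core = core[:-1] if core.endswith("*") else core
    return bool(core) and all("a" <= c <= lower_max or "A" <= c <= "Z" for c in core)

print(ord("z"), ord("{"))                        # 122 123
print(is_fast_badword("*blah*"))                 # True
print(is_fast_badword("m*se"))                   # False (inner '*', so not a fast pattern)
print(is_fast_badword("bl{ah"))                  # False
print(is_fast_badword("bl{ah", lower_max="{"))   # True: bound one too high admits '{'
```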

codemastr

2004-12-26 15:30

reporter   ~0008673

Done in .211

Issue History

Date Modified Username Field Change
2003-04-26 20:35 Rocko New Issue
2003-04-26 22:10 syzop Note Added: 0002507
2003-04-27 23:43 Schutzgeist Note Added: 0002524
2003-04-28 04:18 codemastr Note Added: 0002528
2003-06-29 19:37 AngryWolf Note Added: 0003131
2003-06-29 19:39 AngryWolf Note Edited: 0003131
2003-06-30 13:24 syzop Severity minor => feature
2003-06-30 13:24 syzop Category => ircd
2003-06-30 13:24 syzop Product Version 3.2-beta15 => 3.2-beta17
2003-06-30 13:24 syzop Summary Bugs with matching badwords when wildcards are used. => badwords wildcard confusion / TRE
2003-06-30 20:30 codemastr Note Added: 0003139
2003-07-06 06:18 AngryWolf Note Added: 0003173
2003-07-26 13:10 AngryWolf Note Added: 0003338
2003-07-26 13:11 AngryWolf Note Edited: 0003131
2004-01-18 01:51 codemastr Status new => assigned
2004-01-18 01:51 codemastr Assigned To => codemastr
2004-12-26 15:28 codemastr Summary badwords wildcard confusion / TRE => Regex documentation
2004-12-26 15:30 codemastr Status assigned => resolved
2004-12-26 15:30 codemastr Fixed in Version => 3.2.3
2004-12-26 15:30 codemastr Resolution open => fixed
2004-12-26 15:30 codemastr Note Added: 0008673