View Issue Details

ID: 0006461
Project: unrealircd
Category:
View Status: public
Last Update: 2024-08-30 09:10
Reporter: anhtribao
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Platform: n/a
OS: n/a
OS Version: n/a
Product Version: 6.1.7.1
Summary: 0006461: Spamfilter: unicode U+200B does not match a matching regex
Description: U+200B is not matched by a regex that should match it.

Here is the test with U+00A0, U+200A, U+200B, U+200C and U+FEFF.

Without a spamfilter (each message contains 10 of the named character between the A and the Z):
The test client sends a PRIVMSG to nick 'Q', and nick 'Q' receives the characters:
2024-08-27 17:13:05 DEBUG <<< :5TL8Z6X0C PRIVMSG 5TC00000Q :u+00a0: A          Z
2024-08-27 17:13:12 DEBUG <<< :5TL8Z6X0C PRIVMSG 5TC00000Q :u+200a: A          Z
2024-08-27 17:13:17 DEBUG <<< :5TL8Z6X0C PRIVMSG 5TC00000Q :u+200b: A​​​​​​​​​​B
2024-08-27 17:13:23 DEBUG <<< :5TL8Z6X0C PRIVMSG 5TC00000Q :u+200c: A‌‌‌‌‌‌‌‌‌‌Z
2024-08-27 17:13:29 DEBUG <<<  :5TL8Z6X0C PRIVMSG 5TC00000Q :u+feff: AZ


With spamfilters added ( /spamfilter add -regex p block 3600 test_xxxx \N{u+xxxx} ):
[17:15:31] F regex p block 0 25 3600 test_00a0 AshTray 0 0 \x{00a0}
[17:15:31] F regex p block 0 16 3600 test_200a AshTray 0 0 \x{200a}
[17:15:31] F regex p block 0 14 3600 test_200b AshTray  0 0 \x{200b}
[17:15:31] F regex p block 0 10 3600 test_200c AshTray  0 0 \x{200c}
[17:15:31] F regex p block 0 5 3600 test_feff AshTray  0 0 \x{feff}


Now the test client sends the same PRIVMSGs again (to nick 'Q') and gets:
│17:15:43   local  -- │ Q Message blocked: test 00a0
│17:16:00   local  -- │ Q Message blocked: test 200a
│17:16:06   local  -- │ Q Message blocked: test 200c
│17:16:09   local  -- │ Q Message blocked: test feff


U+200B is not blocked.
Tags: No tags attached.
3rd party modules

Activities

anhtribao

2024-08-27 19:27

reporter   ~0023320

Also I forgot:
[17:15:43] (notice) -liger2.- tkl.SPAMFILTER_MATCH [info] [Spamfilter] thib!thib@localhost matches filter '\x{00a0}': [cmd: PRIVMSG Q: 'u+00a0: A          Z'] [reason: test 00a0] [action: block]
[17:16:00] (notice) -liger2.- tkl.SPAMFILTER_MATCH [info] [Spamfilter] thib!thib@localhost matches filter '\x{200a}': [cmd: PRIVMSG Q: 'u+200a: A          Z'] [reason: test 200a] [action: block]
[17:16:07] (notice) -liger2.- tkl.SPAMFILTER_MATCH [info] [Spamfilter] thib!thib@localhost matches filter '\x{200c}': [cmd: PRIVMSG Q: 'u+200c: A‌‌‌‌‌‌‌‌‌‌Z'] [reason: test 200c] [action: block]
[17:16:09] (notice) -liger2.- tkl.SPAMFILTER_MATCH [info] [Spamfilter] thib!thib@localhost matches filter '\x{feff}': [cmd: PRIVMSG Q: 'u+feff: AZ'] [reason: test feff] [action: block]

PeGaSuS

2024-08-28 01:45

reporter   ~0023321

I can confirm the issue, although I don't have any specific logs to show, since I've only tried '\x{200B}', '\N{U+200B}' and '\x200B' and no other char.

syzop

2024-08-30 09:08

administrator   ~0023322

You cannot match this character with a regex because StripControlCodes() was modified in 2019 to "eat" ZWSP (U+200B) characters: it removes them from the input before the text is matched against the regex. The change was presumably made because people were evading spamfilters with that character.

https://github.com/unrealircd/unrealircd/commit/62c7f67f7a86cafaeea87878588710ea183c0c68
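
To illustrate the effect described in this note, here is a minimal Python sketch. It is not UnrealIRCd's C code: `strip_zwsp` and `filter_matches` are hypothetical names, and the sketch assumes the stripping step removes only U+200B and runs before the regex is applied.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def strip_zwsp(text):
    """Hypothetical stand-in for the stripping step: drop every U+200B."""
    return text.replace(ZWSP, "")


def filter_matches(pattern, message):
    """Apply the regex to the message only after stripping (assumed order)."""
    return re.search(pattern, strip_zwsp(message)) is not None


# Messages shaped like the ones in the report: 10 characters between A and Z.
msg_200b = "u+200b: A" + ZWSP * 10 + "Z"
msg_200c = "u+200c: A" + "\u200c" * 10 + "Z"

print(filter_matches(r"\u200b", msg_200b))  # False: the character is gone before matching
print(filter_matches(r"\u200c", msg_200c))  # True: U+200C is not stripped in this sketch
print(filter_matches(r"AZ", msg_200b))      # True: the regex sees "u+200b: AZ"
```

Under these assumptions a filter on U+200B can never fire, while a filter written as if the character were absent (here `AZ`) still matches.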

syzop

2024-08-30 09:10

administrator   ~0023323

I don't think that is/was a feasible approach though: there are so many characters and variations that can interfere with spamfilter matching that you would keep adding them manually forever.
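
To give a rough sense of scale, the following Python sketch counts code points in the Unicode general categories Cf (format) and Zs (space separator), which include the five characters tested in this report. Picking these two categories is an assumption made for illustration. It is neither a complete list of characters usable for evasion nor something UnrealIRCd does.

```python
import sys
import unicodedata

# Assumption for illustration: treat "format" (Cf) and "space separator" (Zs)
# code points as invisible or blank-looking candidates.
CATEGORIES = {"Cf", "Zs"}


def invisible_candidates():
    """Return every code point in CATEGORIES, excluding the ordinary space."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if cp != 0x20 and unicodedata.category(chr(cp)) in CATEGORIES
    ]


candidates = invisible_candidates()
tested = ["\u00a0", "\u200a", "\u200b", "\u200c", "\ufeff"]

# The exact count depends on the Unicode version bundled with Python.
print(len(candidates))
print(all(ch in candidates for ch in tested))
```

The count covers only two categories and ignores combining marks, look-alike letters and similar tricks, so it understates the number of variations a manual strip list would have to track.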

Issue History

Date Modified Username Field Change
2024-08-27 19:26 anhtribao New Issue
2024-08-27 19:27 anhtribao Note Added: 0023320
2024-08-28 01:45 PeGaSuS Note Added: 0023321
2024-08-30 09:08 syzop Note Added: 0023322
2024-08-30 09:10 syzop Note Added: 0023323