View Issue Details

ID: 0006053
Project: unrealircd
View Status: public
Last Update: 2023-09-09 11:28
Reporter: djsxx1984
Assigned To: syzop
Priority: urgent
Severity: major
Reproducibility: always
Status: closed
Resolution: no change required
Product Version: 6.0.1
Summary: 0006053: Creating channels with non-breaking-space are allowed in U5.0.5 -> U6.0.1
Description: According to https://datatracker.ietf.org/doc/html/rfc1459 spaces in channel names are not allowed.
But when typing //join $+(#alle,$chr(160),rokers) in mIRC, or /join #alle(ALT+160)rokers, we are able to join #alle rokers.

This should not be possible. From RFC 1459, section 1.3:

Channels names are strings (beginning with a '&' or '#' character) of
length up to 200 characters. Apart from the the requirement that the
first character being either '&' or '#'; the only restriction on a
channel name is that it may not contain any spaces (' '), a control G
(^G or ASCII 7), or a comma (',' which is used as a list item
separator by the protocol).
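The quoted rule can be written as a small check. The sketch below is a hypothetical helper (`is_rfc1459_channel_name` is not UnrealIRCd code) that applies only the three RFC 1459 restrictions; counting the length in code points is an assumption, since RFC 1459 predates Unicode. It shows why the reported channel passes: the RFC forbids the ASCII space 0x20, not U+00A0.

```python
# Hypothetical check applying only the RFC 1459 (section 1.3) channel-name rules.
FORBIDDEN = {" ", "\x07", ","}  # ASCII space 0x20, control G (0x07), comma


def is_rfc1459_channel_name(name: str) -> bool:
    """Return True if `name` passes the RFC 1459 channel-name restrictions."""
    if not name or name[0] not in "&#":
        return False  # must start with '&' or '#'
    if len(name) > 200:
        return False  # length counted in code points here (an assumption)
    return not any(ch in FORBIDDEN for ch in name)


print(is_rfc1459_channel_name("#alle rokers"))       # False: contains 0x20
print(is_rfc1459_channel_name("#alle\u00a0rokers"))  # True: U+00A0 is not 0x20
```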

Steps To Reproduce:
//join #test$chr(160)test <-- will join #test$chr(160)test
//join $+(#alle,$chr(160),rokers) <-- will join #alle rokers

[09-01-2022][20:50:14] * |join| you are now talking in #alle rokers
[09-01-2022][20:50:14] * irc3.Chattersworld.nl sets mode: +nt

Registering the channel is not possible because Anope doesn't accept the non-breaking space (ALT+160 / $chr(160)).

https://tinyurl.com/yxo3vtsj

See the screenshot above for the channel with the space.
Tags: No tags attached.
Attached Files: afbeelding.png (3,230 bytes)

Activities

djsxx1984

2022-01-09 22:24

reporter   ~0022364

After a short talk in #unreal: character 160 is a non-breaking space, not the regular space character (0x20).
Maybe consider removing the NBSP character from the end of channel names.

Example (&nbsp stands for the non-breaking-space character):
/join #test -> joins #test
/join #test&nbsp -> joins #test
/join #Test&nbsptest -> joins #test test

syzop

2022-01-10 08:15

administrator   ~0022365

Space is 0x20 (ASCII 32), as mentioned repeatedly in RFC1459. It is not allowed because spaces are used in the protocol as separators. If a channel name could contain a space, things would go bad... really bad, because you would no longer know whether the text after the space belongs to the channel name or to the rest of the protocol message (e.g. a nickname, a timestamp or whatever). The space character (0x20) has always been blocked in channel names in UnrealIRCd, since... forever.
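To illustrate the separator problem, here is a deliberately simplified tokenizer (hypothetical; it ignores message prefixes and the ':' trailing parameter) that splits only on 0x20, as the protocol does. With a real space, `rokers` becomes a separate parameter, which a JOIN would read as a channel key; with U+00A0 the name stays a single token.

```python
def split_params(line: str) -> list:
    """Very simplified IRC message splitting on ASCII space (0x20) only.

    Hypothetical illustration: no prefix handling, no ':' trailing parameter.
    Other Unicode whitespace, such as U+00A0, is NOT treated as a separator.
    """
    return line.split(" ")


print(split_params("JOIN #alle rokers"))       # ['JOIN', '#alle', 'rokers']
print(split_params("JOIN #alle\u00a0rokers"))  # ['JOIN', '#alle\xa0rokers']
```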

As you have figured out by now, you are talking about the "non-breaking space", code point U+00A0 (160). In Latin-1 that is the single byte 0xA0; in UTF-8 it is encoded as the two bytes 0xC2 0xA0. It can create a visual issue.
We blocked this for many, many years, but stopped blocking it after https://bugs.unrealircd.org/view.php?id=4538 via this commit https://github.com/unrealircd/unrealircd/commit/a4e076c08c9aee0853466e53174b138d3092d675
UnrealIRCd is only partially UTF-8 aware, so it currently blocks things at the byte level. 0xA0 is also a valid byte inside many other UTF-8 characters, so blocking byte 0xA0 causes valid channel names to be cut off or rejected, as mentioned in that bug report.
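The byte-level collision is easy to check. In UTF-8, U+00A0 is encoded as 0xC2 0xA0, while a letter such as 'à' (U+00E0) is 0xC3 0xA0, so a naive filter on byte 0xA0 also rejects 'à'. `blocked_at_byte_level` is a hypothetical stand-in for such a filter, not UnrealIRCd's old code (see the linked commit for that).

```python
NBSP = "\u00a0"     # NO-BREAK SPACE
A_GRAVE = "\u00e0"  # LATIN SMALL LETTER A WITH GRAVE ('à')

print(NBSP.encode("utf-8").hex(" "))     # c2 a0
print(A_GRAVE.encode("utf-8").hex(" "))  # c3 a0


def blocked_at_byte_level(name: str) -> bool:
    """Hypothetical naive filter: reject any name whose UTF-8 bytes contain 0xA0."""
    return 0xA0 in name.encode("utf-8")


print(blocked_at_byte_level("#alle" + NBSP + "rokers"))  # True (intended)
print(blocked_at_byte_level("#voil" + A_GRAVE))          # True (collateral damage)
print(blocked_at_byte_level("#voila"))                   # False
```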

So your point could very well be that it creates a visual issue: it is hard (sometimes even impossible) to distinguish one channel name from another. And this is all true. But it isn't limited to 0xA0; it is a big issue with UTF-8 in general. You can create many lookalike channel names, for example by replacing a Latin character with a Cyrillic one. Nobody will notice the difference.
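A quick way to see the Latin/Cyrillic case: the two names below render the same in most fonts, but they are different strings and therefore different channels. The names `latin` and `lookalike` are just illustrative.

```python
import unicodedata

latin = "#abc"
lookalike = "#\u0430bc"  # first letter is U+0430 CYRILLIC SMALL LETTER A

print(latin == lookalike)  # False: two different channels for the server
for a, b in zip(latin, lookalike):
    if a != b:
        print(f"U+{ord(a):04X} {unicodedata.name(a)}")  # U+0061 LATIN SMALL LETTER A
        print(f"U+{ord(b):04X} {unicodedata.name(b)}")  # U+0430 CYRILLIC SMALL LETTER A
```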
If you are concerned about this, then I suggest you have a look at a feature that was introduced in 5.0.0, called set::allowed-channelchars https://www.unrealircd.org/docs/Set_block#set::allowed-channelchars. The safest setting for it is "ascii". This is more or less what freenode (until a year ago) and a few other big networks used. The downside is that you restrict all channel names to... ASCII.
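A rough idea of what an ASCII-only policy rejects. `allowed_ascii_only` is hypothetical and assumes the policy means printable ASCII (0x21 to 0x7E) minus the comma; UnrealIRCd's actual "ascii" rules may differ in details, so check the linked documentation. The last example shows the downside: a legitimate accented name is rejected too.

```python
def allowed_ascii_only(name: str) -> bool:
    """Hypothetical approximation of an "ascii" channel-character policy.

    Assumes only printable ASCII (0x21 to 0x7E) is allowed, minus ','.
    UnrealIRCd's real implementation may differ.
    """
    return all(0x21 <= ord(ch) <= 0x7E and ch != "," for ch in name)


for chan in ("#abc", "#alle\u00a0rokers", "#\u0430bc", "#caf\u00e9"):
    print(repr(chan), allowed_ascii_only(chan))  # only '#abc' is allowed
```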

syzop

2022-01-10 11:48

administrator   ~0022366

Last edited: 2022-01-10 12:26

From what I understand, this happened:

1) The user saw a channel or a list of channels (#abc #def #ghi), or a channel mentioned in a conversation, and copy-pasted it from somewhere into The Lounge IRC client. Either the source (the copy action) or The Lounge (the paste action) turned the spaces into non-breaking spaces.
2) The user then wanted to add the entry via /NS AJOIN, but didn't actually add "#abc". Instead, they overlooked the non-breaking space and added #abc plus a non-breaking space (0xA0). It is like people who sometimes paste a trailing space after their password, except that here Anope accepts it, since it is a non-breaking space (I can't really blame them).
3) On the next auto-join, they ended up in that channel and wondered where everyone was.
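The copy-paste scenario above can be sketched as follows. `normalize_spaces` is a hypothetical client-side cleanup (not something The Lounge, Anope or UnrealIRCd is known to do) that maps every Unicode space separator (category Zs, which includes U+00A0) to an ASCII space before the input is parsed.

```python
import unicodedata


def normalize_spaces(text: str) -> str:
    """Replace every Unicode space separator (category Zs) with an ASCII space.

    Hypothetical cleanup for pasted input; not part of any client or service.
    """
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


pasted_list = "#abc\u00a0#def #ghi"
print(pasted_list.split(" "))                    # ['#abc\xa0#def', '#ghi']
print(normalize_spaces(pasted_list).split(" "))  # ['#abc', '#def', '#ghi']

ajoin_entry = "#abc\u00a0"  # what step 2 effectively added
print(normalize_spaces(ajoin_entry).strip())     # #abc
```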

While I totally understand the confusion, basically anope and Unreal are just doing what the user told them. The fact that the non breaking space cannot easily be seen visually is not taken into account by either.

Let me first give you my general POV:

In my point of view you are fighting a losing battle when you start to handle the non-breaking space. As I explained earlier, with UTF-8 it is really easy to create confusable channel names deliberately. In #abc, for example, you could replace the 'a' with a Cyrillic 'a': you see no difference, even though it is technically a completely different channel. And it's not just that; the possibilities with UTF-8 are endless, it is quite a nightmare.
Even for the space alone, there are 18 other Unicode characters that are displayed as whitespace. You can find them here: https://util.unicode.org/UnicodeJsps/confusables.jsp?a=+&r=None
There's currently no solution to properly handle UTF-8 lookalike problems. I've been in an IRCv3 discussion about this as well, but there has been no progress and some claim it is even impossible to solve. Another problem is that both services and the ircd would need to apply the "same rules".
The best thing you can do if you are concerned about this is, as I said earlier, to use set::allowed-channelchars https://www.unrealircd.org/docs/Set_block#set::allowed-channelchars and set it to "ascii".

Now your POV (OK, Jellis's, but I understand he is on the same network):
This wasn't really deliberate. It was just an ordinary user (not an IRC die-hard) who copy-pasted something and ended up in a situation where UnrealIRCd joins them to a channel where their friends are missing (confusion all along). In Anope they cannot delete the AJOIN entry either, and even an IRCop had some difficulty with this.
So, that is all understandable. And... sure... it is possible to do something specifically about a single character like this, if that particular character gives you lots of problems.
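For the IRCop side of this, a small diagnostic helps to see what is really inside a name that looks like a plain "#abc". `reveal_hidden_chars` is a hypothetical helper that lists every character outside printable ASCII with its position, code point, UTF-8 bytes and Unicode name.

```python
import unicodedata
from typing import List


def reveal_hidden_chars(name: str) -> List[str]:
    """Describe every character outside printable ASCII (0x21 to 0x7E).

    Hypothetical helper: makes an invisible U+00A0 visible by code point,
    UTF-8 bytes and Unicode name.
    """
    report = []
    for pos, ch in enumerate(name):
        if not 0x21 <= ord(ch) <= 0x7E:
            utf8 = ch.encode("utf-8").hex(" ")
            report.append(f"pos {pos}: U+{ord(ch):04X} [{utf8}] {unicodedata.name(ch, '?')}")
    return report


print(reveal_hidden_chars("#abc\u00a0"))
# ['pos 4: U+00A0 [c2 a0] NO-BREAK SPACE']
```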

Now, merging both points of view again:
I just want to be absolutely clear that there is no way to solve the lookalike problem, or to avoid other similar cases. So IF you or I do something, it will be limited to this character only (and to any other characters you choose).

In UnrealIRCd 5 and 6 you currently have two options:

1) set { allowed-channelchars ascii; }
2) Add a deny channel block to block this character: deny channel { mask *nonbreakingspace*; reason "something"; }
Obviously, instead of nonbreakingspace you paste the actual non-breaking-space character there; in UTF-8 it is 0xC2 0xA0.
Grab the block here: https://pastebin.com/raw/mjAUcydH
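Because the character is hard to type, one option is to generate the block with a script. This sketch writes the block in exactly the form shown above, with the real U+00A0 in the mask, to a hypothetical file `deny-nbsp.conf`, which you could then pull into your configuration (for example with an include directive). The glob check uses Python's `fnmatchcase`, which is only an approximation of how UnrealIRCd matches masks.

```python
from fnmatch import fnmatchcase
from pathlib import Path

NBSP = "\u00a0"
MASK = f"*{NBSP}*"

# Same form as the deny channel block above, with the real U+00A0 in the mask.
block = f'deny channel {{ mask {MASK}; reason "something"; }}\n'
Path("deny-nbsp.conf").write_text(block, encoding="utf-8")
print(block.encode("utf-8"))  # shows the c2 a0 bytes inside the mask

# fnmatchcase only approximates UnrealIRCd's own mask matching.
for chan in ("#abc", "#abc" + NBSP, "#alle" + NBSP + "rokers"):
    print(repr(chan), fnmatchcase(chan, MASK))
```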

Note that this does not help you with the anope AJOIN list issue that you had. It would still add the entry and you would still have problems removing it (although now you probably know how). In your example, the only thing it prevents is the joining of the channel. So: you will still get users complaining about "duplicate entries in their AJOIN list" and "what is this weird error message in my status" regarding blocking the join.

Jellis

2022-01-10 17:18

reporter   ~0022371

I was able to narrow the user's copy/paste issue down to pasting into The Lounge (client); I could not reproduce it (non-deliberately) with other clients. I will file a bug report with The Lounge, since I doubt that adding an NBSP after the first line when two or more lines are pasted (the first being a command) is intended, and it caused this issue in the first place.

On our network (we are still discussing it) we will probably use "set { allowed-channelchars ascii; }" for now.

Thanks for all the feedback in the IRC channels on Unreal Support, and I hope we didn't cause too much confusion.

djsxx1984

2022-01-10 20:03

reporter   ~0022372

Thank you for the clarification, Syzop. Indeed, my problem is the creation of lookalike channels.
UTF-8 makes it possible to create lookalike channels, and that just looks weird to our users (who are not IRC die-hards).

For now I have set: set { allowed-channelchars ascii; }

djsxx1984

2022-01-10 20:06

reporter   ~0022373

Just an idea: maybe disallow the UTF-8 NBSP at the end of a channel name, since the KiwiIRC and The Lounge clients add an NBSP when multiple lines are copy-pasted into IRC.
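The suggestion could look roughly like this. `sanitize_channel_name` is hypothetical and covers both variants raised in this thread: rejecting a trailing NBSP, or stripping it. As noted above, it only addresses this one character at the end of a name; an NBSP in the middle, or any other lookalike, passes through unchanged.

```python
NBSP = "\u00a0"


def sanitize_channel_name(name: str, strip: bool = False) -> str:
    """Hypothetical handling of a trailing U+00A0 in a channel name.

    strip=False rejects the name; strip=True removes the trailing NBSPs.
    NBSP elsewhere in the name, and other lookalikes, are left alone.
    """
    if not name.endswith(NBSP):
        return name
    if strip:
        return name.rstrip(NBSP)
    raise ValueError(f"channel name {name!r} ends with a non-breaking space")


print(sanitize_channel_name("#test"))                    # #test
print(sanitize_channel_name("#test\u00a0", strip=True))  # #test
print(repr(sanitize_channel_name("#alle\u00a0rokers")))  # unchanged: '#alle\xa0rokers'
```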

syzop

2023-09-09 11:28

administrator   ~0023027

I think there is not much else possible than to close this.

Issue History

Date Modified Username Field Change
2022-01-09 21:41 djsxx1984 New Issue
2022-01-09 21:41 djsxx1984 File Added: afbeelding.png
2022-01-09 22:24 djsxx1984 Note Added: 0022364
2022-01-10 08:15 syzop Note Added: 0022365
2022-01-10 11:48 syzop Note Added: 0022366
2022-01-10 12:26 syzop Note Edited: 0022366
2022-01-10 17:18 Jellis Note Added: 0022371
2022-01-10 20:03 djsxx1984 Note Added: 0022372
2022-01-10 20:06 djsxx1984 Note Added: 0022373
2023-09-09 11:28 syzop Summary Creating channels with spaces are allowed in U5.0.5 -> U6.0.1 => Creating channels with non-breaking-space are allowed in U5.0.5 -> U6.0.1
2023-09-09 11:28 syzop Assigned To => syzop
2023-09-09 11:28 syzop Status new => closed
2023-09-09 11:28 syzop Resolution open => no change required
2023-09-09 11:28 syzop Note Added: 0023027