View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0003719 | unreal | ircd | public | 2008-08-14 14:35 | 2021-03-08 00:51 |
Reporter | para_1461 | Assigned To | syzop | ||
Priority | normal | Severity | feature | Reproducibility | N/A |
Status | resolved | Resolution | fixed | ||
OS | N/A | OS Version | N/A | ||
Product Version | 4.0.0 | ||||
Fixed in Version | 4.0.17 | ||||
Summary | 0003719: Add UTF-8 support | ||||
Description | I'd like to see UTF-8 support for nicks. It's extremely difficult to find an IRCd that supports this, and the only one I know of can't be found on Google to download. I believe I'd find it a good feature for when our network gets more, various users. | ||||
Tags | No tags attached. | ||||
3rd party modules | |||||
|
IRCds don't normally allow UTF-8 in nicks due to the simple reason that anyone can use alternate UTF characters that look like other characters to spoof the appearance of another user. For example, my nick (Stealth) can have up to 127 possible fakes with UTF-8. So that means someone can load up to 127 clones with UTF-8 nicks all looking like "Stealth". Or what if someone with a similar host wants to pretend to be me to get my password? Or harass another user? Or carry out some other form of abuse? Then you have the other issues with upper and lower case characters - the same problems are present there as well. Unreal has a setting to enable other character maps for this purpose (set::allowed-nickchars), and that's even questionable because of the issues mentioned above. |
|
I agree with you, Stealth, but that's why you can make users stick to a single script in their nickname: only Cyrillic, only Arabic or only Chinese, with no mixing. Otherwise, if I start mixing Cyrillic with Latin letters, as you said, I can get a lot of "fakes". |
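A rough sketch of the "one script per nick" idea, in Python. This is only an illustration, not how any IRCd does it: Python's standard library has no Unicode script property, so the sketch approximates the script from the first word of each letter's Unicode character name (for example `LATIN` or `CYRILLIC`). The helper names `scripts_used` and `is_single_script` are made up for this example, and the heuristic would treat related scripts such as Hiragana and Katakana as different.

```python
import unicodedata

def scripts_used(nick: str) -> set:
    """Approximate the scripts in a nick from each letter's Unicode name."""
    scripts = set()
    for ch in nick:
        if ch.isalpha():  # digits and punctuation are ignored
            # "CYRILLIC SMALL LETTER IE" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts

def is_single_script(nick: str) -> bool:
    """True if all letters in the nick appear to come from one script."""
    return len(scripts_used(nick)) <= 1

print(is_single_script("Stealth"))        # all Latin
print(is_single_script("St\u0435alth"))   # Cyrillic 'е' hidden among Latin letters
print(scripts_used("St\u0435alth"))
```

A check like this only blocks mixed-script fakes. A nick written entirely in one script can still resemble a nick written entirely in another.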
|
This is very useful because people will not need to press a key combination to change their keyboard layout in order to mention other users. |
|
There's a document called 'Unicode Security Considerations' which deals with exactly this: http://www.unicode.org/reports/tr36/ I lost my other link but there are also functions that can see which characters are identical or very similar. --> EDIT: NFKC, combined with 'case folding' to make it case insensitive. If I understand correctly that should solve most if not all of the security concerns (look alike characters). Of course, there are plenty of other things that still have to be solved/done before you have UTF8 support... |
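To make the NFKC plus case folding idea concrete, here is a small Python sketch using the standard `unicodedata` module. The function name `nick_key` is invented for this example. It shows what the combination handles (compatibility forms such as fullwidth letters, and case) and, as far as I can tell, what it does not: NFKC does not map look-alike letters from different scripts onto each other, so those would need a separate confusables check.

```python
import unicodedata

def nick_key(nick: str) -> str:
    """Comparison key: NFKC-normalize, case-fold, then normalize again."""
    folded = unicodedata.normalize("NFKC", nick).casefold()
    return unicodedata.normalize("NFKC", folded)

# Fullwidth 'Ｓ' (U+FF33) and plain 's' end up with the same key.
print(nick_key("\uff33tealth") == nick_key("stealth"))

# Accented letters compare case-insensitively.
print(nick_key("HELL\u00d3") == nick_key("hell\u00f3"))

# Cyrillic 'е' (U+0435) is NOT mapped to Latin 'e', so these keys differ.
print(nick_key("St\u0435alth") == nick_key("Stealth"))
```

Two nicks would then be treated as the same user name whenever their keys are equal.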
|
For some next series (not 4.0.x) I think this would be a nice release goal. |
|
YES. |
|
@syzop if a network was willing to sponsor this (in € terms), would it speed up getting this added? |
|
The next few months I'll mostly be working on things other than UnrealIRCd I'm afraid (so just bug fixes, minor things). I usually do that after such a lengthy period of UnrealIRCd development (a full year on U4 in this case). After that I'm seriously considering looking into this, since I think this would be an important feature. |
|
Depends on https://github.com/ircv3/ircv3-specifications/pull/272 Once the spec is agreed on (or the direction is clear) we also need some library or drop-in code that IRC servers, services and clients can use to handle this. |
|
"I agree with you, Stealth, but that's why you can make users stick to a single script in their nickname: only Cyrillic, only Arabic or only Chinese, with no mixing. Otherwise, if I start mixing Cyrillic with Latin letters, as you said, I can get a lot of "fakes"." Possibly the simplest solution: instead of allowing every possible UTF-8 character, just specify a fixed character list in a config file. This would differ from the old allowed-nickchars in that allowed characters could be longer than one byte. It would be sufficient for (probably) all networks dominated by a single language. |
|
That is true. The thing is that https://github.com/ircv3/ircv3-specifications/pull/272 also deals with proper CASEMAPPING. So 'hell<o with accent>' is considered the same as 'HELL<O with accent>', as you would expect. So, ideally you would want to fix both these things at the same time. And, at the same time, services adding support for the same. But, yeah, the alternative is to just add the ranges like we do now. And ignore CASEMAPPING for now. That alternative is viable if the previously mentioned GitHub pull request takes too long (and it seems stuck right now). Anyway, more on-topic: Of course, if we permit - say - UTF8 Hebrew then we should only permit the UTF8 ranges and not non-UTF8 Hebrew at the same time, as that would cause the same display and security issues as previously mentioned. |
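As an illustration of "just add the ranges" together with "only UTF8 Hebrew, not non-UTF8 Hebrew", here is a hedged Python sketch. The allow-list is hypothetical (ASCII nick characters plus the Hebrew letters U+05D0 to U+05EA) and is not UnrealIRCd's actual range table. The name `nick_allowed` is also made up, and other nick rules (length, first character) are ignored.

```python
# Hypothetical allow-list for illustration only.
ASCII_NICK = set(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789[]\\`_^{|}-"
)
HEBREW = {chr(cp) for cp in range(0x05D0, 0x05EB)}  # alef .. tav

def nick_allowed(raw: bytes) -> bool:
    """Accept a nick only if it is valid UTF-8 and every character is allowed."""
    try:
        nick = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # e.g. legacy single-byte Hebrew (ISO-8859-8)
    return bool(nick) and all(ch in ASCII_NICK or ch in HEBREW for ch in nick)

shalom = "\u05e9\u05dc\u05d5\u05dd"  # shin, lamed, vav, final mem

print(nick_allowed(shalom.encode("utf-8")))       # UTF-8 Hebrew
print(nick_allowed(shalom.encode("iso8859_8")))   # same word, legacy bytes
print(nick_allowed("\u03b1bc".encode("utf-8")))   # Greek alpha, not in the list
```

The same word produces different bytes in the two encodings, so accepting both at once would let one visible name exist as two distinct nicks and would display as garbage to clients expecting the other encoding.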
|
Added, without the casemapping (just like existing set::allowed-nickchars): https://github.com/unrealircd/unrealircd/commit/e3b91f8b94aa775ad2536576a8b5c324754b99ff
* Added UTF8 support in set::allowed-nickchars
  See https://www.unrealircd.org/docs/Nick_Character_Sets
  Example:
  set { allowed-nickchars { latin-utf8; }; };
  Important remarks:
  * All your servers must be on UnrealIRCd 4.0.17 (or later)
  * Most(?) services do not support this, so users using UTF8 nicknames won't be able to register at NickServ.
  * In set::allowed-nickchars you must either choose an utf8 language or a non-utf8 character set. You cannot combine the two.
  * You also cannot combine multiple scripts/alphabets, such as: latin, greek, cyrillic and hebrew. You must choose one.
  * If you are already using set::allowed-nickchars on your network (eg: 'latin1') then be careful when migrating (to eg: 'latin-utf8'):
    * Your clients may still assume non-UTF8
    * If users registered nicks with accents or other special characters at NickServ then they may not be able to access their account after the migration to UTF8.
[!] Work in progress [!] |
|
It was a long-awaited feature and we are really grateful for having it now. Would it be possible to add optional full UTF-8 support for nicks, where all non-text UTF-8 characters such as ??????( ? )? could be used as well? Generally speaking, these UTF-8 characters shouldn't break IRC core functionality when used in nicks. |
|
Sorry for double posting, but editing is not possible. My UTF-8 characters in the previous post are not rendered correctly due to database charset configuration or something similar. The characters I mentioned can be viewed here as an example: http://upli.st/l/list-of-all-ascii-emoticons |
|
Just an update: I'm not working on this for 4.0.19. We'll have to see after that but I'm not aware of services and ircv3 drafts and such catching up.. pity.. hoped I would have started something. Due to different priorities in life and time constraints I have to pick my release targets and this one won't be one of them for the next release. As for the last post from mcken: I'm personally kinda reluctant to add such things. As you can see from previous work we try to pick characters/symbols that are "language" so to say, and not symbols like in math or smileys/emoticons and so on. |
|
We added UTF8 nick characters in 4.0.17. Similarly, we have the option to only allow valid utf8 in channel names (it is even the default) since 5.0.0. CASEMAPPING is an entirely different matter though, with plenty of problems still open, and it remains unimplemented: https://bugs.unrealircd.org/view.php?id=2882
Date Modified | Username | Field | Change |
---|---|---|---|
2008-08-14 14:35 | para_1461 | New Issue | |
2008-08-15 00:54 | Stealth | Note Added: 0015361 | |
2008-08-15 00:54 | Stealth | Status | new => feedback |
2008-08-28 01:12 | Stealth | Relationship added | has duplicate 0003723 |
2010-10-29 16:26 | n0kS | Note Added: 0016393 | |
2012-04-22 14:20 | qdinar | Note Added: 0016984 | |
2013-01-09 11:10 | syzop | Note Added: 0017339 | |
2013-01-09 20:45 | syzop | Note Edited: 0017339 | |
2013-01-09 20:47 | syzop | Note Edited: 0017339 | |
2015-12-26 10:29 | syzop | Relationship added | has duplicate 0004503 |
2015-12-26 10:31 | syzop | Note Added: 0018945 | |
2015-12-26 10:31 | syzop | Assigned To | => syzop |
2015-12-26 10:31 | syzop | Status | feedback => acknowledged |
2015-12-26 10:33 | syzop | Product Version | 3.3-alpha0 => 4.0.0 |
2015-12-26 10:33 | syzop | Summary | UTF-8 charset in UnrealIRCd 3.3 => Add UTF-8 support |
2015-12-26 10:33 | syzop | Description Updated | |
2015-12-29 14:23 | blank | Note Added: 0018993 | |
2016-03-20 14:01 | blank | Note Added: 0019143 | |
2016-03-27 11:01 | syzop | Note Added: 0019147 | |
2016-03-27 11:02 | syzop | Note Edited: 0019147 | |
2017-11-19 17:28 | syzop | Note Added: 0019973 | |
2017-11-25 15:45 | k4be | Note Added: 0019977 | |
2017-11-25 16:23 | syzop | Note Added: 0019978 | |
2017-11-25 21:18 | syzop | Note Added: 0019979 | |
2018-01-11 17:20 | mcken | Note Added: 0020012 | |
2018-01-11 17:24 | mcken | Note Added: 0020013 | |
2018-07-14 16:59 | syzop | Note Added: 0020209 | |
2020-09-27 20:07 | syzop | Status | acknowledged => resolved |
2020-09-27 20:07 | syzop | Resolution | open => fixed |
2020-09-27 20:07 | syzop | Fixed in Version | => 4.0.17 |
2020-09-27 20:07 | syzop | Note Added: 0021773 |