2017-12-14 21:36 CET

View Issue Details
ID: 0003719
Project: unrealircd
Category:
View Status: public
Last Update: 2017-11-25 21:18
Reporter: para_1461
Assigned To: syzop
Priority: normal
Severity: feature
Reproducibility: N/A
Status: acknowledged
Resolution: open
Platform:
OS: N/A
OS Version: N/A
Product Version: 4.0.0
Target Version:
Fixed in Version:
Summary: 0003719: Add UTF-8 support
Description: I'd like to see UTF-8 support for nicks. It's extremely difficult to find an IRCd that supports this, and the only one I know of can't be found on Google to download.

I believe it would be a good feature for when our network gets more, and more varied, users.
Tags: No tags attached.

Relationships
has duplicate 0003723 (closed): Adding Unicode Support
has duplicate 0004503 (closed, syzop): Disallowed umlauts in Username

Notes

~0015361

Stealth (reporter)

IRCds don't normally allow UTF-8 in nicks for the simple reason that anyone can use alternate Unicode characters that look like other characters to spoof the appearance of another user.

For example, my nick (Stealth) can have up to 127 possible fakes with UTF-8. So that means someone can load up to 127 clones with UTF-8 nicks all looking like "Stealth".
Or what if someone with a similar host wants to pretend to be me to get my password?
Or harass another user?
Or carry out some other form of abuse?

Then you have the other issues with upper and lower case characters - the same problems are present there as well. Unreal has a setting to enable other character maps for this purpose (set::allowed-nickchars), and that's even questionable because of the issues mentioned above.
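
To make the look-alike concern concrete, here is a minimal Python sketch (an illustration only, not UnrealIRCd code). It builds a second nick that swaps the Latin "e" for the Cyrillic "е" (U+0435), which renders almost identically in most fonts. The helper name `describe` is made up for this example.

```python
import unicodedata

latin = "Stealth"
# Same-looking nick with Cyrillic "е" (U+0435) in place of Latin "e" (U+0065).
spoof = "St\u0435alth"

def describe(nick):
    """Return the Unicode name of every character in the nick."""
    return [unicodedata.name(ch) for ch in nick]

print(latin == spoof)       # False: the strings differ although they look alike
print(describe(latin)[2])   # LATIN SMALL LETTER E
print(describe(spoof)[2])   # CYRILLIC SMALL LETTER IE
```

A server that compares nicks only by code point would treat these as two distinct users.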

~0016393

n0kS (reporter)

I agree with you, Stealth, but that's why you could require each nickname to use only one script: only Cyrillic, only Arabic or only Chinese, with no mixing. Otherwise, if I start mixing Cyrillic with Latin letters, as you said, I can get a lot of "fakes".
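
The one-alphabet-per-nick restriction suggested above could be sketched roughly as follows in Python. This is only an approximation: the standard library has no Unicode script property, so the sketch assumes the first word of each letter's Unicode name ("LATIN", "CYRILLIC", ...) is a good enough stand-in. The function names `scripts_used` and `is_single_script` are hypothetical.

```python
import unicodedata

def scripts_used(nick):
    """Approximate the scripts in a nick by the first word of each letter's Unicode name."""
    scripts = set()
    for ch in nick:
        if ch.isalpha():  # digits and punctuation are treated as script-neutral
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_single_script(nick):
    """True if all letters in the nick appear to come from one script."""
    return len(scripts_used(nick)) <= 1

print(is_single_script("Stealth"))       # True: Latin only
print(is_single_script("Привет"))        # True: Cyrillic only
print(is_single_script("St\u0435alth"))  # False: Latin mixed with a Cyrillic letter
```

Note that this only blocks mixed-script nicks. A nick written entirely in one script can still resemble a nick written entirely in another.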

~0016984

qdinar (reporter)

This is very useful because people will not need to press a key combination to switch keyboard layout just to mention other users.

~0017339

syzop (administrator)

Last edited: 2013-01-09 20:47

There's a document called 'Unicode Security Considerations' which deals with exactly this: http://www.unicode.org/reports/tr36/
I lost my other link but there are also functions that can see which characters are identical or very similar.

--> EDIT:
NFKC, combined with 'case folding' to make it case insensitive.

If I understand correctly that should solve most if not all of the security concerns (look alike characters).
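
As a rough illustration of what NFKC plus case folding does, here is a Python sketch using the standard `unicodedata` module. The helper name `nick_key` is an assumption for this example, and the sketch applies normalization once before folding, which is a simplification of the full Unicode caseless-matching procedure.

```python
import unicodedata

def nick_key(nick):
    """Comparison key: NFKC normalization followed by Unicode case folding."""
    return unicodedata.normalize("NFKC", nick).casefold()

# A fullwidth "S" (U+FF33) and differing case collapse to the same key.
print(nick_key("\uff33tealth") == nick_key("STEALTH"))    # True
# Precomposed and decomposed accents also compare equal.
print(nick_key("hell\u00f3") == nick_key("HELLO\u0301"))  # True
# A Cyrillic "е" (U+0435) is a different letter, so this pair still differs.
print(nick_key("St\u0435alth") == nick_key("Stealth"))    # False
```

The last line shows that compatibility variants and case are handled, while look-alike letters from another script remain distinct under this key and would need separate treatment.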

Of course, there are plenty of other things that still have to be solved/done before you have UTF8 support...

~0018945

syzop (administrator)

I think this would be a nice release goal for a future series (not 4.0.x).

~0018993

blank (reporter)

YES.

~0019143

blank (reporter)

@syzop if a network was willing to sponsor this (in € terms), would it speed up getting this added?

~0019147

syzop (administrator)

Last edited: 2016-03-27 11:02

The next few months I'll mostly be working on things other than UnrealIRCd I'm afraid (so just bug fixes, minor things). I usually do that after such a lengthy period of UnrealIRCd development (a full year on U4 in this case).
After that I'm seriously considering looking into this, since I think this would be an important feature.

~0019973

syzop (administrator)

Depends on https://github.com/ircv3/ircv3-specifications/pull/272
Once the spec is agreed on (or the direction is clear) we also need some library or drop-in code that IRC servers, services and clients can use to handle this.

~0019977

k4be (reporter)

"I agree with you, Stealth, but that's why you could require each nickname to use only one script: only Cyrillic, only Arabic or only Chinese, with no mixing. Otherwise, if I start mixing Cyrillic with Latin letters, as you said, I can get a lot of "fakes"."

Possibly the simplest solution: instead of allowing every possible UTF-8 character, just specify a fixed character list in a config file. This would differ from the old allowed-nickchars in that allowed characters could be longer than one byte. This will (probably) be sufficient for all networks dominated by a single language.
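
A fixed allow-list of multi-byte characters could look roughly like this in Python. Everything here is hypothetical: the `ALLOWED` set (ASCII nick characters plus a handful of Polish letters) and the `nick_allowed` helper are chosen purely for illustration and are not taken from UnrealIRCd.

```python
import string

# Hypothetical allow-list: common ASCII nick characters plus a few Polish letters.
ALLOWED = set(string.ascii_letters + string.digits + "_-[]\\`^{}|") | set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")

def nick_allowed(raw):
    """Decode the raw bytes as UTF-8 and check every character against the allow-list."""
    try:
        nick = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8 at all
    return len(nick) > 0 and all(ch in ALLOWED for ch in nick)

print(nick_allowed("żółw".encode("utf-8")))           # True
print(nick_allowed("St\u0435alth".encode("utf-8")))   # False: Cyrillic "е" is not listed
print(len("ż".encode("utf-8")))                       # 2: one character, two bytes
```

The check works on decoded characters rather than bytes, which is what lets an allowed character span more than one byte.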

~0019978

syzop (administrator)

That is true. The thing is that https://github.com/ircv3/ircv3-specifications/pull/272 also deals with proper CASEMAPPING. So 'hell<o with accent>' is considered the same as 'HELL<O with accent>', as you would expect.
So, ideally you would want to fix both these things at the same time.
And, at the same time, services would need to add support for it.

But, yeah, the alternative is to just add the ranges like we do now. And ignore CASEMAPPING for now.
That alternative is viable if the previously mentioned github pull request takes too long (and it seems stuck right now).

Anyway, more on-topic:
Of course, if we permit - say - UTF8 Hebrew then we should only permit the UTF8 ranges and not non-UTF8 Hebrew at the same time, as that would cause the same display and security issues as previously mentioned.
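
The reason for not mixing UTF-8 with a legacy encoding can be shown with a short Python sketch. It uses Latin-1 and the word "café" instead of Hebrew purely to keep the example small; the same kind of mismatch is assumed to apply to any single-byte character set.

```python
name = "caf\u00e9"  # "café"

latin1_bytes = name.encode("latin-1")  # b'caf\xe9'
utf8_bytes = name.encode("utf-8")      # b'caf\xc3\xa9'

# The same visible nick is two different byte strings on the wire.
print(latin1_bytes == utf8_bytes)  # False

# A UTF-8 client cannot decode the Latin-1 form at all.
try:
    latin1_bytes.decode("utf-8")
    decodes = True
except UnicodeDecodeError:
    decodes = False
print(decodes)  # False

# A Latin-1 client shows the UTF-8 form as two separate characters.
print(utf8_bytes.decode("latin-1"))  # cafÃ©
```

If both forms were permitted at once, two users could hold nicks that display identically to some clients while being different nicks to the server.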

~0019979

syzop (administrator)

Added, without the casemapping (just like existing set::allowed-nickchars):
https://github.com/unrealircd/unrealircd/commit/e3b91f8b94aa775ad2536576a8b5c324754b99ff

* Added UTF8 support in set::allowed-nickchars
  See https://www.unrealircd.org/docs/Nick_Character_Sets
  Example: set { allowed-nickchars { latin-utf8; }; };
  Important remarks:
  * All your servers must be on UnrealIRCd 4.0.17 (or later)
  * Most(?) services do not support this, so users using UTF8 nicknames
    won't be able to register at NickServ.
  * In set::allowed-nickchars you must either choose a utf8 language
    or a non-utf8 character set. You cannot combine the two.
  * You also cannot combine multiple scripts/alphabets, such as:
    latin, greek, cyrillic and hebrew. You must choose one.
  * If you are already using set::allowed-nickchars on your network
    (eg: 'latin1') then be careful when migrating (to eg: 'latin-utf8'):
    * Your clients may still assume non-UTF8
    * If users registered nicks with accents or other special characters
      at NickServ then they may not be able to access their account
      after the migration to UTF8.

[!] Work in progress [!]

Issue History
Date Modified     Username   Field                Change
2008-08-14 14:35  para_1461  New Issue
2008-08-15 00:54  Stealth    Note Added: 0015361
2008-08-15 00:54  Stealth    Status               new => feedback
2008-08-28 01:12  Stealth    Relationship added   has duplicate 0003723
2010-10-29 16:26  n0kS       Note Added: 0016393
2012-04-22 14:20  qdinar     Note Added: 0016984
2013-01-09 11:10  syzop      Note Added: 0017339
2013-01-09 20:45  syzop      Note Edited: 0017339
2013-01-09 20:47  syzop      Note Edited: 0017339
2015-12-26 10:29  syzop      Relationship added   has duplicate 0004503
2015-12-26 10:31  syzop      Note Added: 0018945
2015-12-26 10:31  syzop      Assigned To          => syzop
2015-12-26 10:31  syzop      Status               feedback => acknowledged
2015-12-26 10:33  syzop      Product Version      3.3-alpha0 => 4.0.0
2015-12-26 10:33  syzop      Summary              UTF-8 charset in UnrealIRCd 3.3 => Add UTF-8 support
2015-12-26 10:33  syzop      Description Updated
2015-12-29 14:23  blank      Note Added: 0018993
2016-03-20 14:01  blank      Note Added: 0019143
2016-03-27 11:01  syzop      Note Added: 0019147
2016-03-27 11:02  syzop      Note Edited: 0019147
2017-11-19 17:28  syzop      Note Added: 0019973
2017-11-25 15:45  k4be       Note Added: 0019977
2017-11-25 16:23  syzop      Note Added: 0019978
2017-11-25 21:18  syzop      Note Added: 0019979