View Issue Details

ID: 0003719
Project: unrealircd
View Status: public
Last Update: 2021-03-08 00:51
Reporter: para_1461
Assigned To: syzop
Priority: normal
Severity: feature
Reproducibility: N/A
Status: resolved
Resolution: fixed
OS: N/A
OS Version: N/A
Product Version: 4.0.0
Fixed in Version: 4.0.17
Summary: 0003719: Add UTF-8 support
Description: I'd like to see UTF-8 support for nicks. It's extremely difficult to find an IRCd that supports this, and the only one I know of can't be found on Google to download.

I believe it would be a good feature for when our network gets more, and more diverse, users.
Tags: No tags attached.

Relationships

has duplicate 0003723 closed Adding Unicode Support 
has duplicate 0004503 closed syzop Disallowed umlauts in Username 

Activities

Stealth

2008-08-15 00:54

reporter   ~0015361

IRCds don't normally allow UTF-8 in nicks for a simple reason: anyone can use alternate Unicode characters that look like other characters to spoof the appearance of another user.

For example, my nick (Stealth) can have up to 127 possible fakes with UTF-8. So that means someone can load up to 127 clones with UTF-8 nicks all looking like "Stealth".
Or what if someone with a similar host wants to pretend to be me to get my password?
Or harass another user?
Or carry out some other form of abuse?

Then you have the other issues with upper and lower case characters - the same problems are present there as well. Unreal has a setting to enable other character maps for this purpose (set::allowed-nickchars), and that's even questionable because of the issues mentioned above.
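
To make the spoofing concern concrete, here is a minimal Python sketch. It assumes a lookalike in which the Latin 'e' and 'a' are replaced by Cyrillic 'е' (U+0435) and 'а' (U+0430). It also reads the "127 possible fakes" figure as one assumed lookalike for each letter of the 7-letter nick. Each letter is then either original or swapped, which gives 2^7 - 1 = 127 variants besides the real nick. The helper names are illustrative only.

```python
import math
import unicodedata

# Genuine nick, all Latin letters.
REAL = "Stealth"
# Lookalike: Latin 'e' (U+0065) and 'a' (U+0061) replaced by
# Cyrillic 'е' (U+0435) and 'а' (U+0430). Most fonts draw both as "Stealth".
FAKE = "St\u0435\u0430lth"


def differing_chars(a: str, b: str) -> list[tuple[str, str]]:
    """Unicode names of the characters that differ, position by position."""
    return [
        (unicodedata.name(x), unicodedata.name(y))
        for x, y in zip(a, b)
        if x != y
    ]


def count_spoofs(lookalikes_per_char: list[int]) -> int:
    """Variants of a nick when position i may hold the original character or
    one of lookalikes_per_char[i] alternatives, excluding the genuine nick."""
    return math.prod(1 + k for k in lookalikes_per_char) - 1


if __name__ == "__main__":
    print(REAL == FAKE)  # False: same appearance, different code points
    print(differing_chars(REAL, FAKE))
    # Assumption: exactly one lookalike for each of the 7 letters.
    print(count_spoofs([1] * len(REAL)))  # 127
```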

n0kS

2010-10-29 16:26

reporter   ~0016393

I agree with you, Stealth, but that's why you could restrict each user to a single script in their nickname, like only Cyrillic, only Arabic or only Chinese, with no mixing. Otherwise, as you said, if I start mixing Cyrillic with Latin letters I can create a lot of "fakes".
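
The "one script per nick" idea can be sketched as follows. This is a minimal Python illustration under a simplifying assumption: scripts are identified by a handful of hand-picked Unicode block ranges (the `SCRIPT_RANGES` table is hypothetical and much coarser than real Unicode script data). Digits and the usual IRC special characters are treated as neutral.

```python
from typing import Optional

# Simplified, hypothetical script table built from a few Unicode block ranges.
# Real script data (Unicode Scripts.txt) is far more detailed.
SCRIPT_RANGES = {
    "latin": [(0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x24F)],
    "greek": [(0x370, 0x3FF)],
    "cyrillic": [(0x400, 0x4FF)],
    "hebrew": [(0x590, 0x5FF)],
    "arabic": [(0x600, 0x6FF)],
}

# Digits and IRC special characters do not count toward any script.
NEUTRAL = set("0123456789-_[]\\`^{}|")


def script_of(ch: str) -> Optional[str]:
    """Name of the script range containing ch, or None if it is in none."""
    cp = ord(ch)
    for script, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None


def is_single_script(nick: str) -> bool:
    """True if all non-neutral characters come from one script in the table."""
    seen = set()
    for ch in nick:
        if ch in NEUTRAL:
            continue
        script = script_of(ch)
        if script is None:
            return False  # outside every known script: reject
        seen.add(script)
    return len(seen) <= 1
```

With this check, an all-Cyrillic nick such as "Стелс" passes. "Stealth" spelled with one Cyrillic 'е' is rejected because it mixes two scripts.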

qdinar

2012-04-22 14:20

reporter   ~0016984

This would be very useful, because people would not need to press a key combination to switch keyboard layouts just to mention other users.

syzop

2013-01-09 11:10

administrator   ~0017339

Last edited: 2013-01-09 20:47

There's a document called 'Unicode Security Considerations' which deals with exactly this: http://www.unicode.org/reports/tr36/
I lost my other link but there are also functions that can see which characters are identical or very similar.

--> EDIT:
NFKC, combined with 'case folding' to make it case insensitive.

If I understand correctly that should solve most if not all of the security concerns (look alike characters).

Of course, there are plenty of other things that still have to be solved/done before you have UTF8 support...
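
A minimal Python sketch of "NFKC combined with case folding" as a nick comparison key follows. The NFKC, casefold, NFKC sequence is one common approximation of Unicode's NFKC_Casefold. It is an assumption here, not necessarily what UnrealIRCd or any IRCv3 draft specifies. The last example shows a limit of the approach: these steps do not merge cross-script lookalikes such as Cyrillic 'е' and Latin 'e'. The confusables "skeleton" in Unicode TR39 (the companion to TR36) targets that case.

```python
import unicodedata


def nick_key(nick: str) -> str:
    """Comparison key for nicks: NFKC, then case folding, then NFKC again,
    because case folding does not always preserve normalization."""
    return unicodedata.normalize(
        "NFKC", unicodedata.normalize("NFKC", nick).casefold()
    )


def same_nick(a: str, b: str) -> bool:
    return nick_key(a) == nick_key(b)


if __name__ == "__main__":
    # Fullwidth letters fold to plain ASCII under NFKC.
    print(same_nick("\uff33\uff54\uff45\uff41\uff4c\uff54\uff48", "stealth"))  # True
    print(same_nick("STRASSE", "stra\u00dfe"))   # True: casefold maps ß to ss
    print(same_nick("e\u0301", "\u00e9"))        # True: combining vs precomposed
    print(same_nick("St\u0435alth", "Stealth"))  # False: Cyrillic 'е' not merged
```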

syzop

2015-12-26 10:31

administrator   ~0018945

For a future series (not 4.0.x) I think this would be a nice release goal.

blank

2015-12-29 14:23

reporter   ~0018993

YES.

blank

2016-03-20 14:01

reporter   ~0019143

@syzop if a network was willing to sponsor this (in € terms), would it speed up getting this added?

syzop

2016-03-27 11:01

administrator   ~0019147

Last edited: 2016-03-27 11:02

For the next few months I'll mostly be working on things other than UnrealIRCd, I'm afraid (so just bug fixes and minor things). I usually do that after such a lengthy period of UnrealIRCd development (a full year on U4 in this case).
After that I'm seriously considering looking into this, since I think this would be an important feature.

syzop

2017-11-19 17:28

administrator   ~0019973

Depends on https://github.com/ircv3/ircv3-specifications/pull/272
Once the spec is agreed on (or the direction is clear), we also need some library or drop-in code that IRC servers, services and clients can use to handle this.

k4be

2017-11-25 15:44

developer   ~0019977

"I agree with you Stealth, but that's why you can make the user to use only one encoding in his nickname, like: only cyrillic, only arabic or only chinese, and can't mix them. Because if in future is like that, if I start mixing cyrillic with latil letters, as you said, I can get a lot of "fakes"."

Possibly the simplest solution: instead of allowing every possible UTF-8 character, just specify a fixed list of characters in a config file. This would differ from the old allowed-nickchars in that the allowed characters could be longer than one byte. That would (probably) be sufficient for all networks dominated by a single language.
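
A fixed character list of this kind could look roughly like the following Python sketch. It assumes a hypothetical config value that is simply a string of extra allowed characters. The base set of ASCII letters, digits and IRC specials is also an assumption. The Polish list is only an example, chosen because each of its letters takes two bytes in UTF-8.

```python
# Characters every nick may use: ASCII letters, digits and IRC specials.
BASE_NICKCHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_[]\\`^{}|"
)


def parse_extra_chars(config_value: str) -> frozenset[str]:
    """Hypothetical config format: one string listing every extra character."""
    return frozenset(config_value)


def nick_chars_ok(nick: str, extra: frozenset[str]) -> bool:
    """True if the nick is non-empty and uses only base or listed characters."""
    return bool(nick) and all(ch in BASE_NICKCHARS or ch in extra for ch in nick)


# Example list for Polish letters; each one is 2 bytes in UTF-8.
POLISH = parse_extra_chars("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")

if __name__ == "__main__":
    print(nick_chars_ok("Paweł", POLISH))
    print(nick_chars_ok("Paweł\u263a", POLISH))
    print(len("ł".encode("utf-8")))
```

The check works on decoded characters, not bytes. A multi-byte letter like "ł" is therefore a single entry in the list.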

syzop

2017-11-25 16:23

administrator   ~0019978

That is true. The thing is that https://github.com/ircv3/ircv3-specifications/pull/272 also deals with proper CASEMAPPING. So 'hell<o with accent>' is considered the same as 'HELL<O with accent>', as you would expect.
So, ideally you would want to fix both these things at the same time.
And, at the same time, services adding support for the same.

But, yeah, the alternative is to just add the ranges like we do now. And ignore CASEMAPPING for now.
That alternative is viable if the previously mentioned github pull request takes too long (and it seems stuck right now).
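
The CASEMAPPING point can be illustrated with a short Python sketch. It contrasts an ASCII-only, rfc1459-style case mapping with a Unicode-aware comparison. Full case folding (`str.casefold`) stands in for the Unicode side here as an assumption. It is not necessarily what the IRCv3 pull request above specifies.

```python
# rfc1459-style casemapping: A-Z and [ \ ] ^ are the upper-case forms of
# a-z and { | } ~. Only ASCII is affected.
RFC1459_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^",
    "abcdefghijklmnopqrstuvwxyz{|}~",
)


def rfc1459_equal(a: str, b: str) -> bool:
    return a.translate(RFC1459_LOWER) == b.translate(RFC1459_LOWER)


def unicode_equal(a: str, b: str) -> bool:
    """Assumed Unicode-aware alternative using full case folding."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    lower, upper = "hell\u00f3", "HELL\u00d3"  # 'helló' and 'HELLÓ'
    print(rfc1459_equal(lower, upper))   # False: Ó is outside ASCII
    print(unicode_equal(lower, upper))   # True
    print(rfc1459_equal("Nick[away]", "nick{away}"))  # True
```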

Anyway, more on-topic:
Of course, if we permit, say, UTF8 Hebrew then we should permit only the UTF8 ranges and not non-UTF8 Hebrew at the same time, as that would cause the same display and security issues as previously mentioned.
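
The UTF-8 versus non-UTF-8 Hebrew problem comes down to bytes. The same visible name has one byte form in UTF-8 and another in ISO-8859-8. The following minimal Python sketch accepts only the UTF-8 form. It assumes a policy of ASCII nick characters plus the Hebrew letters alef (U+05D0) through tav (U+05EA). The function and constant names are illustrative.

```python
# Hebrew letters alef (U+05D0) through tav (U+05EA), including final forms.
HEBREW_LETTERS = range(0x05D0, 0x05EB)
ASCII_NICKCHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_[]\\`^{}|"
)


def hebrew_utf8_nick_ok(raw: bytes) -> bool:
    """Accept only valid UTF-8 made of ASCII nick characters and Hebrew letters."""
    try:
        nick = raw.decode("utf-8")  # strict: ISO-8859-8 bytes fail here
    except UnicodeDecodeError:
        return False
    return bool(nick) and all(
        ch in ASCII_NICKCHARS or ord(ch) in HEBREW_LETTERS for ch in nick
    )


if __name__ == "__main__":
    name = "\u05d0\u05d1"  # alef, bet
    print(name.encode("utf-8"))      # b'\xd7\x90\xd7\x91'
    print(name.encode("iso8859_8"))  # b'\xe0\xe1'
    print(hebrew_utf8_nick_ok(name.encode("utf-8")))      # True
    print(hebrew_utf8_nick_ok(name.encode("iso8859_8")))  # False
```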

syzop

2017-11-25 21:18

administrator   ~0019979

Added, without the casemapping (just like existing set::allowed-nickchars):
https://github.com/unrealircd/unrealircd/commit/e3b91f8b94aa775ad2536576a8b5c324754b99ff

* Added UTF8 support in set::allowed-nickchars
  See https://www.unrealircd.org/docs/Nick_Character_Sets
  Example: set { allowed-nickchars { latin-utf8; }; };
  Important remarks:
  * All your servers must be on UnrealIRCd 4.0.17 (or later)
  * Most(?) services do not support this, so users using UTF8 nicknames
    won't be able to register at NickServ.
  * In set::allowed-nickchars you must either choose a UTF8 language
    or a non-utf8 character set. You cannot combine the two.
  * You also cannot combine multiple scripts/alphabets, such as:
    latin, greek, cyrillic and hebrew. You must choose one.
  * If you are already using set::allowed-nickchars on your network
    (eg: 'latin1') then be careful when migrating (to eg: 'latin-utf8'):
    * Your clients may still assume non-UTF8
    * If users registered nicks with accents or other special characters
      at NickServ then they may not be able to access their account
      after the migration to UTF8.

[!] Work in progress [!]
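
The two combination rules in this changelog (no mixing of UTF8 and non-UTF8 sets, and no mixing of scripts) can be expressed as a small config check. This is a minimal Python sketch, not UnrealIRCd's actual config code. The charset table is hypothetical. Only 'latin-utf8' and 'latin1' come from the changelog, and the other names are illustrative.

```python
# Hypothetical table: charset name -> (is_utf8, script). 'latin-utf8' and
# 'latin1' appear in the changelog above; the other names are illustrative.
CHARSETS = {
    "latin-utf8": (True, "latin"),
    "cyrillic-utf8": (True, "cyrillic"),
    "hebrew-utf8": (True, "hebrew"),
    "latin1": (False, "latin"),
    "hebrew": (False, "hebrew"),
}


def check_allowed_nickchars(names: list[str]) -> list[str]:
    """Return error messages for a set::allowed-nickchars list (empty if OK)."""
    errors = []
    unknown = [n for n in names if n not in CHARSETS]
    if unknown:
        errors.append("unknown charset(s): " + ", ".join(unknown))
    known = [CHARSETS[n] for n in names if n in CHARSETS]
    if len({is_utf8 for is_utf8, _ in known}) > 1:
        errors.append("cannot combine UTF-8 and non-UTF-8 charsets")
    if len({script for _, script in known}) > 1:
        errors.append("cannot combine multiple scripts/alphabets")
    return errors


if __name__ == "__main__":
    print(check_allowed_nickchars(["latin-utf8"]))  # []
    print(check_allowed_nickchars(["latin-utf8", "latin1"]))
    print(check_allowed_nickchars(["latin-utf8", "cyrillic-utf8"]))
```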

mcken

2018-01-11 17:20

reporter   ~0020012

It was a long awaited feature and we are really grateful for having it now.

Would it be possible to add optional full UTF-8 support for nicks, where all non-text UTF-8 characters such as ??????( ? )? could be used as well? Generally speaking, these UTF-8 characters shouldn't break core IRC functionality when used in nicks.

mcken

2018-01-11 17:24

reporter   ~0020013

Sorry for double posting, but editing is not possible. My UTF-8 characters at the previous post are not rendered correctly due to database charset configuration or something similar. The characters I mentioned can be viewed here as an example: http://upli.st/l/list-of-all-ascii-emoticons

syzop

2018-07-14 16:59

administrator   ~0020209

Just an update: I'm not working on this for 4.0.19. We'll have to see after that, but I'm not aware of services, IRCv3 drafts and such catching up. A pity; I had hoped I would have started something.
Due to different priorities in life and time constraints I have to pick my release targets and this one won't be one of them for next release.

As for the last post from mcken: I'm personally kinda reluctant to add such things. As you can see from previous work we try to pick characters/symbols that are "language" so to say, and not symbols like in math or smileys/emoticons and so on.
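
The distinction between "language" characters and symbols such as math signs or smileys roughly matches Unicode general categories. The following minimal Python sketch assumes a rule of "letters and combining marks only". It is an illustration, not how UnrealIRCd selects its character ranges. It also shows a caveat: some faces are built from real letters, so a category check alone cannot block them.

```python
import unicodedata


def is_language_char(ch: str) -> bool:
    """Letters (general category L*) and combining marks (M*) count as
    'language'; symbols (S*), punctuation (P*) and the rest do not."""
    return unicodedata.category(ch)[0] in ("L", "M")


def non_language_chars(nick: str) -> list[str]:
    """Characters in the nick that the rule above would reject."""
    return [ch for ch in nick if not is_language_char(ch)]


if __name__ == "__main__":
    print(unicodedata.category("\u263a"))  # 'So': WHITE SMILING FACE
    print(unicodedata.category("\u2211"))  # 'Sm': N-ARY SUMMATION
    print(non_language_chars("Zo\u00eb\u263a"))
    # Caveat: U+0CA0 (KANNADA LETTER TTHA, as in the face "ಠ_ಠ") is a letter,
    # so a category check alone cannot block every emoticon.
    print(is_language_char("\u0ca0"))
```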

syzop

2020-09-27 20:07

administrator   ~0021773

We added UTF8 nick characters in 4.0.17. Similarly, we have the option to only allow valid utf8 in channel names (it is even the default) since 5.0.0.
CASEMAPPING is an entirely different matter though with still plenty of problems and unimplemented: https://bugs.unrealircd.org/view.php?id=2882

Issue History

Date Modified Username Field Change
2008-08-14 14:35 para_1461 New Issue
2008-08-15 00:54 Stealth Note Added: 0015361
2008-08-15 00:54 Stealth Status new => feedback
2008-08-28 01:12 Stealth Relationship added has duplicate 0003723
2010-10-29 16:26 n0kS Note Added: 0016393
2012-04-22 14:20 qdinar Note Added: 0016984
2013-01-09 11:10 syzop Note Added: 0017339
2013-01-09 20:45 syzop Note Edited: 0017339
2013-01-09 20:47 syzop Note Edited: 0017339
2015-12-26 10:29 syzop Relationship added has duplicate 0004503
2015-12-26 10:31 syzop Note Added: 0018945
2015-12-26 10:31 syzop Assigned To => syzop
2015-12-26 10:31 syzop Status feedback => acknowledged
2015-12-26 10:33 syzop Product Version 3.3-alpha0 => 4.0.0
2015-12-26 10:33 syzop Summary UTF-8 charset in UnrealIRCd 3.3 => Add UTF-8 support
2015-12-26 10:33 syzop Description Updated
2015-12-29 14:23 blank Note Added: 0018993
2016-03-20 14:01 blank Note Added: 0019143
2016-03-27 11:01 syzop Note Added: 0019147
2016-03-27 11:02 syzop Note Edited: 0019147
2017-11-19 17:28 syzop Note Added: 0019973
2017-11-25 15:45 k4be Note Added: 0019977
2017-11-25 16:23 syzop Note Added: 0019978
2017-11-25 21:18 syzop Note Added: 0019979
2018-01-11 17:20 mcken Note Added: 0020012
2018-01-11 17:24 mcken Note Added: 0020013
2018-07-14 16:59 syzop Note Added: 0020209
2020-09-27 20:07 syzop Status acknowledged => resolved
2020-09-27 20:07 syzop Resolution open => fixed
2020-09-27 20:07 syzop Fixed in Version => 4.0.17
2020-09-27 20:07 syzop Note Added: 0021773