View Issue Details

ID: 0002070
Project: unrealircd
Category:
View Status: public
Last Update: 2005-02-19 16:52
Reporter: Slimp
Assigned To: syzop
Priority: normal
Severity: text
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: ALL
OS: ALL
OS Version: ALL
Product Version:
Target Version:
Fixed in Version: 3.2.3
Summary: 0002070: Localized nick support
Description:
SYZOP'S TEXT:
We should add something like set::allowed-nick where you can specify which nick charsets you want to support (e.g. set { allowed-nick { chinese; japanese; russian; }; };).
This would then use a character map (probably char_atribs from src/match.c, with a new flag). For wide chars we could use a list of ranges (e.g. Japanese is 0xa4a1-0xa4f3 and 0xa5a1-0xa5f7; Chinese is 0xb0a1-0xf7fe, 0x8140-0xa0fe and probably 0xaa40-0xfea0).
Modules can also change this charmap, at load time and on HOOKTYPE_REHASH_COMPLETE.

ORIGINAL TEXT:
Could you add Russian nick support in Unreal? Right now, when I change my nick to a Russian nick, it always shows "Erroneous Nickname: Illegal characters" and doesn't change it.
Tags: No tags attached.
3rd party modules

Relationships

related to 0000052 closed LOCALE_NICK support 

Activities

syzop

2004-09-13 15:12

administrator   ~0007633

->codemastr: I was kinda thinking of a callback system for this, just like we did with cloaking... (Well, using the current charset if no such module/callback is used.. I already coded the callback system to deal with that possibility :p).
Obviously, this would be problematic if changed at runtime, but then again.. I think this is the admin's responsibility.. Also in some cases, e.g. with 1 server and switching from standard to Russian, it would be no problem, so forcing a restart is then kinda nasty :p.

codemastr

2004-09-13 18:25

reporter   ~0007643

Yeah I considered a CALLBACK_NICKVALID or something like that. Actually I considered something more general, CALLBACK_VALIDATE or something like that. It would have something like:

callback(char *string, int type)

type would indicate whether it is a nick, ident, or host.

Btw this reminds me of something, I'd really like to try and stop using "int" as a type for type params (if that made sense :P). If we can, I think using enum wherever possible is better. It gives us range checking, and makes debugging much easier.
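
A minimal C sketch of the enum-typed validation callback suggested in this note. Every name here (ValidateType, ValidateCallback, default_validate, set_validate_callback, validate_string) is hypothetical and not taken from the UnrealIRCd source, and the per-type character sets are simplified assumptions rather than the real nick/ident/host rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: an enum instead of a bare int for the "type" parameter. */
typedef enum { VALIDATE_NICK, VALIDATE_IDENT, VALIDATE_HOST } ValidateType;

typedef int (*ValidateCallback)(const char *string, ValidateType type);

static int is_ascii_alnum(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Default rules used when no module has installed its own callback.
 * Returns 1 if every byte is allowed for the given type, 0 otherwise. */
static int default_validate(const char *string, ValidateType type)
{
    const char *extra = "";
    const unsigned char *p;

    assert(string != NULL);
    switch (type) {   /* no default: many compilers can warn on a missing case */
    case VALIDATE_NICK:  extra = "-_[]{}\\|`^"; break;
    case VALIDATE_IDENT: extra = "-_.";         break;
    case VALIDATE_HOST:  extra = "-.";          break;
    }
    if (*string == '\0')
        return 0;
    for (p = (const unsigned char *)string; *p; p++)
        if (!is_ascii_alnum(*p) && strchr(extra, *p) == NULL)
            return 0;
    return 1;
}

static ValidateCallback validate_callback = default_validate;

/* A module would call this on load; NULL restores the default rules. */
static void set_validate_callback(ValidateCallback cb)
{
    validate_callback = cb ? cb : default_validate;
}

static int validate_string(const char *string, ValidateType type)
{
    return validate_callback(string, type);
}
```

With an enum, a debugger shows VALIDATE_HOST instead of 2, and a switch without a default case lets many compilers flag a forgotten type, which is roughly the benefit described above.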

syzop

2004-10-06 20:57

administrator   ~0007888

Will try to do this within a few days / a week. If I fail however, then I'll postpone it to after 3.2.2 :p.

syzop

2004-10-10 03:10

administrator   ~0007939

Hm.. I'll probably drop this idea for 3.2.2.
It seems, since we are tending towards 'locale' stuff (possibly even with wide chars etc), that it's a bit more complex than the original idea.
It at least requires a lot more testing ;).

syzop

2005-01-04 20:20

administrator   ~0008699

Last edited: 2005-01-04 20:21

I'm also dropping this for 3.2.3... [also made this feature request public again]
Perhaps we should consider doing the original method instead of locale() stuff to make it less complex (so "as a hack"), and do it fully "as it should" in 3.3*.. I dunnow.

If you have russian nick charset code (as simple as the chinese nick stuff), feel free to mail it to me (syzop@unrealircd.com) and then it might be added.

syzop

2005-02-13 20:21

administrator   ~0009118

I've started on implementing the manual "allow list" stuff (a linked list for multibyte ranges, and tagging in a map [char_atribs] for standard ascii/8-bit characters). So not using locale() or anything, but hardcoded. I already got the basic system working :).

Actually I only made it for nicks now, but I'll change it a bit so it works for ident & hosts (separately) as well :). Since I think you might well want to allow, say, Russian nicks, but not in ident or hosts (well, we already agreed on that earlier).

I was thinking of something like this:
set {
 allowed-characters
 {
  nick { <languages>; };
  ident { <languages>; };
  host { <languages>; };
 };
};
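
The allow-list idea from this note (a flag map for single-byte characters plus a linked list of multibyte ranges) could look roughly like the C sketch below. This is not the committed code: char_flags stands in for char_atribs, ALLOW_NICK, MBRange and the function names are hypothetical, and a range such as 0xa4a1-0xa4f3 is interpreted here as lead byte 0xa4 with trail bytes 0xa1-0xf3, which is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

#define ALLOW_NICK 0x01                 /* hypothetical flag bit */

static unsigned char char_flags[256];   /* stand-in for char_atribs */

/* One allowed double-byte range: a fixed lead byte and a trail-byte span. */
typedef struct MBRange {
    unsigned char lead, trail_lo, trail_hi;
    struct MBRange *next;
} MBRange;

static MBRange *mb_ranges = NULL;

/* Tag the plain ASCII nick characters in the single-byte map. */
static void allow_ascii_nick_chars(void)
{
    const char *p = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
                    "0123456789-_[]{}\\|`^";
    for (; *p; p++)
        char_flags[(unsigned char)*p] |= ALLOW_NICK;
}

/* Prepend a multibyte range to the list; returns 0 on success, -1 on OOM. */
static int add_mb_range(unsigned char lead, unsigned char lo, unsigned char hi)
{
    MBRange *r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->lead = lead;
    r->trail_lo = lo;
    r->trail_hi = hi;
    r->next = mb_ranges;
    mb_ranges = r;
    return 0;
}

static int mb_pair_allowed(unsigned char b1, unsigned char b2)
{
    const MBRange *r;
    for (r = mb_ranges; r != NULL; r = r->next)
        if (b1 == r->lead && b2 >= r->trail_lo && b2 <= r->trail_hi)
            return 1;
    return 0;
}

/* Returns 1 if the nick consists only of allowed single bytes and
 * allowed double-byte pairs, 0 otherwise. */
static int nick_allowed(const char *nick)
{
    const unsigned char *p = (const unsigned char *)nick;

    assert(nick != NULL);
    if (*p == '\0')
        return 0;
    while (*p) {
        if (char_flags[*p] & ALLOW_NICK)
            p += 1;
        else if (p[1] != '\0' && mb_pair_allowed(p[0], p[1]))
            p += 2;
        else
            return 0;
    }
    return 1;
}
```

Enabling a language would then mean calling add_mb_range() for each of its ranges (for example add_mb_range(0xa4, 0xa1, 0xf3) for the first Japanese range quoted in the description), or setting ALLOW_NICK on the relevant 8-bit characters.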

syzop

2005-02-13 20:43

administrator   ~0009119

Last edited: 2005-02-13 20:44

Oh that brings up a design problem btw...
Because the stuff is read from the conf, during the config run (or test) the "is this character allowed in <nick|ident|host>" table is not fully initialized of course (since it's still being read from the conf)... hence any such checks on entries during testing, for example when checking a vhost::host, could/would fail (well, for Chinese hosts for example).
For nick this is usually not a real problem, but for idents/hosts it is, because it specifies (for example) what ident@host you get on oper, vhost, etc :P.

Well, I don't see any way to solve this without redesigning the way we read confs (adding an additional step), even though that is not THAT hard.. it's not exactly fun either :P.
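
To make the ordering problem concrete, here is a toy C sketch of the "additional step": one pass that only collects the allowed characters, followed by the normal test pass. It is purely illustrative. ConfEntry, the flattened directive names "allowed-host-chars" and "vhost-host", and all function names are hypothetical; the real parser works on a config tree, not a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened config entry. */
typedef struct {
    const char *directive;
    const char *value;
} ConfEntry;

static unsigned char host_char_ok[256];

/* Baseline table: letters, digits, '.' and '-'. */
static void init_host_chars(void)
{
    const char *p = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
                    "0123456789.-";
    memset(host_char_ok, 0, sizeof host_char_ok);
    for (; *p; p++)
        host_char_ok[(unsigned char)*p] = 1;
}

/* Extra step: scan the whole config for allowed-character settings
 * before any other entry is tested. */
static void collect_allowed_chars(const ConfEntry *conf, size_t n)
{
    const unsigned char *p;
    size_t i;

    assert(conf != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(conf[i].directive, "allowed-host-chars") != 0)
            continue;
        for (p = (const unsigned char *)conf[i].value; *p; p++)
            host_char_ok[*p] = 1;
    }
}

/* Test step: count vhost-host entries containing a disallowed byte. */
static int count_bad_vhosts(const ConfEntry *conf, size_t n)
{
    const unsigned char *p;
    size_t i;
    int bad = 0;

    assert(conf != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(conf[i].directive, "vhost-host") != 0)
            continue;
        for (p = (const unsigned char *)conf[i].value; *p; p++) {
            if (!host_char_ok[*p]) {
                bad++;
                break;
            }
        }
    }
    return bad;
}
```

If the test step runs before collect_allowed_chars(), a vhost with a non-ASCII byte is rejected even though a later entry allows that byte; running the collection pass first makes the result independent of the order of entries in the file.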

WAaaaaaaaa :P

codemastr

2005-02-13 21:47

reporter   ~0009120

[quote]Since I think you might well want to allow, say, russian nicks, but not in ident or hosts (well, we already agreed on that earlier).[/quote]

Those characters should NEVER be allowed in hosts. It is against the DNS protocol! If someone is sending those characters to us, the DNS server is broken and should be ignored. The only characters that are allowed in a DNS reply are A-Z a-z 0-9 . and -. NOTHING else should ever be allowed.

syzop

2005-02-13 22:03

administrator   ~0009121

Hm ok, then I'm seriously considering just doing nick only. IMO few people will get a heart attack if they cannot get a Chinese ident, or ok.. or at least I'm not in the mood to do the crazy stuff I described earlier :P. Besides, otherwise I might be postponing it again, which is in nobody's interest anyway :p.
Having the nick stuff is nice! (although, it's of course still a 'hack' :P)

Stealth

2005-02-13 23:40

reporter   ~0009122

Having an allow for channel names would be nice too :)

syzop

2005-02-14 10:39

administrator   ~0009127

What's interesting is that currently anything is allowed in channels already, except ascii <=32, comma and character 160 (non-breaking space) :p. The code has an interesting comment however:
for (; *ch; ch++)
    /* Don't allow any control chars, the space, the comma,
     * or the "non-breaking space" in channel names.
     * Might later be changed to a system where the list of
     * allowed/non-allowed chars for channels was a define
     * or some such.
     * --Wizzu
     */
    if (*ch < 33 || *ch == ',' || *ch == 160)
    {
        *ch = '\0';
        return;
    }

ah well... ;)
[it's like that with most ircds btw]
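
For reference, a self-contained C sketch of the same check. The excerpt above does not show how ch is declared; this version assumes it points to unsigned char, because with a signed char the comparison against 160 could never be true and every byte above 127 would count as less than 33. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Truncate a channel name at the first control character, space, comma
 * or non-breaking space (byte 160). Other 8-bit bytes pass through. */
static void clean_channelname_sketch(char *name)
{
    unsigned char *ch;

    assert(name != NULL);
    for (ch = (unsigned char *)name; *ch; ch++) {
        if (*ch < 33 || *ch == ',' || *ch == 160) {
            *ch = '\0';
            return;
        }
    }
}
```

Since every byte from 161 upwards passes this check, localized channel names already get through, which matches the observation in this note.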

codemastr

2005-02-14 18:23

reporter   ~0009139

Imho, if people want international characters in ident, then they'd need something like IDN. IDN is how DNS implements those characters (my guess is this is why you suggested allowing them in hostnames?). Basically, it works by "emulating" it. For example, say I have www.tëst.com. Though the client displays "www.tëst.com", the actual domain name using IDN is www.xn--tst-jma.com. Basically, it uses a markup of regular characters to determine which international characters to display. Now whether it is the job of the IRCd to parse and handle this, or whether it is the client's, is still questionable. Libidn would do all the work for us, but I don't know if it is our job. IMHO though, if people want it in ident, they should use a similar system.
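
The tëst example can be reproduced with a small Punycode encoder, the encoding IDN uses for the part after "xn--" (RFC 3492). The C below is only a sketch for illustration: the function names are hypothetical, it takes already-decoded Unicode code points, and it skips overflow checks and the normalization steps of full IDN, so a real implementation would rather rely on a library such as Libidn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Punycode parameters from RFC 3492. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

static char encode_digit(uint32_t d)
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

static uint32_t adapt(uint32_t delta, uint32_t numpoints, int first)
{
    uint32_t k = 0;
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Encode code points into out (NUL-terminated). Returns 0, or -1 if
 * the buffer is too small. No overflow checks: sketch only. */
static int punycode_encode(const uint32_t *in, size_t inlen,
                           char *out, size_t outsize)
{
    uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    size_t h, b, o = 0, i;

    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return -1;
    for (i = 0; i < inlen; i++) {          /* copy the basic (ASCII) chars */
        if (in[i] < 0x80) {
            if (o + 2 > outsize) return -1;
            out[o++] = (char)in[i];
        }
    }
    h = b = o;
    if (b > 0) {
        if (o + 2 > outsize) return -1;
        out[o++] = '-';                    /* delimiter after basic chars */
    }
    while (h < inlen) {
        uint32_t m = UINT32_MAX;
        for (i = 0; i < inlen; i++)        /* next smallest code point >= n */
            if (in[i] >= n && in[i] < m) m = in[i];
        delta += (m - n) * (uint32_t)(h + 1);
        n = m;
        for (i = 0; i < inlen; i++) {
            if (in[i] < n) delta++;
            if (in[i] == n) {              /* emit delta as a variable-length integer */
                uint32_t q = delta, k;
                for (k = BASE; ; k += BASE) {
                    uint32_t t = k <= bias ? TMIN
                               : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    if (o + 2 > outsize) return -1;
                    out[o++] = encode_digit(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (o + 2 > outsize) return -1;
                out[o++] = encode_digit(q);
                bias = adapt(delta, (uint32_t)(h + 1), h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    out[o] = '\0';
    return 0;
}

/* Prefix "xn--" to the encoded label, as IDN does for non-ASCII labels. */
static int to_ace_label(const uint32_t *in, size_t inlen,
                        char *out, size_t outsize)
{
    if (outsize < 5)
        return -1;
    memcpy(out, "xn--", 4);
    return punycode_encode(in, inlen, out + 4, outsize - 4);
}
```

Feeding the four code points t, ë (U+00EB), s, t to to_ace_label() yields the label "xn--tst-jma" from the example above.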

syzop

2005-02-14 21:00

administrator   ~0009140

Hm good point... I don't know it either.. :p.
[oh and I simply thought of ident/host because of your 2nd bugnote here ('type would indicate whether it is a nick, ident, or host')]

I'm doing my best at the nick stuff now :p. Btw when looking at locale (not that I use that) I noticed that it allowed way more characters than I think are needed. Like for Dutch it allowed like 20 characters with accents, while we really only use like 3 or so. Kinda odd. Then again, perhaps they did that because such characters might occur in names of people, but.. then pretty much anything can be in a name, lol ;). But a nick is a name of course, so HMMM :p. (That said, people here rarely have names with accents.) Anyway, I'm just sticking with the accents we actually use here / in Germany / France / etc. and I guess others can comment on my work later. Funfunfun :P
I was also kinda thinking of something like 'latin' or 'euro' to just allow the accents from all Latin-script European languages (so pretty much all except Greece, which has a whole different script [uh ok, I don't know about those new-euro countries like Latvia and such]). Does that sound like a good idea? :P

codemastr

2005-02-14 21:06

reporter   ~0009141

Yeah, if you really look, there are quite a few European languages that don't fit. Afaik, Romanian doesn't work, nor other eastern European languages. Mostly you're referring to western European languages. http://www.urwpp.de/english/info/info_bel_mac.htm That has some pretty good info on the charset stuff.

syzop

2005-02-19 16:52

administrator   ~0009188

Added in CVS [.270], there could be bugs... so if you find anything, let me know!

- Added nick character system. This allows you to choose which (additional) characters
  to allow in nicks via set::allowed-nickchars. See unreal32docs.html -> section 3.16
  for a list of available languages and more info on how to use it.
  Current list: dutch, french, german, italian, spanish, euro-west, chinese-trad,
  chinese-simp, chinese-ja, chinese.
  If you wonder why your language is not yet included or why a certain mistake is present,
  then please understand that we are most likely not experienced (at all) in your language.
  If you are a native of your language (or know the language well), and your language
  is not included yet or you have some corrections, then contact syzop@vulnscan.org or
  report it as a bug on http://bugs.unrealircd.org/
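
As an illustration of the set::allowed-nickchars setting named in this changelog entry, a config fragment might look like the following. The block layout is assumed to follow the usual set { } style used elsewhere in this thread, and the two language names are simply picked from the list above; see the referenced docs section for the authoritative syntax.

```
set {
    allowed-nickchars { dutch; french; };
};
```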

Issue History

Date Modified Username Field Change
2004-09-13 08:45 Slimp New Issue
2004-09-13 15:12 syzop View Status public => private
2004-09-13 15:12 syzop Note Added: 0007633
2004-09-13 18:25 codemastr Note Added: 0007643
2004-09-19 23:56 syzop Status new => assigned
2004-09-19 23:56 syzop Assigned To => syzop
2004-09-26 23:09 syzop ETA none => < 1 month
2004-10-06 20:56 syzop OS Linux => ALL
2004-10-06 20:56 syzop OS Version 2.6.5 => ALL
2004-10-06 20:56 syzop Platform => ALL
2004-10-06 20:56 syzop Product Version 3.2 =>
2004-10-06 20:56 syzop Summary Problems for Russian nick => Localized nick support
2004-10-06 20:56 syzop Description Updated
2004-10-06 20:57 syzop Note Added: 0007888
2004-10-08 04:02 syzop Relationship added related to 0000052
2004-10-10 03:10 syzop Note Added: 0007939
2004-10-10 22:44 syzop Relationship deleted related to 0000052
2004-10-10 22:45 syzop Relationship added related to 0000052
2004-10-10 22:45 syzop ETA < 1 month => > 1 month
2005-01-04 20:20 syzop Note Added: 0008699
2005-01-04 20:20 syzop Assigned To syzop =>
2005-01-04 20:20 syzop Status assigned => confirmed
2005-01-04 20:20 syzop View Status private => public
2005-01-04 20:21 syzop Note Edited: 0008699
2005-02-13 20:13 syzop Status confirmed => assigned
2005-02-13 20:13 syzop Assigned To => syzop
2005-02-13 20:21 syzop Note Added: 0009118
2005-02-13 20:43 syzop Note Added: 0009119
2005-02-13 20:43 syzop Note Edited: 0009119
2005-02-13 20:44 syzop Note Edited: 0009119
2005-02-13 21:47 codemastr Note Added: 0009120
2005-02-13 22:03 syzop Note Added: 0009121
2005-02-13 23:40 Stealth Note Added: 0009122
2005-02-14 10:39 syzop Note Added: 0009127
2005-02-14 18:23 codemastr Note Added: 0009139
2005-02-14 21:00 syzop Note Added: 0009140
2005-02-14 21:06 codemastr Note Added: 0009141
2005-02-19 16:52 syzop Status assigned => resolved
2005-02-19 16:52 syzop Fixed in Version => 3.2.3
2005-02-19 16:52 syzop Resolution open => fixed
2005-02-19 16:52 syzop Note Added: 0009188