View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002882 | unreal | ircd | public | 2006-04-13 14:47 | 2010-07-14 17:49 |
Reporter | Trocotronic | Assigned To | |||
Priority | normal | Severity | feature | Reproducibility | always |
Status | acknowledged | Resolution | open | ||
Platform | AMD K6 32bits | OS | Windows XP Professional | OS Version | SP2 |
Product Version | 3.2.5 | ||||
Summary | 0002882: Configurable CASEMAPPING (Lower/Uppercase with charsets) | ||||
Description | I know that charsets for different languages are very complex. I have loaded the Spanish and Catalan charsets. For example, á is the lowercase of Á, so eáe and eÁe are the same word. Yes, distinguishing lower and upper letters for every charset sounds too wasteful. But I think it could be possible. | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
3rd party modules | |||||
|
This has been mentioned many times, and will not be fixed. The documentation even describes this: [quote]NOTE 2: Casemapping (if a certain lowercase character belongs to an upper one) is done according to US-ASCII, this means that ö and Ö are not recognized as 'the same character' and hence someone can have a nick with Bär and someone else BÄr at the same time. This is a limitation of the current system and IRCd standards that cannot be solved anytime soon. People should be aware of this limitation. Note that this limitation has always also been applied to channels, in which nearly all characters were always permitted and US-ASCII casemapping was always performed.[/quote] |
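The limitation quoted above can be illustrated with a small sketch. This is not UnrealIRCd code: the function names `ascii_fold` and `same_nick_ascii` are made up for illustration, and the nicks are assumed to arrive as ISO 8859-1 bytes.

```python
def ascii_fold(name: bytes) -> bytes:
    """Lowercase only the bytes A-Z; every other byte is left untouched."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in name)


def same_nick_ascii(a: bytes, b: bytes) -> bool:
    """Compare two nicks the way a US-ASCII casemapping would."""
    return ascii_fold(a) == ascii_fold(b)


# Nicks as they might arrive on the wire from clients using ISO 8859-1
# (an assumption made for this example).
nick_lower = "B\u00e4r".encode("latin-1")  # a-umlaut is byte 0xE4
nick_upper = "B\u00c4r".encode("latin-1")  # A-umlaut is byte 0xC4

print(same_nick_ascii(b"Bar", b"BAR"))          # plain ASCII letters fold together
print(same_nick_ascii(nick_lower, nick_upper))  # 0xE4 and 0xC4 stay distinct
```

Under this comparison the two accented nicks count as different, so both could be registered at the same time, which is exactly the situation the documentation note warns about.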
|
Standards, standards, standards... the best thing about Unreal is that it is non-standard. You will agree with me that if Unreal supports or enhances this feature, it will be a better ircd. Why can't you break this rule? Is there any special reason for accepting US-ASCII as the only alternative? |
|
/me sent a patch for this to syzop. He says that maybe it will be in 3.3. |
|
Bock, could you upload your patch, please? Thank you. |
|
ftp://ftp.bynets.org/sources/unreal3.2.4-bynets.diff This patch adds support for different locales, loaded from files. You can change the locale without rebuilding the server, just with the rehash command. |
|
yep, this is it. I hope that this patch will be in 3.3*. On our network there has not been a single bug or server crash in 180 days with this patch. |
|
This is a clean version of the patch against the current CVS version (2006-06-05). It fixes the lower/uppercase issues for Russian and Belarusian. Read reame.txt in locale/ for your locale. PS: It has been running on our network (ByNets) since 2006-02 and no bugs or crashes have been found. |
|
I added a file (Locedit.zip) which contains a GUI editor for locales, a patch for the current unrealircd (3.2.5-rc3), and a directory with locales. For example, you can look at the belarussian-w1251 or russian-w1251 files to understand the principle. The author (Killer{R}) says that there may be trouble with multibyte encodings (Chinese, for example), but if there is, he can fix it. It fixes the problem with channel names too. The advantage of this patch is that locale files can be added and reloaded without recompiling or restarting the ircd, with only a rehash. Since February 2006: no errors, crashes, etc. See you :] |
|
The fixed version of locedit (multibyte is now supported too), with sources. |
|
Note that the patch posted by Spider84 not only adds support for uppercase matching of locale accents, but also adds new modules (with other, totally unrelated functions) and some ByNets-specific thingies. :) |
|
2 avenger - yes, that is the patch for our network; the clean version is listed below. :] |
|
I've linked a couple of bug IDs to this one, and renamed the title to 'Configurable CASEMAPPING (Lower/Uppercase with charsets)', since that's what it is...

What is CASEMAPPING? CASEMAPPING decides which characters "belong to each other", or in other words, which uppercase character belongs to which lowercase character. See http://www.irc.org/tech_docs/005.html (ctrl+f CASEMAPPING). We currently always use 'ascii', which is what everyone is familiar with, I guess. The idea is to make this a configurable option in the conf, so it can be set to an alternative CASEMAPPING. That one will then be used, and will be properly announced in 005 etc.

What are the limitations? You can only have ONE casemapping configured (e.g. 'ascii' or 'some-latin1-thingy'). You cannot do casemapping for a Russian charset, a Hebrew charset, and some Eastern European charset all at once... Why not? Because the same character code means different things in each charset. This is why bug 0002987 was closed, because some people don't seem to understand that.

What IS possible? It's possible to configure a different CASEMAPPING, for example iso8859-1 (latin1), which will then be used for comparing whether things are "the same", such as nicks and channels. This COULD also be used by something like spamfilter (basically any strcasecmp/stricmp in our code); whether that's a good or a bad idea is open to debate (I currently don't see how it could be bad, but perhaps someone can tell). I don't know if TRE supports it, but it would make sense if it did.

As for which technique to use, I haven't looked into it. So maybe it could be discussed here... Something like setlocale() seems to make the most sense? What are the advantages/disadvantages of each approach? |
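As a rough sketch of the idea described above, a single 256-entry folding table can be derived from one configured single-byte charset and then used for every nick and channel comparison. Nothing here comes from the UnrealIRCd source: `build_lower_table` and `same_name` are hypothetical helpers, and the table is built with Python's codecs instead of setlocale().

```python
def build_lower_table(charset: str) -> bytes:
    """Build a 256-entry byte-to-lowercase-byte table for a single-byte charset."""
    table = bytearray(range(256))
    for byte in range(256):
        try:
            char = bytes([byte]).decode(charset)
            lowered = char.lower().encode(charset)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # byte undefined in this charset, or no lowercase form in it
        if len(lowered) == 1:
            table[byte] = lowered[0]
    return bytes(table)


def same_name(a: bytes, b: bytes, table: bytes) -> bool:
    """Case-insensitive comparison of nicks or channels under one table."""
    return a.translate(table) == b.translate(table)


# Only one of these tables would be active on a server at a time.
ascii_table = build_lower_table("ascii")
latin1_table = build_lower_table("latin-1")
cp1251_table = build_lower_table("cp1251")

# The reporter's example, with the accented letters as ISO 8859-1 bytes 0xE1 and 0xC1.
print(same_name(b"e\xe1e", b"e\xc1e", ascii_table))   # False
print(same_name(b"e\xe1e", b"e\xc1e", latin1_table))  # True

# Byte 0xD7 is the multiplication sign in ISO 8859-1 but capital Che in Windows-1251.
print(latin1_table[0xD7] == 0xD7)  # True: no lowercase partner
print(cp1251_table[0xD7] == 0xF7)  # True: folds to small che
```

The last two lines show why only one casemapping can be active: the same byte has a lowercase partner under one charset and none under another, so the two tables cannot be merged.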
|
While we're at it, it's worth mentioning that in some character sets, like Russian, some characters such as the 'a' look very similar to (or exactly the same as) the Latin (Western) 'a'. Not sure if something like that could also be resolved, and if we should even bother to do so... Some people argue that it should be handled client-side. Various papers have been written on this; see also the discussion 1/2/3 years back when international domain names were introduced. It's not that different from 'l' vs 'I', and such things, which look very similar in some fonts like Fixedsys, and I haven't ever heard someone talking about comparing 'l' as if it was equal to 'I' :P. Then again, they still look *similar* and not 100% *equal* :P. Again, is that our problem, or is it the problem of the client / font / etc? |
|
In my opinion, some letters are identical (for example, in Russian: "e", "T", "P" (the Russian "R"), "A", "B" (like the Russian "V"), "O", etc.), and many fonts today do not render them differently per language (I have seen a difference only on some *nix systems; on Windows, with Verdana, Fixedsys, Lucida Console, etc., there is no difference between languages). Some letters look alike only in uppercase: the capital B looks like the capital Russian V, but the lowercase forms do not. With the patch that I sent you and locedit-fixed.rar, you can create a locale file (Russian and Belarusian exist now, Ukrainian may follow) that handles casemapping AND resolves the problem with similar letters. I would like to find people from other countries to make locale files. Our network has run with this patch for 1 year and it works fine. The people who contacted me about this and tested it have reported nothing bad, only gratitude. If you don't agree with this approach, you can still take from the patch the idea of adding locale files to the ircd without recompiling AND without restarting it (that is, adding a language to the ircd dynamically). About badwords and spamfilter: when people start spamming, they usually don't change the letter case, so spamfilter works fine (I'm talking about Russian spam or "happy letters"). [quote] It's not that different from 'l' vs 'I', and such things, which look very similar in some fonts like Fixedsys, and I haven't ever heard someone talking about comparing 'l' as if it was equal to 'I' :P. [/quote] If I showed you the Russian "E" and "A", you would not see the difference :] |
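The idea of resolving similar-looking letters alongside casemapping could be sketched roughly as follows. This is not the format or logic of the ByNets locale files: the `LOOKALIKES` table and the helper names are hypothetical, nicks are assumed to be Windows-1251 bytes, and only letters that resemble a Latin letter in both cases are included (pairs like the capital B and the capital Russian V, which match only in uppercase, are deliberately left out).

```python
# Hypothetical lookalike table: Cyrillic letters that resemble a Latin letter
# in both upper and lower case. Chosen for illustration only.
LOOKALIKES = str.maketrans({
    "\u0410": "A", "\u0430": "a",  # Cyrillic A / a
    "\u0415": "E", "\u0435": "e",  # Cyrillic IE / ie
    "\u041e": "O", "\u043e": "o",  # Cyrillic O / o
    "\u0420": "P", "\u0440": "p",  # Cyrillic ER / er (the Russian "R")
})


def skeleton(nick: bytes, charset: str = "cp1251") -> str:
    """Decode a nick, replace lookalike letters, then fold case."""
    return nick.decode(charset).translate(LOOKALIKES).lower()


def looks_the_same(a: bytes, b: bytes) -> bool:
    """True when two nicks would be visually indistinguishable under the table."""
    return skeleton(a) == skeleton(b)


latin_nick = b"Opera"
# Cyrillic O, er, ie, then Latin r, then Cyrillic a: renders like "Opera".
mixed_nick = "\u041e\u0440\u0435r\u0430".encode("cp1251")

print(mixed_nick == latin_nick)                # the raw bytes differ
print(looks_the_same(latin_nick, mixed_nick))  # the skeletons match
```

Whether such a check belongs in the server at all, rather than in the client or the font, is the open question raised in the notes above.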
|
hm.. I think that in phrases about ONLY CASEMAPPING reason.. |
Date Modified | Username | Field | Change |
---|---|---|---|
2006-04-13 14:47 | Trocotronic | New Issue | |
2006-04-13 19:03 | Stealth | Note Added: 0011548 | |
2006-04-13 19:18 | Trocotronic | Note Added: 0011549 | |
2006-04-19 08:05 | Bock | Note Added: 0011580 | |
2006-04-20 16:10 | Trocotronic | Note Added: 0011589 | |
2006-05-19 12:10 | Spider84 | Note Added: 0011741 | |
2006-05-19 16:16 | Bock | Note Added: 0011742 | |
2006-06-05 03:35 | Bock | Note Added: 0011854 | |
2006-06-05 03:35 | Bock | File Added: unreal.3.2.5-locale.tar.gz | |
2006-06-11 09:10 | Bock | Note Added: 0011944 | |
2006-06-11 09:18 | Bock | File Added: Locedit.zip | |
2006-06-11 12:38 | Bock | File Added: locedit-fixed.rar | |
2006-06-11 12:38 | Bock | Note Added: 0011945 | |
2006-06-13 08:30 | avenger | Note Added: 0011950 | |
2006-06-13 09:23 | Bock | Note Added: 0011951 | |
2006-11-01 07:33 | syzop | Relationship added | related to 0003101 |
2006-11-01 07:33 | syzop | Relationship added | related to 0002739 |
2006-11-01 07:39 | syzop | Relationship deleted | related to 0003101 |
2006-11-01 07:39 | syzop | Relationship added | has duplicate 0003101 |
2006-11-01 07:49 | syzop | Note Added: 0012540 | |
2006-11-01 07:49 | syzop | Summary | Lower/Uppercase with charsets => Configurable CASEMAPPING (Lower/Uppercase with charsets) |
2006-11-01 07:56 | syzop | Note Added: 0012541 | |
2006-11-01 11:14 | Bock | Note Added: 0012546 | |
2006-11-03 13:40 | syzop | Relationship added | related to 0002718 |
2006-11-04 14:24 | Bock | Note Added: 0012584 | |
2007-04-19 18:37 | | Status | new => acknowledged |
2007-04-27 05:50 | | Relationship added | related to 0002589 |
2010-07-14 17:49 | syzop | QA | => Not touched yet by developer |
2010-07-14 17:49 | syzop | U4: Need for upstream patch | => No need for upstream InspIRCd patch |
2010-07-14 17:49 | syzop | U4: Upstream notification of bug | => Not decided |
2010-07-14 17:49 | syzop | U4: Contributor working on this | => None |
2010-07-14 17:49 | syzop | Severity | minor => feature |