Author | |
Fassbinder ![]() Forum Senior Member ![]() ![]() VIP Member Joined: May 27 2006 Location: My world Status: Offline Points: 3497 |
![]() Posted: February 05 2007 at 15:49 |
It seems that there may be some problems with diacritics.
For example, if you click on the letter "P" in the "Recordings" row on the front page, the first album beginning with "P" is Půlnoční myš by The Plastic People of the Universe. It shouldn't be the first one, since the second letter of the title is a "u" with a diacritic. This suggests that the system doesn't recognise the letter.
Another example: there was an attempt to change the "regular" spelling of Czeslaw Niemen's first name to "Czesław". It succeeded everywhere except on the artists/bands page (letter "N"). All the letters there appear in upper case, but the system failed to convert the lower case "ł" into upper case "Ł".
I'm sure there are additional examples.
After all, not a big deal, but, for the sake of consistency (standing here as a euphemism for "pedantry")...
|
|
![]() |
|
Easy Livin ![]() Special Collaborator ![]() ![]() Honorary Collaborator / Retired Admin Joined: February 21 2004 Location: Scotland Status: Offline Points: 15585 |
![]() |
Where should the PP of the U album appear? Is the second letter to be taken as a "u"? Is it perhaps better if non-English characters are separated out, rather than being taken to be the letter they look most like?
|
|
![]() |
|
Atkingani ![]() Special Collaborator ![]() ![]() Honorary Collaborator / Retired Admin Joined: October 21 2005 Location: Terra Brasilis Status: Offline Points: 12288 |
![]() |
IMO the diacritic should not alter the position of the vowel or consonant. If it's "ű" or "ú" or "ů", it's always "u" and the next letter will decide its position in the alphabetical order. |
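For what it's worth, here is a minimal Python sketch of that rule, assuming the titles are available as Unicode strings. The helper name `base_letter_key` and the sample titles other than Půlnoční myš are made up for illustration, and none of this is the site's actual code. It decomposes each character and drops the combining marks, so "ű", "ú" and "ů" all sort as "u".

```python
import unicodedata

def base_letter_key(title: str) -> str:
    """Hypothetical sort key that ignores diacritics."""
    # NFD splits "ů" into "u" plus a combining ring above.
    decomposed = unicodedata.normalize("NFD", title)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

titles = ["Pyramid", "Půlnoční myš", "Pawn Hearts", "Pulse"]
print(sorted(titles, key=base_letter_key))
```

With this key Půlnoční myš lands among the "Pu..." titles instead of at either end of the list. One caveat: letters such as the Polish "ł" have no decomposition, so this approach leaves them untouched and they would need separate handling.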
|
Guigo
~~~~~~ |
|
![]() |
|
Raff ![]() Special Collaborator ![]() ![]() Honorary Collaborator Joined: July 29 2005 Location: None Status: Online Points: 24429 |
![]() |
In English or Italian it is indeed like that.. However, in Finnish "ä" and "ö" come after "z" in the alphabet. Funny, isn't it?
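To illustrate how the ordering depends on the language, here is a rough Python sketch of a Finnish-style sort. The alphabet string, the helper name `finnish_key` and the sample words are assumptions for illustration only, and real Finnish collation has more rules than this.

```python
# Simplified Finnish alphabet: "å", "ä" and "ö" come after "z".
FINNISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(FINNISH_ALPHABET)}

def finnish_key(word: str) -> list[int]:
    """Hypothetical sort key; assumes precomposed (NFC) characters."""
    # Anything outside the alphabet (spaces, digits) sorts before every letter.
    return [RANK.get(ch, -1) for ch in word.lower()]

words = ["öljy", "zeppelin", "äiti", "aalto"]
print(sorted(words, key=finnish_key))
```

Here "äiti" and "öljy" end up after "zeppelin", whereas an accent-ignoring sort would put them next to the "a" and "o" words.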
|
![]() |
|
Atkingani ![]() Special Collaborator ![]() ![]() Honorary Collaborator / Retired Admin Joined: October 21 2005 Location: Terra Brasilis Status: Offline Points: 12288 |
![]() |
Portuguese and French follow the same rule as Italian and English too... Spanish puts 'ñ' after 'nz', but in this case I believe that the majority (or common sense) may prevail. Edited by Atkingani - February 10 2007 at 12:17 |
|
Guigo
~~~~~~ |
|
![]() |
|
Joolz ![]() Special Collaborator ![]() Honorary Collaborator Joined: March 24 2006 Location: United Kingdom Status: Offline Points: 1377 |
![]() |
I agree ... 'u' is still 'u' whether it has diacritics or not, I would think .... The problem seems to be that the system recognizes some accents and not others, e.g. as Fassbinder mentioned, it doesn't recognize the Polish 'ł', so it simply leaves it as an untranslated character when the system capitalizes the name [we have resorted to the conventional 'l' for Czesław Niemen until it can be sorted]. edit: oops, I'd put the wrong quote Edited by Joolz - February 10 2007 at 12:15 |
|
![]() |
|
Easy Livin ![]() Special Collaborator ![]() ![]() Honorary Collaborator / Retired Admin Joined: February 21 2004 Location: Scotland Status: Offline Points: 15585 |
![]() |
Would it be possible to identify the languages with such characters which the site seems to support, and those which it does not? We've made a good start above.
|
|
![]() |
|
Fassbinder ![]() Forum Senior Member ![]() ![]() VIP Member Joined: May 27 2006 Location: My world Status: Offline Points: 3497 |
![]() |
As a "main PA specialist on diacritics" (he-he-he-he-he...) I'll try to figure out which diacriticised letters the system recognises as diacriticised ones and which it does not (i.e., those which the system considers to be "independent" letters). It'll take some time, however.
|
|
![]() |
|
Easy Livin ![]() Special Collaborator ![]() ![]() Honorary Collaborator / Retired Admin Joined: February 21 2004 Location: Scotland Status: Offline Points: 15585 |
![]() |
Cheers Fassbinder, that would be great!
|
|
![]() |
|
Fassbinder ![]() Forum Senior Member ![]() ![]() VIP Member Joined: May 27 2006 Location: My world Status: Offline Points: 3497 |
![]() |
First of all, I think that the problems are with some specific letters, not with whole languages (by "languages" I mean here their alphabets, obviously).
Then, the question may be split into two parts: the first is converting lower case letters into upper case ones, whereas the second is purely whether the system recognises a letter as a diacriticised variant of a regular letter.
The Czesław Niemen case is an example of failed conversion. The system does recognise the lower case letter, but is unable to convert it into the upper case one.
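One possible explanation, and it is only a guess since the site's code isn't visible, is an upper-casing routine that only knows about a-z. A minimal Python sketch of the difference, where `ascii_only_upper` is a made-up helper standing in for such a routine:

```python
def ascii_only_upper(text: str) -> str:
    """Hypothetical routine that upper-cases a-z and leaves everything else alone."""
    return "".join(chr(ord(ch) - 32) if "a" <= ch <= "z" else ch for ch in text)

name = "Czesław Niemen"
print(ascii_only_upper(name))  # the "ł" stays in lower case
print(name.upper())            # Unicode-aware conversion turns "ł" into "Ł"
```

The first call reproduces the symptom seen on the artists page; the second shows that a Unicode-aware conversion handles the letter correctly.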
The Půlnoční myš case is an example of the system not recognising "ů" as a kind of "u". Another example of that was brought up by Joolz in another thread:
This means that, instead of treating diacriticised letters as variants of regular letters, the system treats them as symbols. Symbols are always put at the very beginning or the very end of any list, i.e. either before or after the regular letters which are recognised as letters. The problems of symbols and unrecognised letters seem to be related. Please also take a look at this thread: www.progarchives.com/forum/forum_posts.asp?TID=33763 ; it deals with the problems of searching by symbols, but seems to have been overlooked by many.
That said, I'll continue to search for other examples of non-recognising of diacritics by the system.
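As a rough illustration of the letter/symbol distinction discussed above, here is a small Python sketch using the Unicode character database. The function name `classify` and its three labels are made up for this example; how the site itself classifies characters is unknown.

```python
import unicodedata

def classify(ch: str) -> str:
    """Hypothetical three-way classification of a single character."""
    if ch.isascii() and ch.isalpha():
        return "ascii letter"
    # Unicode general categories starting with "L" are letters (Lu, Ll, ...).
    if unicodedata.category(ch).startswith("L"):
        return "non-ascii letter"
    return "symbol or other"

for ch in ("P", "ů", "ł", "&"):
    print(ch, classify(ch))
```

A system that only tests for the first case would lump "ů" and "ł" together with "&", which would match the behaviour described in this thread.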
|
|