
Topic Closed: Diacritics

Fassbinder
Forum Senior Member
VIP Member

Joined: May 27 2006
Location: My world
Status: Offline
Points: 3497
Posted: February 05 2007 at 15:49
It seems that there may be some problems with diacritics.
 
For example, if you click on the letter "P" in the "Recordings" row on the front page, the first album listed under "P" is Půlnoční myš by The Plastic People of the Universe. It shouldn't be the first one, since the second letter of the title is a diacriticised "u" ("ů"). This means that the system doesn't recognise the letter.
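The symptom is consistent with a sort that compares raw character codes rather than letters. A minimal Python sketch, assuming nothing about the site's actual code (the titles other than Půlnoční myš are made-up placeholders), shows how a plain code-point sort sends a diacriticised second letter to one end of the "P" group. Depending on how the text is encoded or escaped, the same character could just as well land at the front instead.

```python
# Sketch only: the site's real sorting code is unknown. The titles other
# than "Půlnoční myš" are made-up placeholders.
titles = ["Pure", "Půlnoční myš", "Pale"]

# sorted() on plain strings compares Unicode code points, character by character.
naive = sorted(titles)
print(naive)  # ['Pale', 'Pure', 'Půlnoční myš']

# "ů" is U+016F, well above "a".."z" (U+0061..U+007A), so a title whose second
# character is "ů" sorts after every "P" title that continues with a plain
# ASCII lower-case letter.
print(hex(ord("ů")))  # 0x16f
```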
 
Another example: there was an attempt to change the "regular" spelling of Czeslaw Niemen's name to "Czesław". It succeeded everywhere except on the artists/bands page (letter "N"). As is known, all names there appear in upper case, but the system failed to convert the lower-case "ł" into the upper-case "Ł".
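One plausible cause, and it is only an assumption since the site's code isn't visible, is a case conversion that covers just the Latin-1 range (U+0000..U+00FF). That range includes letters like "é" and "ö" but not the Polish "ł" (U+0142). The Python sketch below compares such a limited conversion (the function name is hypothetical) with full Unicode case mapping:

```python
def latin1_only_upper(text: str) -> str:
    """Hypothetical upper-caser that only handles the Latin-1 range.

    Characters above U+00FF are passed through unchanged, which reproduces
    the symptom on the artists page. Illustration only, not the site's code.
    """
    return "".join(ch.upper() if ord(ch) <= 0xFF else ch for ch in text)

name = "Czesław Niemen"
print(latin1_only_upper(name))  # CZESłAW NIEMEN  ("ł" stays lower case)
print(name.upper())             # CZESŁAW NIEMEN  (full Unicode case mapping)
```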
 
I'm sure there are additional examples.
 
After all, it's not a big deal, but it matters for the sake of consistency (here a euphemism for "pedantry")...
Easy Livin
Special Collaborator
Honorary Collaborator / Retired Admin

Joined: February 21 2004
Location: Scotland
Status: Offline
Points: 15585
Posted: February 10 2007 at 11:56
Where should the PP of the U album appear? Should the second letter be taken as a "u"? Or is it perhaps better if non-English characters are separated out, rather than being treated as the letter they look most like?
Atkingani
Special Collaborator
Honorary Collaborator / Retired Admin

Joined: October 21 2005
Location: Terra Brasilis
Status: Offline
Points: 12288
Posted: February 10 2007 at 12:02

IMO the diacritic should not alter the position of the vowel or consonant. If it's "ű" or "ú" or "ů", it's always "u" and the next letter will decide its position in the alphabetical order.
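The rule proposed here can be expressed as a sort key. A minimal Python sketch (the function name `fold_key` is hypothetical): Unicode canonical decomposition (NFD) splits "ů" into "u" plus a combining ring, and dropping the combining marks leaves the base letter, so the following letters decide the order.

```python
import unicodedata

def fold_key(title: str) -> str:
    """Sort key: a letter with a diacritic sorts as its base letter.

    NFD turns e.g. "ů" into "u" + U+030A COMBINING RING ABOVE; characters
    in category "Mn" (non-spacing marks) are then dropped.
    """
    decomposed = unicodedata.normalize("NFD", title)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return base.casefold()

titles = ["Pure", "Půlnoční myš", "Pale"]
print(sorted(titles, key=fold_key))  # ['Pale', 'Půlnoční myš', 'Pure']
```

With this key "Půlnoční myš" is compared as "pulnocni mys", so it sits between "Pale" and "Pure" instead of at either end.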

Guigo

~~~~~~
Raff
Special Collaborator
Honorary Collaborator

Joined: July 29 2005
Location: None
Status: Online
Points: 24429
Posted: February 10 2007 at 12:09
In English or Italian it is indeed like that. However, in Finnish "ä" and "ö" come after "z" in the alphabet. Funny, isn't it? ;)
Atkingani
Special Collaborator
Honorary Collaborator / Retired Admin

Joined: October 21 2005
Location: Terra Brasilis
Status: Offline
Points: 12288
Posted: February 10 2007 at 12:13
Ghost Rider wrote:

In English or Italian it is indeed like that. However, in Finnish "ä" and "ö" come after "z" in the alphabet. Funny, isn't it? ;)
 
Portuguese and French follow the same rule as Italian and English... Spanish puts 'ñ' after 'nz', but in this case I believe that the majority (or common sense) may prevail. ;)
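Language-specific orders like the Finnish and Spanish ones need an explicit alphabet rather than code-point order. A Python sketch, using only the letters mentioned in this thread (the helper name `make_alphabet_key` and the example words are assumptions for illustration): letters are ranked by their position in the given alphabet, and anything not listed sorts after all listed letters.

```python
def make_alphabet_key(alphabet: str):
    """Build a sort key from an explicit alphabet order (hypothetical helper).

    Each character is ranked by its position in `alphabet`; characters not
    listed (spaces, digits, other accented letters) rank after all listed
    letters and are then ordered by code point.
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}

    def key(word: str):
        return [(rank.get(ch, len(alphabet)), ch) for ch in word.lower()]

    return key

# Finnish, per the thread: "ä" and "ö" come after "z".
finnish = make_alphabet_key("abcdefghijklmnopqrstuvwxyzäö")
# Spanish: "ñ" is a separate letter between "n" and "o".
spanish = make_alphabet_key("abcdefghijklmnñopqrstuvwxyz")

print(sorted(["öljy", "zeta", "äiti", "auto"], key=finnish))  # ['auto', 'zeta', 'äiti', 'öljy']
print(sorted(["oso", "ñu", "nube"], key=spanish))             # ['nube', 'ñu', 'oso']
print(sorted(["oso", "ñu", "nube"]))                          # ['nube', 'oso', 'ñu']
```

The last line shows that plain code-point order happens to put "ñ" after "o". Which table a site applies is a policy choice; a single neutral rule for all titles is the simpler option.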


Edited by Atkingani - February 10 2007 at 12:17
Guigo

~~~~~~
Joolz
Special Collaborator

Honorary Collaborator

Joined: March 24 2006
Location: United Kingdom
Status: Offline
Points: 1377
Posted: February 10 2007 at 12:14
Atkingani wrote:

IMO the diacritic should not alter the position of the vowel or consonant. If it's "ű" or "ú" or "ů", it's always "u" and the next letter will decide its position in the alphabetical order.


I agree... 'u' is still 'u' whether it has diacritics or not, I would think.

The problem seems to be that the system recognizes some accents and not others. E.g., as Fassbinder mentioned, it doesn't recognize the Polish 'ł', so it simply leaves it as an untranslated character when the system capitalizes the name [we have resorted to the conventional 'l' for Czesław Niemen until it can be sorted out].
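A related catch if the site ever stores "Czesław" but sorts or indexes by a plain base letter: "ł", like "ø" and "đ", has no canonical decomposition in Unicode, so the usual accent stripping (NFD plus removing combining marks) leaves it unchanged. A Python sketch with a hypothetical, non-exhaustive fallback table applied first:

```python
import unicodedata

# Hypothetical, non-exhaustive table for letters that NFD cannot split,
# because they have no canonical decomposition in Unicode.
NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "đ": "d", "Đ": "D"})

def to_base_letters(text: str) -> str:
    """Reduce letters to their base form: explicit table first, then
    NFD decomposition with combining marks (category "Mn") dropped."""
    text = text.translate(NO_DECOMPOSITION)
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# NFD leaves "ł" untouched, so accent stripping alone would keep it as "ł".
print(unicodedata.normalize("NFD", "ł") == "ł")  # True
print(to_base_letters("Czesław Niemen"))          # Czeslaw Niemen
print(to_base_letters("Půlnoční myš"))            # Pulnocni mys
```

The displayed name can keep its "ł"; only the sort/index key uses the base letters.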

edit: oops, I'd put the wrong quote in.


Edited by Joolz - February 10 2007 at 12:15
Easy Livin
Special Collaborator
Honorary Collaborator / Retired Admin

Joined: February 21 2004
Location: Scotland
Status: Offline
Points: 15585
Posted: February 10 2007 at 13:46
Would it be possible to identify which languages with such characters the site seems to support, and which it does not? We've made a good start above.
Fassbinder
Forum Senior Member
VIP Member

Joined: May 27 2006
Location: My world
Status: Offline
Points: 3497
Posted: February 10 2007 at 18:52
As the "main PA specialist on diacritics" (he-he-he-he-he...), I'll try to figure out which diacriticised letters the system recognises as diacriticised variants and which it does not (i.e. those it treats as "independent" letters). It'll take some time, however.
Easy Livin
Special Collaborator
Honorary Collaborator / Retired Admin

Joined: February 21 2004
Location: Scotland
Status: Offline
Points: 15585
Posted: February 11 2007 at 11:44
Cheers Fassbinder, that would be great!
Fassbinder
Forum Senior Member
VIP Member

Joined: May 27 2006
Location: My world
Status: Offline
Points: 3497
Posted: February 11 2007 at 12:29
First of all, I think that the problems are with some specific letters, not with whole languages (by "languages" I mean here their alphabets, obviously).
 
Then, the question may be split into two parts: the first is converting lower-case letters into upper-case ones; the second is simply whether the system recognises a letter as a diacriticised variant of a regular letter.
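The two checks just described could be partly scripted. The Python sketch below (the function name and sample letters are assumptions) records, for each letter, what full Unicode support would give: its upper-case form, and the base letter it decomposes to, if any. The site's actual output for each letter would still have to be compared by hand.

```python
import unicodedata

def audit_letter(ch: str) -> dict:
    """Reference values for one lower-case letter (hypothetical helper).

    - "uppercase": result of full Unicode case mapping (conversion check).
    - "base": the base letter exposed by NFD, or None if the character does
      not decompose into a base letter plus combining marks (recognition check).
    """
    decomposed = unicodedata.normalize("NFD", ch)
    has_marks = len(decomposed) > 1 and all(
        unicodedata.category(c) == "Mn" for c in decomposed[1:]
    )
    return {
        "name": unicodedata.name(ch),
        "uppercase": ch.upper(),
        "base": decomposed[0] if has_marks else None,
    }

for letter in ["ů", "č", "ł", "ñ", "ä"]:
    print(letter, audit_letter(letter))
```

A `None` base (as for "ł") flags letters that need an explicit rule, because Unicode itself does not treat them as a base letter plus a mark.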
 
The Czeslaw Niemen case is an example of non-conversion: the system does recognise the lower-case letter, but is unable to convert it into its upper-case form.
 
The Půlnoční myš case is an example of non-recognition: the system does not see "ů" as a kind of "u". Joolz brought up another example in a different thread:
Joolz wrote:

If you glance further down the list ...


The system doesn't always recognize diacritics properly.
 
This means that, instead of recognising diacriticised letters as variants of regular letters, the system treats them as symbols. Symbols are always put at the very beginning or the very end of any list, i.e. either before or after the characters the system recognises as letters. The problems with symbols and with unrecognised letters seem to be somehow related. Please also have a look at this thread: www.progarchives.com/forum/forum_posts.asp?TID=33763 ; it deals with problems searching by symbols, but has been overlooked by many.
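The "letters treated as symbols" behaviour is what you would get from an ASCII-only letter test. A Python sketch contrasting such a hypothetical naive test with Unicode's general category, where any category starting with "L" (Lu, Ll, Lt, Lm, Lo) is a letter:

```python
import string
import unicodedata

def is_letter_ascii_only(ch: str) -> bool:
    """Hypothetical naive test: only A-Z and a-z count as letters."""
    return ch in string.ascii_letters

def is_letter_unicode(ch: str) -> bool:
    """Unicode-aware test based on the character's general category."""
    return unicodedata.category(ch).startswith("L")

for ch in ["P", "ů", "ł", "&", "1"]:
    print(repr(ch), is_letter_ascii_only(ch), is_letter_unicode(ch))
# 'P' True True
# 'ů' False True
# 'ł' False True
# '&' False False
# '1' False False
```

Under the naive test "ů" and "ł" fall into the same bucket as "&", which fits both the sorting and the symbol-search problems.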
 
That said, I'll keep looking for other examples of the system failing to recognise diacritics.
 
 