
[EP-tech] Re: Normalize characters for correct sorting



Hi Ian

I probably didn't make clear what the real problem is. In English you
don't have the same vowel with and without an accent. In Greek the accent
is only a matter of correct spelling, so it is the same letter and has to
be normalized to be sorted correctly. If you look at Tokenizer.pm
(/perl_lib/EPrints/Index/Tokenizer.pm), it does the same for indexing.
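
As a rough illustration of the kind of normalization I mean, here is a
small Python sketch. It is not EPrints code and not what Tokenizer.pm
literally does; the function name sort_key is hypothetical. It assumes
that decomposing each character (Unicode NFD) and dropping the combining
marks is enough to treat accented and unaccented Greek letters as the
same letter when sorting.

```python
import unicodedata


def sort_key(text: str) -> str:
    """Return a sort key with accents (tonos, dialytika) removed.

    Hypothetical helper for illustration only.
    """
    # NFD splits e.g. U+03CD (upsilon with tonos) into U+03C5 + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case-fold so capital and small letters sort together.
    return stripped.casefold()


if __name__ == "__main__":
    words = ["\u03cd\u03c0\u03bd\u03bf\u03c2", "\u0386\u03bb\u03c6\u03b1",
             "\u03c5\u03b3\u03b5\u03af\u03b1"]
    # Sort using the accent-free key instead of raw code points.
    print(sorted(words, key=sort_key))
```

In Perl the same idea would presumably be expressed with the
Unicode::Normalize module, but I have not tested that inside EPrints.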

Kostas

2015-06-09 10:57 GMT+03:00 Ian Stuart <Ian.Stuart at ed.ac.uk>:

> I suspect this is a Perl problem rather than an EPrints problem. I
> would expect Perl to sort by Unicode value (so 0386 before 0391).
>
> On 09/06/15 08:40, pgasinos pgs wrote:
> > Is there any configuration file in EPrints where one can normalize
> > UTF-8 characters so that they sort correctly in non-English languages?
> > For example, the Unicode entities &#x0386; GREEK CAPITAL LETTER ALPHA
> > WITH TONOS and
> > &#x0391; GREEK CAPITAL LETTER ALPHA are the same letter and have to be
> > sorted together, not in separate lists.
> > The vowels are even more complicated. All of the following are the
> > same letter and have to be in the same list:
> > υ    &#965;  GREEK SMALL LETTER UPSILON
> > ύ    &#973;  GREEK SMALL LETTER UPSILON WITH TONOS
> > ϋ    &#971;  GREEK SMALL LETTER UPSILON WITH DIALYTIKA
> > ΰ    &#944;  GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
>
>
> --
>
> Ian Stuart.
> Developer: ORI, RJ-Broker, and OpenDepot.org
> Bibliographics and Multimedia Service Delivery team,
> EDINA,
> The University of Edinburgh.
>
> http://edina.ac.uk/