EPrints Technical Mailing List Archive

Message: #04297



[EP-tech] Re: Normalize characters for correct sorting


Hi Ian

I probably didn't make clear what the real problem is. In English you don't have the same vowel both with and without an accent. In Greek the accent is only a matter of correct spelling: it is the same letter, so it has to be normalized to be sorted correctly. If you look at Tokenizer.pm (/perl_lib/EPrints/Index/Tokenizer.pm), it does the same thing for indexing.

Kostas
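The normalization Kostas describes can be sketched outside EPrints as a sort key: decompose each character (Unicode NFD), drop the combining marks such as tonos and dialytika, then case-fold. This is a minimal Python illustration, not the code in Tokenizer.pm and not an EPrints API; the function name `sort_key` and the sample titles are hypothetical, and how such a key would be wired into EPrints' own sorting is not shown here.

```python
import unicodedata

def sort_key(text: str) -> str:
    """Accent- and case-insensitive sort key (hypothetical helper, not EPrints code)."""
    # NFD splits precomposed letters, e.g. U+0386 -> U+0391 + U+0301 (combining acute/tonos)
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keeping only the base letters
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() makes capital and small letters compare equal
    return stripped.casefold()

titles = ["Βήτα", "Αλφάβητο", "Έψιλον", "Άλφα"]
for title in sorted(titles, key=sort_key):
    print(title)
```

With the key, "Άλφα" and "Αλφάβητο" sort together under alpha instead of being separated by their first code point. Stripping marks suits Greek, where the accent does not create a new letter; for languages where an accented letter has its own place in the alphabet, a full collation (for example Perl's core `Unicode::Collate` module, which implements the Unicode Collation Algorithm) is the safer choice.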

2015-06-09 10:57 GMT+03:00 Ian Stuart <Ian.Stuart@ed.ac.uk>:
I suspect this is a Perl problem rather than an EPrints problem... I
would expect Perl to sort by Unicode code point (so U+0386 before U+0391).
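Ian's expectation can be checked with a quick sketch. Python's default string ordering, like Perl's plain `sort` (string `cmp` without `use locale`), compares characters by code point. The sample letters below are chosen for illustration; they show that an accented capital such as Έ (U+0388) lands between Ά (U+0386) and Α (U+0391), splitting the alpha entries apart.

```python
import unicodedata

# Letters written as escapes so the exact code points are unambiguous: Β, Α, Έ, Ά
letters = ["\u0392", "\u0391", "\u0388", "\u0386"]

# Default sorted() compares code points, so every accented capital (U+0386..U+038F)
# comes before the plain capitals, which start at U+0391
for ch in sorted(letters):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```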

On 09/06/15 08:40, pgasinos pgs wrote:
> Is there a configuration file in EPrints where one can normalize UTF-8
> characters so that they sort correctly in languages other than English?
> For example, the Unicode characters &#x0386; GREEK CAPITAL LETTER ALPHA
> WITH TONOS and &#x0391; GREEK CAPITAL LETTER ALPHA are the same letter
> and have to be sorted together, not in separate lists.
> The vowels are even more complicated. All of the following are the same
> letter and have to be in the same list:
> υ    &#965;  GREEK SMALL LETTER UPSILON
> ύ    &#973;  GREEK SMALL LETTER UPSILON WITH TONOS
> ϋ    &#971;  GREEK SMALL LETTER UPSILON WITH DIALYTIKA
> ΰ    &#944;  GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
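The four upsilon forms listed above can be reduced to one base letter with the same decompose-and-strip approach, and titles can then be bucketed under that letter so they fall into one list. This is a hedged Python sketch; `base_letter` and `group_by_initial` are hypothetical helpers, not part of EPrints, and the sample words are for illustration only.

```python
import unicodedata
from collections import defaultdict

def base_letter(ch: str) -> str:
    """Reduce a character to its lower-case base letter (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

# The four forms from the question: plain, tonos, dialytika, dialytika + tonos
upsilons = ["\u03c5", "\u03cd", "\u03cb", "\u03b0"]
for ch in upsilons:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {base_letter(ch)}")

def group_by_initial(titles):
    """Bucket titles under the normalized first letter, e.g. for browse-list headings."""
    groups = defaultdict(list)
    for title in titles:
        groups[base_letter(title[0])].append(title)
    return dict(groups)

print(group_by_initial(["ύλη", "υγεία", "Άλφα", "Αλφάβητο"]))
```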


--

Ian Stuart.
Developer: ORI, RJ-Broker, and OpenDepot.org
Bibliographics and Multimedia Service Delivery team,
EDINA,
The University of Edinburgh.

http://edina.ac.uk/


