
[EP-tech] Re: Normalize characters for correct sorting



Ah, OK... yes, I had a similar problem a few years ago.

It looks like 
http://search.cpan.org/~kiz/MathML-Entities-Approximate-0.20/lib/MathML/Entities/Approximate.pm 
should be updated, and it could be used by the Tokenizer :)
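
A minimal sketch of the kind of normalization being discussed. It is written in Python only for illustration: EPrints itself is Perl, and this is neither the Tokenizer's code nor the MathML::Entities::Approximate API. The idea is to case-fold each string, decompose it to Unicode NFD so that tonos and dialytika become separate combining marks, drop those marks, and sort on the result instead of on raw code points. The helper name sort_key is hypothetical.

```python
import unicodedata


def sort_key(text: str) -> str:
    """Accent- and case-insensitive sort key (hypothetical helper).

    Case-folds first, then decomposes to NFD so accents such as tonos
    (U+0301) and dialytika (U+0308) become separate combining marks,
    which are then dropped.
    """
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


titles = ["Όμηρος", "Αθήνα", "Άρτα"]
print(sorted(titles))                # ['Άρτα', 'Όμηρος', 'Αθήνα']  (raw code-point order)
print(sorted(titles, key=sort_key))  # ['Αθήνα', 'Άρτα', 'Όμηρος']  (accent-insensitive)
```

Stripping marks like this gives a simple accent-insensitive ordering. A full collation, such as Perl's Unicode::Collate module (an implementation of the Unicode Collation Algorithm), would additionally handle tie-breaking and language-specific rules.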


On 09/06/15 09:59, pgasinos pgs wrote:
> Hi Ian
>
> I probably didn't make clear what the real problem is. In English
> you don't have the same vowel both with and without an accent. Here
> the accent is only a matter of correct spelling, so it is the same
> letter and has to be normalized to sort correctly. If you look at
> Tokenizer.pm (/perl_lib/EPrints/Index/Tokenizer.pm), it does the same
> for indexing.
>
> Kostas
>
> 2015-06-09 10:57 GMT+03:00 Ian Stuart <Ian.Stuart at ed.ac.uk
> <mailto:Ian.Stuart at ed.ac.uk>>:
>
>     I suspect this is a Perl problem rather than an EPrints problem. I
>     would expect Perl to sort by Unicode code point (so U+0386 before U+0391).
>
>     On 09/06/15 08:40, pgasinos pgs wrote:
>      > Is there any configuration file in EPrints where one can normalize
>      > UTF-8 characters so that they sort correctly in non-English languages?
>      > For example, the Unicode characters &#x0386; GREEK CAPITAL LETTER ALPHA
>      > WITH TONOS and &#x0391; GREEK CAPITAL LETTER ALPHA are the same letter
>      > and have to be sorted together, not in separate lists.
>      > The vowels are even more complicated. All of the following are the
>      > same letter and have to be in the same list:
>      > υ    &#965;  GREEK SMALL LETTER UPSILON
>      > ύ    &#973;  GREEK SMALL LETTER UPSILON WITH TONOS
>      > ϋ    &#971;  GREEK SMALL LETTER UPSILON WITH DIALYTIKA
>      > ΰ    &#944;  GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
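
A table-driven sketch of the browse-list side of the question: map each monotonic Greek vowel with tonos and/or dialytika to its plain vowel before taking the first letter as the list heading, so that the four upsilon forms listed in the question (and Ά/Α) land in one list. The table and the names browse_heading and group_by_heading are hypothetical Python illustrations, not EPrints configuration; where such a hook would sit in an EPrints browse view is not shown here.

```python
from collections import defaultdict

# Hypothetical table, monotonic Greek only: accented vowel -> plain vowel.
_FOLD = str.maketrans(
    "\u03ac\u03ad\u03ae\u03af\u03ca\u0390\u03cc\u03cd\u03cb\u03b0\u03ce"  # ά έ ή ί ϊ ΐ ό ύ ϋ ΰ ώ
    "\u0386\u0388\u0389\u038a\u03aa\u038c\u038e\u03ab\u038f",            # Ά Έ Ή Ί Ϊ Ό Ύ Ϋ Ώ
    "\u03b1\u03b5\u03b7\u03b9\u03b9\u03b9\u03bf\u03c5\u03c5\u03c5\u03c9"  # α ε η ι ι ι ο υ υ υ ω
    "\u0391\u0395\u0397\u0399\u0399\u039f\u03a5\u03a5\u03a9",            # Α Ε Η Ι Ι Ο Υ Υ Ω
)


def browse_heading(title: str) -> str:
    """Letter a title is listed under: first char, accents removed, upper-cased."""
    return title[:1].translate(_FOLD).upper()


def group_by_heading(titles):
    """Group titles into browse lists keyed by their normalized first letter."""
    groups = defaultdict(list)
    for title in titles:
        groups[browse_heading(title)].append(title)
    return dict(groups)


# The four upsilon forms from the question (υ ύ ϋ ΰ) share one heading:
print({browse_heading(c) for c in "\u03c5\u03cd\u03cb\u03b0"})  # {'Υ'}
```

An explicit table is easy to audit but only covers the characters it lists; polytonic Greek (breathings, perispomeni, iota subscript) would need many more entries, which is where a decomposition-based (NFD) approach scales better.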


-- 

Ian Stuart.
Developer: ORI, RJ-Broker, and OpenDepot.org
Bibliographics and Multimedia Service Delivery team,
EDINA,
The University of Edinburgh.

http://edina.ac.uk/

This email was sent via the University of Edinburgh.

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.