[EP-tech] Re: Normalize characters for correct sorting
Ah, OK... yes, I had a similar problem a few years ago.
It looks like
should be updated, and it could be used by the Tokenizer :)
On 09/06/15 09:59, pgasinos pgs wrote:
> Hi Ian
> I probably didn't make clear what the real problem is. In English
> you don't have the same vowel with and without an accent. It is only a
> matter of correct spelling, so it is the same letter and has to be
> normalized to be sorted correctly. Tokenizer.pm
> (/perl_lib/EPrints/Index/Tokenizer.pm) does the same for indexing.
> 2015-06-09 10:57 GMT+03:00 Ian Stuart <Ian.Stuart at ed.ac.uk>:
> I suspect this is a Perl problem rather than an EPrints problem... I
> would expect Perl to sort by Unicode value (so U+0386 before U+0391).
> On 09/06/15 08:40, pgasinos pgs wrote:
> > Is there any configuration file in EPrints that someone can use to
> > normalize UTF-8 characters so they sort correctly in non-English
> > languages?
> > For example, the Unicode characters Ά GREEK CAPITAL LETTER ALPHA
> > WITH TONOS and
> > Α GREEK CAPITAL LETTER ALPHA are the same and they have to be
> > sorted together, not in separate lists.
> > The vowels are even more complicated. All of the characters below
> > are the same letter and they have to be in the same list:
> > - υ GREEK SMALL LETTER UPSILON
> > - ύ GREEK SMALL LETTER UPSILON WITH TONOS
> > - ϋ GREEK SMALL LETTER UPSILON WITH DIALYTIKA
> > - ΰ GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
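For illustration, here is a minimal Python sketch of the normalization idea discussed in this thread. It is not EPrints code, and `sort_key` is a hypothetical helper name. It decomposes each string to NFD, drops the combining marks, and sorts on the result, so that accented and unaccented forms of the same letter end up together. Perl offers the same decomposition through its core Unicode::Normalize module, so a similar approach should be possible there.

```python
import unicodedata


def sort_key(text):
    """Hypothetical helper: a sort key that ignores accents and case."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (general category "Mn"), keeping base letters.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.casefold()


words = ["Άρτα", "Βόλος", "Αθήνα", "ύδωρ", "υφή"]

# A plain sort compares code points, so U+0386 (Ά) lands before U+0391 (Α)
# and every word starting with ύ (U+03CD) lands after those starting with υ.
by_code_point = sorted(words)

# Sorting on the normalized key groups the variants of a letter together.
by_base_letter = sorted(words, key=sort_key)

print(by_code_point)
print(by_base_letter)
```

With this key, all four upsilon variants listed above map to plain υ, and Ά maps to the same key as Α. A full solution for language-aware ordering would use the Unicode Collation Algorithm rather than accent stripping; this sketch only shows the normalization step.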
Developer: ORI, RJ-Broker, and OpenDepot.org
Bibliographics and Multimedia Service Delivery team,
The University of Edinburgh.