EPrints Technical Mailing List Archive

Message: #06895



[EP-tech] Problem with searching for names starting with Ö


Hi everyone,

We've run into an issue with searching for names containing certain characters and how they are handled by the Tokenizer.pm module (https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Index/Tokenizer.pm). I notice that FREETEXT_CHAR_MAPPING substitutes characters both when indexing takes place and when search terms are entered. Many of the substitutions make sense, but others seem to be done on a phonetic basis. Strangely, this isn't an issue on the simple search form, but if a name is entered in the "Creator" field of the advanced search, some unexpected things can happen.

For example (btw names have been changed!), if an author exists on the system with the surname "Öl", results will not be returned if I search for "Ol", but they will be if I enter "Öl" or, more surprisingly, "Oel" (thanks to the substitution made).
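
To illustrate the behaviour I'm describing, here is a minimal Python sketch. It is not EPrints code: the two-entry table is a hypothetical stand-in for FREETEXT_CHAR_MAPPING, assuming it maps "Ö" to "OE" as the results above suggest, and `normalise` is a name I've made up for the illustration.

```python
# Hypothetical stand-in for FREETEXT_CHAR_MAPPING (assumed, not copied from Tokenizer.pm).
CHAR_MAPPING = {"Ö": "OE", "ö": "oe"}


def normalise(term: str) -> str:
    """Apply the character substitutions, then lower-case the result."""
    return "".join(CHAR_MAPPING.get(ch, ch) for ch in term).lower()


# What would be stored in the index for the surname.
indexed = normalise("Öl")

# Compare each search term against the indexed form.
for query in ("Öl", "Oel", "Ol"):
    print(query, normalise(query) == indexed)
```

Under that assumption both "Öl" and "Oel" normalise to the same token as the indexed surname, while "Ol" does not, which matches what I'm seeing in the advanced search.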

I understand that in many languages letters such as these are considered to be entirely different characters, but when people search using an English language keyboard they tend to just drop the accents. As a result, searches that users expect to work return nothing.
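
For comparison, the behaviour those users seem to expect is plain accent folding. The sketch below is only an illustration in Python using the standard `unicodedata` module, not a proposal for how EPrints should implement it; `fold_accents` is a hypothetical name.

```python
import unicodedata


def fold_accents(term: str) -> str:
    """Decompose characters and drop combining marks, e.g. "Ö" becomes "o"."""
    decomposed = unicodedata.normalize("NFD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


# Compare each search term against the folded form of the surname.
indexed = fold_accents("Öl")
for query in ("Öl", "Ol", "Oel"):
    print(query, fold_accents(query) == indexed)
```

With folding like this "Ol" would match "Öl", but "Oel" no longer would, so it is a trade-off rather than a straightforward fix.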

Has anyone else encountered this problem? I can change the behaviour by changing the mappings in Tokenizer.pm, but that means modifying core code, and the mapping doesn't look to be easily overridable.

Am very interested to hear any thoughts about how to approach this!

Thanks
Liam

Library Systems Developer
University of Kent