
[EP-tech] Re: Antwort: Antwort: Searching fails when database field contains Å (utf8 %c3%85)



Hi,

Thank you! Works like a charm.

/Christer

From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of martin.braendle at id.uzh.ch
Sent: 18 February 2016 11:22
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Antwort: Antwort: Searching fails when database field contains Å (utf8 %c3%85)


I have added

chr(0x00c5) => 'A',     # Å
chr(0xc385) => 'A',     # Å

to the $EPrints::Index::FREETEXT_CHAR_MAPPING list in Tokenizer.pm, restarted both the web server and the indexer process, and reindexed the eprint that had Ågren as an author.

Now Advanced Search works with Å.
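What such a mapping entry does can be sketched in a few lines of Perl. This is a minimal illustration, assuming the mapping behaves like a hash from single characters to ASCII replacements; %char_mapping and fold_chars are hypothetical names, not the EPrints API. It also shows why the second line added above is probably redundant: Perl's chr() takes a code point, not UTF-8 bytes, so chr(0xc385) is U+C385 (a Hangul syllable) rather than Å.

```perl
use strict;
use warnings;
use utf8;    # the literals below contain Å and å

# Hypothetical stand-in for a character-to-ASCII mapping such as
# $EPrints::Index::FREETEXT_CHAR_MAPPING; the real list is much longer.
my %char_mapping = (
    chr(0x00c5) => 'A',    # Å  U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
    chr(0x00e5) => 'a',    # å  U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
);

# Replace each character that has an entry in the mapping; keep the rest.
sub fold_chars {
    my ($text) = @_;
    return join '',
        map { exists $char_mapping{$_} ? $char_mapping{$_} : $_ } split //, $text;
}

print fold_chars('Ågren'), "\n";    # Agren
print fold_chars('ågren'), "\n";    # agren

# chr() takes a code point: chr(0xc385) is U+C385, so it never equals "Å".
my $a_ring = 'Å';
print( (chr(0xc385) eq $a_ring ? 'same' : 'different'), "\n" );    # different
print( (chr(0x00c5) eq $a_ring ? 'same' : 'different'), "\n" );    # same
```

In this sketch only the chr(0x00c5) entry is needed for decoded text containing Å.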

Kind regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich



From: martin.braendle at id.uzh.ch
To: eprints-tech at ecs.soton.ac.uk
Date: 18/02/2016 10:14
Subject: [EP-tech] Antwort: Searching fails when database field contains Å (utf8 %c3%85)
Sent by: eprints-tech-bounces at ecs.soton.ac.uk

________________________________



Hi,

we can reproduce the behavior:

Advanced search (which goes to the SQL index): Ågren, ågren, "Ågren" and "ågren" all fail

Quick search (which goes to the Xapian index): both creators_name:Ågren and creators_name:ågren find results (creators_name is the field name we use for authors)

perl_lib/EPrints/Index/Tokenizer.pm contains a translation list that maps Unicode characters to ASCII, and Å is missing there. Maybe this is the clue?
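One way to spot such a gap for a given name is to list the non-ASCII characters that have no entry in the mapping. A minimal Perl sketch, assuming a hash-style mapping; %mapping is a hypothetical excerpt (not the real Tokenizer.pm list) and unmapped_chars is an illustrative helper, not part of EPrints.

```perl
use strict;
use warnings;
use utf8;    # the sample names below contain non-ASCII letters

# Hypothetical excerpt of a character-to-ASCII mapping in which Å is missing;
# the real list in Tokenizer.pm is much longer.
my %mapping = (
    chr(0x00e4) => 'a',    # ä
    chr(0x00e5) => 'a',    # å
    chr(0x00f6) => 'o',    # ö
);

# Return each distinct non-ASCII character in $text that has no mapping
# entry, formatted as a Unicode code point ("U+XXXX").
sub unmapped_chars {
    my ($text, $map) = @_;
    my %seen;
    return map  { sprintf 'U+%04X', ord $_ }
           grep { ord($_) > 0x7f && !exists $map->{$_} && !$seen{$_}++ }
           split //, $text;
}

my @missing = unmapped_chars('Ågren Ångström Per-Åke', \%mapping);
print "@missing\n";    # U+00C5
```

Running it over a sample of author names from the repository would show whether any other characters are uncovered.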

Best regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich



From: Christer Enkvist <christer.enkvist at slu.se>
To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
Date: 17/02/2016 17:20
Subject: [EP-tech] Searching fails when database field contains Å (utf8 %c3%85)
Sent by: eprints-tech-bounces at ecs.soton.ac.uk
________________________________



Hello all!

I have encountered a weird UTF-8-related problem when querying names in the advanced search. If an author's name contains Å, as in Ångström (UTF-8 %c3%85, A with a ring above), the query fails. I have not seen the problem with any other character: there is no problem with å (a with a ring above, %c3%a5) or with any other non-A-Z letter. The problem occurs when the database entry itself contains an Å, which is typically when it is the first character of the name, as in Ångström, or follows the hyphen in a hyphenated name, as in Per-Åke.
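The byte values quoted above can be checked directly. A minimal Perl sketch using the core Encode module (the variable names are illustrative): it decodes %c3%85 and %c3%a5 as UTF-8 and shows they are the single characters U+00C5 and U+00E5, an upper/lower-case pair. It only confirms the encodings; it does not reproduce the EPrints search path.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two percent-encoded byte sequences from the report, decoded as UTF-8.
my $upper = decode('UTF-8', "\xC3\x85");    # %c3%85
my $lower = decode('UTF-8', "\xC3\xA5");    # %c3%a5

# Each decodes to one character; print its Unicode code point.
printf "U+%04X U+%04X\n", ord $upper, ord $lower;    # U+00C5 U+00E5

# Perl's lc() treats them as an upper/lower-case pair.
print( (lc($upper) eq $lower ? 'case pair' : 'unrelated'), "\n" );    # case pair
```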

Furthermore, if the query term contains an Å, the search fails. A few examples:

Mårten -- works
mårten -- works
MåRTEN -- works
MÅRTEN -- fails
mÅrten -- fails

The query field is (normally) case-insensitive, so it shouldn't matter whether I write "Ångström" or "ångström". In this case, however, whether a search hits or misses depends on whether the database entry contains an Å and/or the query term contains an Å; it seems that EPrints cannot handle Å. The name always displays correctly and is written correctly to the database. The only problem is the advanced search.

I should add that querying the database directly with SQL works without any problems (including all upper/lower-case combinations). Any ideas what may be wrong with EPrints and where to start looking?

Regards,
Christer


Christer Enkvist, Ph D
System Administrator/System Librarian
Division of Scholarly Communication
Swedish University of Agricultural Sciences
Uppsala, Sweden

Telephone: 018-671042
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/