Tech List


eprints_tech messages


Re: [EP-tech] either me or Eprints is missing on utf8 - bug/feature request

From: Tim Brody <tdb01r AT ecs.soton.ac.uk>
Date: Mon, 12 May 2008 11:54:47 +0100



*** http://www.eprints.org/tech.php/id/%3C482821F7.1000700%40ecs.soton.ac.uk%3E
*** EPrints community wiki - http://wiki.eprints.org/

Roman Chyla wrote:
> *** http://www.eprints.org/tech.php/id/%3Cea0115e90805120306kd7ea332ic8531c2d110bf695%40mail.gmail.com%3E
> *** EPrints community wiki - http://wiki.eprints.org/
>
> Thank you Tim,
>
> but how can the community live without Unicode? How can they search
> for Unicode strings? It is very expensive to use our own sorting
> routines when the database can do it faster and better. I cannot do
> without Unicode, and I suppose neither can hundreds of thousands of
> sites out there. Even if we can provide mappings for metadata fields,
> we cannot deal with all the possible variations coming from the full
> text - that is a losing battle.
>
> My EPrints installation is going fine with unicode, but indexing is
> stripping off unicode strings (searching works well). I guess I am on
> my own here to fix it...
>
> Please register this as a serious feature request - storing Unicode
> strings as latin1 is not the same as having full Unicode support. And
> it is so easy to switch to Unicode; it will cost next to nothing
> compared to the benefits.
>   
What are you trying to do that EPrints doesn't do?

Internationalisation and localisation are handled internally by EPrints. 
Strictly speaking, the database is being asked to store data as binary, 
rather than as "latin-1".
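
To illustrate the difference between binary storage and "latin-1", here is 
a minimal Python sketch (not EPrints code; the function names are made up 
for this example, and UTF-8 is assumed as the application's encoding). 
Bytes stored as binary come back unchanged, and only get garbled if 
something reads them as latin-1:

```python
def store_as_binary(text):
    """Encode text to the raw bytes a binary column would hold (assumes UTF-8)."""
    return text.encode("utf-8")


def load_from_binary(blob):
    """Decode the raw bytes back to text, using the same assumed encoding."""
    return blob.decode("utf-8")


name = "Völker"
blob = store_as_binary(name)

# The application gets back exactly what it stored.
round_trip = load_from_binary(blob)

# Reading the same bytes as latin-1 produces mojibake instead.
misread = blob.decode("latin-1")

print(blob, round_trip, misread)
```

The database never needs to know the encoding in this arrangement, which 
is also why it cannot sort or compare the values in a language-aware way.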

I suspect indexing is always going to be EPrints-specific, because you 
will want to expand something like:
Völker to {Völker,Volker,Voelker}
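
As a rough illustration of that kind of expansion, here is a Python sketch 
(not the EPrints indexer; the mapping table and function names are 
assumptions made up for this example, and a real indexer would likely need 
per-language rules):

```python
import unicodedata

# Assumed German-style transliterations, for illustration only.
GERMAN_EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def strip_accents(term):
    """Drop combining marks after canonical decomposition (ö becomes o)."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate(term):
    """Replace each mapped character with its expansion (ö becomes oe)."""
    return "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in term)


def index_variants(term):
    """Return the term plus its unaccented and transliterated forms, deduplicated."""
    term = unicodedata.normalize("NFC", term)  # so the mapping keys match
    variants = []
    for candidate in (term, strip_accents(term), transliterate(term)):
        if candidate not in variants:
            variants.append(candidate)
    return variants


print(index_variants("Völker"))
```

Indexing all of the variants means a search for any one spelling can match 
the same record.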

At the moment the ordervalues_* tables are used by searches. You could 
change their character set to utf-8 and the collation to the appropriate 
language-specific collation. But the ordering on views is handled 
internally by EPrints.
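
To show why the collation matters for ordering, here is a small Python 
sketch comparing byte-wise ordering (what you get when values are compared 
as binary) with a crude accent- and case-insensitive ordering. The 
collation key is a simplified stand-in invented for this example; it is 
neither what a database collation does exactly nor what EPrints does 
internally:

```python
import unicodedata


def bytewise_order(terms):
    """Sort by raw UTF-8 bytes, as a binary comparison would."""
    return sorted(terms, key=lambda term: term.encode("utf-8"))


def collation_key(term):
    """Simplified key: ignore accents and case, then break ties on the original."""
    decomposed = unicodedata.normalize("NFD", term)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), term)


def collated_order(terms):
    """Sort using the simplified collation key."""
    return sorted(terms, key=collation_key)


terms = ["Zebra", "Völker", "Äpfel", "Volker"]
print(bytewise_order(terms))   # accented initials sort after "Zebra"
print(collated_order(terms))   # "Äpfel" sorts with the other A words
```

With byte-wise ordering, "Äpfel" lands after "Zebra" because its first 
byte is larger than any ASCII letter; a language-specific collation is 
what would put it where a reader expects.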

Doing something a bit smarter using the database collations may be 
possible with 3.1.

Cheers,
Tim.

