Re: [EP-tech] either me or Eprints is missing on utf8 - bug/feature request

From: "Roman Chyla" <roman.chyla AT gmail.com>
Date: Mon, 12 May 2008 12:06:20 +0200



*** 
http://www.eprints.org/tech.php/id/%3Cea0115e90805120306kd7ea332ic8531c2d110bf695%40mail.gmail.com%3E
*** EPrints community wiki - http://wiki.eprints.org/

Thank you Tim,

but how can the community live without Unicode? How can they search
for Unicode strings? It is very expensive to use our own sorting
routines when the database can do it faster and better. I cannot do
without Unicode, and I suppose hundreds of thousands of sites out there
cannot either. Even if we can provide mappings for metadata fields, we
cannot deal with all the possible variations coming from the fulltext -
that is a losing battle.

My EPrints installation is going fine with unicode, but indexing is
stripping off unicode strings (searching works well). I guess I am on
my own here to fix it...

Please register this as a serious feature request: storing Unicode
strings as latin1 is not the same as having full Unicode support. And
it is so easy to switch to Unicode; the cost is negligible compared
to the benefits.
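
A minimal illustration of the latin1 point, in Python rather than EPrints or MySQL code. It assumes the UTF-8 bytes of a string are handed to a layer that treats every byte as one latin1 character; the function name misread_as_latin1 is hypothetical:

```python
def misread_as_latin1(text: str) -> str:
    """Return what a latin1-only layer sees when given UTF-8 bytes."""
    return text.encode("utf-8").decode("latin-1")


name = "Dvořák"
seen = misread_as_latin1(name)

# Two of the six characters take two bytes each in UTF-8, so the
# latin1 view is longer than the real string.
print(len(name), len(seen))

# The bytes themselves survive, so the text can be recovered later,
# which is why stored data can still look correct when read back.
recovered = seen.encode("latin-1").decode("utf-8")
print(recovered == name)

# Character-aware operations break, though: a case-insensitive match
# works on real Unicode but fails on the latin1 view.
print("ŘEKA".lower() == "řeka".lower())
print(misread_as_latin1("ŘEKA").lower() == misread_as_latin1("řeka").lower())
```

The round trip succeeding while the comparison fails is the gap between "storing" Unicode and "supporting" it: anything that must understand characters (sorting, case folding, indexing) only ever sees the individual bytes.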

Best,

  roman

On Mon, May 12, 2008 at 11:26 AM, Tim Brody <tdb01r AT ecs.soton.ac.uk> wrote:
> ***
> http://www.eprints.org/tech.php/id/%3C48280D3D.5060900%40ecs.soton.ac.uk%3E
>  *** EPrints community wiki - http://wiki.eprints.org/
>
>  EPrints doesn't expect the database to be in Unicode (or any other
> encoding).
>
>  The theory is that if you want a sorting other than in English you will
> write a custom method for your language and use it in the
> "make_value_orderkey" property on the fields that aren't in English.
>
>  This property is briefly documented at:
>  http://wiki.eprints.org/w/Metadata
>
>  All the best,
>  Tim.
>
>  Roman Chyla wrote:
>
> > ***
> > http://www.eprints.org/tech.php/id/%3Cea0115e90805101541v542d2c1o3ad4f731703bfba9%40mail.gmail.com%3E
> > *** EPrints community wiki - http://wiki.eprints.org/
> >
> >
> >
> >
> > Hello,
> > Excuse my premature senility, but some things (serious ones) are not
> > clear to me. I have successfully converted my database to utf8, fighting
> > with several issues along the way, and found (possibly) a bug.
> >
> > Firstly, one cannot set a default collation for the database like this:
> > Alter database eprints3 character set utf8 collate utf8_czech_ci;
> >
> > because this will happen:
> > DBD::mysql::st execute failed: Illegal mix of collations
> > (utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation
> > '=' at /opt/eprints3/perl_lib/EPrints/Database.pm line 2363.
> > SQL ERROR (execute): SELECT M.subjectid, M.pos, M.ancestors, C.pos
> > FROM cache5960 AS C, subject_ancestors AS M WHERE M.subjectid =
> > C.subjectid AND C.pos>0 ORDER BY C.pos
> > SQL ERROR (execute): Illegal mix of collations
> > (utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation
> > '='
> > DBD::mysql::st fetchrow_array failed: fetch() without execute() at
> > /opt/eprints3/perl_lib/EPrints/Database.pm line 2073.
> >
> >
> > I have also converted all my data to utf8 (and I am sure it is
> > correct in the database).
> > But EPrints then starts to complain that the encoding is wrong.
> >
> > I can fix it by adding
> > $self->do('SET NAMES utf8');
> >
> > in Database.pm, where the instance is created. Then everything is fine.
> >
> > But this should not be necessary (?) Am I missing something? Or are
> > all the archives of EPrints storing utf8 as latin1 internally in the
> > databases? (and as somebody reported, proper sorting does not work).
> > Shall I install a new version of EPrints? Please give me some reasonable
> > answers; it can't be EPrints, it must be me...
> >
> > Thanks,
> >
> >
> > roman
> >
> >
> > here is the conversion how-to, I will eventually put it in the wiki (it
> > depends on your answers)
> >
> >
> > #dump the schema of the database
> > mysqldump --no-data --set-charset -u root -p<password> <db_name> > schema.sql
> >
> > #dump the data, it will actually be utf8 encoded, don't be fooled by
> > the charset latin1 bit
> > mysqldump --no-create-info --skip-set-charset -u root -p<yourpassword> --default-character-set=latin1 <db_name> > data.sql
> >
> > #open the schema.sql in an editor and:
> > - replace all occurrences of CHARSET=latin1 with CHARSET=utf8
> > - also change the default NULL charset for columns (see
> > http://bugs.mysql.com/bug.php?id=23073)
> > -- search for "varchar(255)" and replace with "varchar(255) CHARACTER SET utf8 "
> >
> > #set the utf encoding for the data
> >
> > in linux you can do: echo 'SET NAMES utf8;' | cat - data.sql > datautf.sql
> >
> > #now load the edited db schema  (this will recreate the database, AND
> > DESTROY ALL THE DATA!!! - make sure you have them in datautf.sql)
> > mysql <db_name> -u root -p < schema.sql
> >
> > #load the data
> > mysql <db_name> -u root -p < datautf.sql
> >
> > ----
> > now you are done - if you want to set the default encoding for the
> > database (that is useful only for newly created tables, and it might be
> > better to set the charset globally, for the whole server), you can issue
> > alter database <db_name> character set utf8;
> >
> >
> >
>
>
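
The schema-editing and SET NAMES steps of the quoted how-to could also be scripted instead of done by hand in an editor. A rough sketch in Python, assuming the dump uses exactly the CHARSET=latin1 and varchar(255) spellings mentioned in the how-to; the function names are hypothetical and this has not been run against a real EPrints dump:

```python
import re


def convert_schema(schema_sql: str) -> str:
    """Apply the two manual edits from the how-to to a schema dump."""
    # Table defaults: CHARSET=latin1 -> CHARSET=utf8
    converted = schema_sql.replace("CHARSET=latin1", "CHARSET=utf8")
    # Column level: add an explicit charset to varchar(255) columns,
    # skipping columns that already carry one.
    converted = re.sub(
        r"varchar\(255\)(?!\s+CHARACTER SET)",
        "varchar(255) CHARACTER SET utf8",
        converted,
    )
    return converted


def prepend_set_names(data_sql: str) -> str:
    """Equivalent of: echo 'SET NAMES utf8;' | cat - data.sql"""
    return "SET NAMES utf8;\n" + data_sql


sample = (
    "CREATE TABLE `eprint` (\n"
    "  `title` varchar(255) default NULL\n"
    ") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n"
)
print(convert_schema(sample))
```

Only varchar(255) columns are touched here, as in the how-to; other column types in a real schema would need the same check by hand. As with the manual procedure, keep the original dumps until the reloaded database has been verified.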

