Tech List

eprints_tech messages

Re: [EP-tech] either me or Eprints is missing on utf8 - bug/feature request

From: Tim Brody <tdb01r AT ecs.soton.ac.uk>
Date: Mon, 12 May 2008 10:26:21 +0100


Threading: [EP-tech] either me or Eprints is missing on utf8 - bug/feature request from roman.chyla AT gmail.com
      • This Message
             Re: [EP-tech] either me or Eprints is missing on utf8 - bug/feature request from tdb01r AT ecs.soton.ac.uk

*** http://www.eprints.org/tech.php/id/%3C48280D3D.5060900%40ecs.soton.ac.uk%3E
*** EPrints community wiki - http://wiki.eprints.org/

EPrints doesn't expect the database to be in Unicode (or any other 
encoding).

The theory is that if you want a sorting other than in English you will 
write a custom method for your language and use it in the 
"make_value_orderkey" property on the fields that aren't in English.

This property is briefly documented at:
http://wiki.eprints.org/w/Metadata

All the best,
Tim.

Roman Chyla wrote:
> *** http://www.eprints.org/tech.php/id/%3Cea0115e90805101541v542d2c1o3ad4f731703bfba9%40mail.gmail.com%3E
> *** EPrints community wiki - http://wiki.eprints.org/
>
> Hello,
> Excuse my premature senility, some things (serious ones) are not clear
> to me. I have successfully converted my database to utf8, fighting
> with several issues, and found (possibly) a bug.
>
> Firstly, one cannot have default collation set for the database like this:
> Alter database eprints3 character set utf8 collate utf8_czech_ci;
>
> because this happens:
> DBD::mysql::st execute failed: Illegal mix of collations
> (utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation
> '=' at /opt/eprints3/perl_lib/EPrints/Database.pm line 2363.
> SQL ERROR (execute): SELECT M.subjectid, M.pos, M.ancestors, C.pos
> FROM cache5960 AS C, subject_ancestors AS M WHERE M.subjectid =
> C.subjectid AND C.pos>0 ORDER BY C.pos
> SQL ERROR (execute): Illegal mix of collations
> (utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation
> '='
> DBD::mysql::st fetchrow_array failed: fetch() without execute() at
> /opt/eprints3/perl_lib/EPrints/Database.pm line 2073.
>
>
> I have also converted all my data to utf8 (and I am sure it is
> correct in the database).
> But EPrints then starts to complain that the encoding is wrong.
>
> I can fix it by adding
> $self->do('SET NAMES utf8');
>
> to Database.pm, where the instance is created. And everything is fine.
>
> But this should not be necessary (?) Am I missing something? Or are
> all the archives of EPrints storing utf8 as latin1 internally in the
> databases? (and as somebody reported, proper sorting does not work).
> Shall I install a new version of EPrints? Please give me some reasonable
> answers, it can't be EPrints, it must be me...
>
> Thanks,
>
>
> roman
>
>
> Here is the conversion how-to; I will eventually put it in the wiki (it
> depends on your answers)
>
>
> #dump schema of the database
> mysqldump --no-data --set-charset -u root -p<password> <db_name> > schema.sql
>
> #dump the data; it will actually be utf8 encoded, don't be fooled by
> the charset latin1 bit
> mysqldump --no-create-info --skip-set-charset -u root -p<yourpassword> --default-character-set=latin1 <db_name> > data.sql
>
> #open the schema.sql in an editor and:
> - replace all occurrences of CHARSET=latin1 with CHARSET=utf8
> - also change the default NULL charset for columns (see
> http://bugs.mysql.com/bug.php?id=23073)
> -- search for "varchar(255)" and replace with "varchar(255) CHARACTER SET utf8 "
>
> #set the utf8 encoding for the data
>
> on Linux you can do: echo 'SET NAMES utf8;' | cat - data.sql > datautf.sql
>
> #now load the edited db schema (this will recreate the database AND
> DESTROY ALL THE DATA!!! Make sure you have it in datautf.sql)
> mysql <db_name> -u root -p < schema.sql
>
> #load the data
> mysql <db_name> -u root -p < datautf.sql
>
> ----
> Now you are done. If you want to set the default encoding for the
> database (that's useful only for newly created tables, and it might be
> better to set the charset globally, for the whole server) you can issue
> alter database <db_name> character set utf8 collate <collation_name>;
>
>   
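
The schema-editing step in the quoted how-to could probably be scripted instead of done by hand in an editor. This is only a minimal sketch, assuming a POSIX shell with sed and a dump in which the strings appear exactly as CHARSET=latin1 and varchar(255). The function name convert_schema_charset is made up for illustration and is not part of EPrints or MySQL.

```shell
#!/bin/sh
# Hypothetical helper (not part of EPrints or MySQL).
# Reads a schema dump on stdin and writes the edited dump to stdout:
#   1. table defaults: CHARSET=latin1 -> CHARSET=utf8
#   2. columns: varchar(255) -> varchar(255) CHARACTER SET utf8
# In basic sed regular expressions unescaped parentheses are literal,
# so "varchar(255)" matches the text as written.
# Not idempotent: running it twice repeats the CHARACTER SET clause.
convert_schema_charset() {
    sed -e 's/CHARSET=latin1/CHARSET=utf8/g' \
        -e 's/varchar(255)/varchar(255) CHARACTER SET utf8/g'
}

# Example use (file names are illustrative):
#   convert_schema_charset < schema.sql > schema_utf8.sql
```

It only touches the two patterns named in the how-to, so other column types (text, varchar of other lengths) would still need checking by hand, and it is worth diffing the result against schema.sql before loading it.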

