[EP-tech] either me or Eprints is missing on utf8 - bug/feature request

From: "Roman Chyla" <roman.chyla AT gmail.com>
Date: Sun, 11 May 2008 00:41:59 +0200



Hello,
Excuse my premature senility, but some (serious) things are not clear
to me. I have successfully converted my database to utf8, fighting
with several issues along the way, and found (possibly) a bug.

Firstly, one cannot set a default collation for the database like this:
Alter database eprints3 character set utf8 collate utf8_czech_ci;

because this will happen:
DBD::mysql::st execute failed: Illegal mix of collations
(utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation
'=' at /opt/eprints3/perl_lib/EPrints/Database.pm line 2363.
SQL ERROR (execute): SELECT M.subjectid, M.pos, M.ancestors, C.pos
FROM cache5960 AS C, subject_ancestors AS M WHERE M.subjectid =
C.subjectid AND C.pos>0 ORDER BY C.pos
SQL ERROR (execute): Illegal mix of collations
(utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation
'='
DBD::mysql::st fetchrow_array failed: fetch() without execute() at
/opt/eprints3/perl_lib/EPrints/Database.pm line 2073.


Secondly, I have also converted all my data to utf8 (and I am sure it is
correct in the database), but EPrints then starts to complain about a
wrong encoding.

I can fix it by adding
$self->do('SET NAMES utf8');

in Database.pm, where the instance is created. Then everything is fine.

But this should not be necessary (?) Am I missing something? Or are
all the EPrints archives storing utf8 as latin1 internally in their
databases? (And, as somebody reported, proper sorting does not work.)
Shall I install a new version of EPrints? Please give me some reasonable
answers, it can't be EPrints, it must be me...

Thanks,


roman


Here is the conversion how-to; I will eventually put it in the wiki
(depending on your answers).


#dump the schema of the database
mysqldump --no-data --set-charset -u root -p<password> <db_name> > schema.sql

#dump the data; it will actually be utf8 encoded, don't be fooled by
the charset latin1 bit
mysqldump --no-create-info --skip-set-charset -u root -p<yourpassword> \
  --default-character-set=latin1 <db_name> > data.sql
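
A small sketch of what the latin1 setting is guarding against, assuming the tables are declared latin1 but already hold utf8 bytes. If those bytes were read as latin1 and converted to utf8 on the way out, every non-ASCII character would be encoded twice. The example uses iconv on the two bytes of "é" rather than a real dump:

```shell
#!/bin/sh
# "é" in utf8 is the two bytes c3 a9 (octal 303 251).
original=$(printf '\303\251' | od -An -tx1 | tr -d ' \n')

# Treating those bytes as latin1 and converting to utf8 encodes each
# byte separately, which is the double encoding to avoid.
double=$(printf '\303\251' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "stored bytes:   $original"
echo "double-encoded: $double"
```

Dumping with the same charset the tables are declared in means no conversion takes place, so the bytes in data.sql stay as they are in the database.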

#open schema.sql in an editor and:
- replace all occurrences of CHARSET=latin1 with CHARSET=utf8
- also change the default NULL charset for columns (see
http://bugs.mysql.com/bug.php?id=23073):
-- search for "varchar(255)" and replace it with "varchar(255)
CHARACTER SET utf8"
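
The two replacements above could also be done non-interactively with sed. This is only a sketch: the sample table and its column names are made up for illustration, and a real schema.sql should be reviewed by hand afterwards.

```shell
#!/bin/sh
# Hypothetical sample standing in for the real schema.sql.
cat > schema_sample.sql <<'EOF'
CREATE TABLE `eprint` (
  `eprintid` int(11) NOT NULL,
  `title` varchar(255) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
EOF

# 1) switch the table charset, 2) pin the charset on varchar(255) columns.
# Output goes to a new file so the original stays untouched.
sed -e 's/CHARSET=latin1/CHARSET=utf8/g' \
    -e 's/varchar(255)/varchar(255) CHARACTER SET utf8/g' \
    schema_sample.sql > schema_utf8_sample.sql

cat schema_utf8_sample.sql
```

Run it only once against the original file; applying the second expression to an already edited file would add the CHARACTER SET clause twice.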

#set the utf8 encoding for the data by prepending SET NAMES to the dump

on Linux you can do: echo 'SET NAMES utf8;' | cat - data.sql > datautf.sql
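
Since the next step is destructive, it may be worth checking first that the data file really decodes as utf8. A minimal sketch using iconv, which exits with a non-zero status on invalid input; the function name and the sample files are hypothetical:

```shell
#!/bin/sh
# Succeeds (exit 0) if the file decodes cleanly as utf8, fails otherwise.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical sample files: one with a utf8 "é" (c3 a9),
# one with a bare latin1 "é" (e9), which is not valid utf8.
printf 'SET NAMES utf8;\nINSERT INTO t VALUES ("\303\251");\n' > good_sample.sql
printf 'INSERT INTO t VALUES ("\351");\n' > bad_sample.sql

for f in good_sample.sql bad_sample.sql; do
    if is_valid_utf8 "$f"; then
        echo "$f: looks like valid utf8"
    else
        echo "$f: NOT valid utf8, do not load it"
    fi
done
```

For the real thing the call would be is_valid_utf8 datautf.sql. A clean result does not rule out double-encoded text, so a look at a few non-ASCII records is still worthwhile.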

#now load the edited db schema (this will recreate the database AND
DESTROY ALL THE DATA!!! - make sure you have it in datautf.sql)
mysql <db_name> -u root -p < schema.sql

#load the data
mysql <db_name> -u root -p < datautf.sql

----
Now you are done. If you want, you can also set the default encoding for
the database, but that's useful only for newly created tables (and it
might be better to set the charset globally, for the whole server):
alter database <db_name> character set utf8 collate <collation_name>;

