eprints_tech messages
[EP-tech] either me or Eprints is missing on utf8 - bug/feature request
From: "Roman Chyla" <roman.chyla AT gmail.com>
Date: Sun, 11 May 2008 00:41:59 +0200
Hello,

Excuse my premature senility, but some (serious) things are not clear to me. I have successfully converted my database to utf8, fought with several issues along the way, and found what is possibly a bug.

Firstly, one cannot set a default collation for the database like this:

    Alter database eprints3 character set utf8 collate utf8_czech_ci;

because this will happen:

    DBD::mysql::st execute failed: Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation '=' at /opt/eprints3/perl_lib/EPrints/Database.pm line 2363.
    SQL ERROR (execute): SELECT M.subjectid, M.pos, M.ancestors, C.pos FROM cache5960 AS C, subject_ancestors AS M WHERE M.subjectid = C.subjectid AND C.pos>0 ORDER BY C.pos
    SQL ERROR (execute): Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation '='
    DBD::mysql::st fetchrow_array failed: fetch() without execute() at /opt/eprints3/perl_lib/EPrints/Database.pm line 2073.

I have also converted all my data to utf8 (and I am sure it is correct in the database), but EPrints then starts to complain that the encoding is wrong. I can fix it by adding

    $self->do('SET NAMES utf8');

in Database.pm, where the instance is created. Then everything is fine. But this should not be necessary (?)

Am I missing something? Or are all EPrints archives storing utf8 as latin1 internally in their databases? (And, as somebody reported, proper sorting does not work.) Shall I install a new version of EPrints?

Please give me some reasonable answers, it can't be EPrints, it must be me...

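To make the "utf8 stored as latin1" question concrete, here is a minimal sketch in Python of the general mechanism. It only illustrates what happens when UTF-8 bytes are taken for latin1. It is not a statement about what EPrints or DBD::mysql do internally. The function names and the sample word are made up, and Python's latin-1 codec is used as a stand-in for MySQL's latin1, which is an assumption.

```python
def misread_as_latin1(text: str) -> str:
    """Return what UTF-8 encoded text looks like when its bytes are taken for latin1."""
    return text.encode("utf-8").decode("latin-1")


def recover_utf8(misread: str) -> str:
    """Undo the misreading: turn the latin1 characters back into bytes, then decode as UTF-8."""
    return misread.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    word = "k\u016f\u0148"  # a Czech sample word (hypothetical data)

    stored = misread_as_latin1(word)
    # Every non-ASCII character has turned into two latin1 characters.
    print(len(word), len(stored))

    # The original text is still recoverable, because no bytes were lost.
    print(recover_utf8(stored) == word)

    # Real latin1 has no room for these characters: a true conversion loses them.
    print(word.encode("latin-1", errors="replace"))
```

If this picture applies, it would explain both halves of the problem: data written this way survives a round trip through a latin1 connection, while correctly stored utf8 data needs a utf8 connection (for example via SET NAMES utf8) to come out intact.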
Thanks,

roman

Here is the conversion how-to. I will eventually put it in the wiki (it depends on your answers).

#dump the schema of the database
    mysqldump --no-data --set-charset -u root -p<password> <db_name> > schema.sql

#dump the data; it will actually be utf8 encoded, don't be fooled by the latin1 charset bit
    mysqldump --no-create-info --skip-set-charset -u root -p<yourpassword> --default-character-set=latin1 <db_name> > data.sql

#open schema.sql in an editor and:
 - replace all occurrences of CHARSET=latin1 with CHARSET=utf8
 - also change the default NULL charset for columns (see http://bugs.mysql.com/bug.php?id=23073): search for "varchar(255)" and replace it with "varchar(255) CHARACTER SET utf8 "

#set the utf8 encoding for the data; on linux you can do:
    echo 'SET NAMES utf8;' | cat - data.sql > datautf.sql

#now load the edited db schema (this will recreate the database AND DESTROY ALL THE DATA!!! Make sure you have it in datautf.sql)
    mysql <db_name> -u root -p < schema.sql

#load the data
    mysql <db_name> -u root -p < datautf.sql

----
Now you are done. If you want to set the default encoding for the database (that is useful only for newly created tables, and it might be better to set the charset globally, for the whole server), you can issue

    alter database <db_name> character set utf8 collate;
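The manual editing of schema.sql in the how-to above could also be scripted. The following is a minimal sketch in Python, not a tested migration tool. The function name convert_schema and the sample table are hypothetical, and it assumes the dump spells the charset exactly as CHARSET=latin1 and the columns exactly as varchar(255), as in the how-to.

```python
import re


def convert_schema(schema_sql: str) -> str:
    """Apply the two manual edits from the how-to to the text of a schema dump."""
    # 1. Switch the table default charset from latin1 to utf8.
    converted = schema_sql.replace("CHARSET=latin1", "CHARSET=utf8")
    # 2. Give varchar(255) columns an explicit charset. The lookahead skips
    #    columns that already carry one, so running this twice is harmless.
    converted = re.sub(
        r"varchar\(255\)(?!\s+CHARACTER SET)",
        "varchar(255) CHARACTER SET utf8",
        converted,
    )
    return converted


if __name__ == "__main__":
    # Hypothetical fragment in the style of mysqldump --no-data output.
    sample = (
        "CREATE TABLE `example` (\n"
        "  `title` varchar(255) default NULL\n"
        ") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n"
    )
    print(convert_schema(sample))
```

To use it on a real dump, one would read schema.sql into a string, pass it through convert_schema, and write the result back out before loading it with mysql. Keep a copy of the original file in any case.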





