eprints_tech messages
Re: [EP-tech] either me or Eprints is missing on utf8 - bug/feature request
From: Tim Brody <tdb01r AT ecs.soton.ac.uk>
Date: Mon, 12 May 2008 10:26:21 +0100
*** http://www.eprints.org/tech.php/id/%3C48280D3D.5060900%40ecs.soton.ac.uk%3E
*** EPrints community wiki - http://wiki.eprints.org/

EPrints doesn't expect the database to be in Unicode (or in any other encoding). The theory is that if you want sorting for a language other than English, you write a custom method for that language and use it in the "make_value_orderkey" property of the fields whose values aren't in English. This property is briefly documented at:

http://wiki.eprints.org/w/Metadata

All the best,

Tim.

Roman Chyla wrote:
> *** http://www.eprints.org/tech.php/id/%3Cea0115e90805101541v542d2c1o3ad4f731703bfba9%40mail.gmail.com%3E
>
> Hello,
>
> Excuse my premature senility, but some things (serious ones) are not clear to me. I have successfully converted my database to utf8, fighting several issues along the way, and (possibly) found a bug.
>
> Firstly, one cannot set a default collation for the database like this:
>
>   Alter database eprints3 character set utf8 collate utf8_czech_ci;
>
> because then this happens:
>
>   DBD::mysql::st execute failed: Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation '=' at /opt/eprints3/perl_lib/EPrints/Database.pm line 2363.
>   SQL ERROR (execute): SELECT M.subjectid, M.pos, M.ancestors, C.pos FROM cache5960 AS C, subject_ancestors AS M WHERE M.subjectid = C.subjectid AND C.pos>0 ORDER BY C.pos
>   SQL ERROR (execute): Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_czech_ci,IMPLICIT) for operation '='
>   DBD::mysql::st fetchrow_array failed: fetch() without execute() at /opt/eprints3/perl_lib/EPrints/Database.pm line 2073.
>
> I have also converted all my data to utf8 (and I am sure it is correct in the database), but EPrints then starts to complain about a wrong encoding.
>
> I can fix that by adding
>
>   $self->do('SET NAMES utf8');
>
> to Database.pm where the instance is created, and then everything is fine.
>
> But this should not be necessary (?) Am I missing something? Or are all EPrints archives storing utf8 as latin1 internally in their databases? (And, as somebody reported, proper sorting does not work.) Should I install a newer version of EPrints? Please give me some reasonable answers; it can't be EPrints, it must be me...
>
> Thanks,
>
> roman
>
> Here is the conversion how-to. I will eventually put it in the wiki (depending on your answers).
>
> # dump the schema of the database
> mysqldump --no-data --set-charset -u root -p<password> <db_name> > schema.sql
>
> # dump the data; it will actually be utf8-encoded, don't be fooled by the "charset latin1" bit
> mysqldump --no-create-info --skip-set-charset -u root -p<yourpassword> --default-character-set=latin1 <db_name> > data.sql
>
> # open schema.sql in an editor and:
> - replace all occurrences of CHARSET=latin1 with CHARSET=utf8
> - also change the default NULL charset for columns (see http://bugs.mysql.com/bug.php?id=23073):
>   search for "varchar(255)" and replace it with "varchar(255) CHARACTER SET utf8"
>
> # set the utf8 encoding for the data; on Linux you can do:
> echo 'SET NAMES utf8;' | cat - data.sql > datautf.sql
>
> # now load the edited schema (this will recreate the tables AND DESTROY ALL THE DATA!!! Make sure you have it in datautf.sql)
> mysql <db_name> -u root -p < schema.sql
>
> # load the data
> mysql <db_name> -u root -p < datautf.sql
>
> ----
> Now you are done. If you want, you can also set the default encoding for the database, but that is useful only for newly created tables (and it might be better to set the charset globally, for the whole server):
>
>   alter database <db_name> character set utf8;
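Tim's suggestion of a per-language sort method can be illustrated with a short sketch. In EPrints the hook itself is a Perl subroutine attached through the field's "make_value_orderkey" property (the wiki page linked above describes its actual calling convention); the Python below only shows the idea of such a function for Czech, assuming the resulting key is compared as a plain string. The names czech_orderkey, CZECH_ALPHABET and LETTER_CODE are hypothetical, and the rules are deliberately simplified: "ch" is one letter after "h", č/ř/š/ž are separate letters, and the remaining accents (á, é, ě, ď, ...) are folded onto their base letter instead of acting as a tie-breaker.

```python
import unicodedata

# Hypothetical, simplified primary ordering of the Czech alphabet:
# "ch" is a single letter sorted after "h"; č, ř, š, ž follow c, r, s, z.
CZECH_ALPHABET = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v", "w", "x",
    "y", "z", "ž",
]

# Accents that real Czech collation uses only as a tie-breaker;
# this sketch simply folds them onto the base letter.
SECONDARY_FOLD = str.maketrans("áéěíóúůýďťň", "aeeiouuydtn")

# Fixed-width codes: "00" = space, "01".."10" = digits 0-9, "11".."41" = letters.
LETTER_CODE = {letter: f"{11 + i:02d}" for i, letter in enumerate(CZECH_ALPHABET)}
UNKNOWN_CODE = "99"  # any other character sorts after all letters


def czech_orderkey(value: str) -> str:
    """Return a string whose plain code-point order matches simplified Czech order."""
    text = unicodedata.normalize("NFC", value).lower().translate(SECONDARY_FOLD)
    key = []
    i = 0
    while i < len(text):
        if text.startswith("ch", i):  # the digraph counts as one letter
            key.append(LETTER_CODE["ch"])
            i += 2
            continue
        char = text[i]
        if char in LETTER_CODE:
            key.append(LETTER_CODE[char])
        elif char == " ":
            key.append("00")
        elif char in "0123456789":
            key.append(f"{1 + int(char):02d}")
        else:
            key.append(UNKNOWN_CODE)
        i += 1
    return "".join(key)
```

For example, sorted(["hrad", "chata", "cesta", "čaj"], key=czech_orderkey) puts "čaj" right after "cesta" and "chata" after "hrad", whereas a plain sorted() puts "chata" before "hrad" and "čaj" last. Mapping every letter to a fixed-width code is what lets an ordinary string comparison (such as a database ORDER BY on the stored key) reproduce the language-specific order.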
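The how-to's remark that the latin1 data dump is "actually utf8 encoded" can be illustrated with a small Python sketch. It assumes the situation implied in the thread: a UTF-8 client wrote into latin1 columns without SET NAMES utf8, so every UTF-8 byte was stored as one latin1 character, and dumping with --default-character-set=latin1 writes those same bytes back out. Python's "latin-1" codec (ISO-8859-1) stands in for MySQL's latin1, which is really closer to cp1252, so this is an approximation; the function names are hypothetical.

```python
def stored_in_latin1_column(text: str) -> str:
    """Approximate what a latin1 column holds when a UTF-8 client writes `text`
    without SET NAMES utf8: each UTF-8 byte becomes one latin1 character."""
    return text.encode("utf-8").decode("latin-1")


def dumped_with_latin1(stored: str) -> bytes:
    """Approximate what mysqldump --default-character-set=latin1 writes for that
    value: each latin1 character is emitted as its single byte again."""
    return stored.encode("latin-1")
```

Round-tripping a name such as "Dvořák" through both functions gives back exactly its UTF-8 bytes, which is why the dump only needs "SET NAMES utf8;" in front of it before being loaded into the utf8 tables.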
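The manual editing steps of the how-to (rewriting schema.sql and prepending SET NAMES utf8 to the data dump) could also be scripted. The Python below is one possible sketch, assuming, as the how-to does, that the schema dump lists varchar(255) columns without an explicit CHARACTER SET clause; the function names are hypothetical, and the rewritten schema.sql is worth checking by eye before loading it with mysql.

```python
import re

# varchar(255) that is not already followed by an explicit CHARACTER SET clause
VARCHAR_255 = re.compile(r"varchar\(255\)(?! CHARACTER SET)")


def convert_schema(schema_sql: str) -> str:
    """Apply the how-to's two schema.sql edits."""
    # 1. table defaults: CHARSET=latin1 -> CHARSET=utf8
    schema_sql = schema_sql.replace("CHARSET=latin1", "CHARSET=utf8")
    # 2. give every varchar(255) column an explicit utf8 character set
    return VARCHAR_255.sub("varchar(255) CHARACTER SET utf8", schema_sql)


def prepend_set_names(data_sql: bytes) -> bytes:
    """Same effect as: echo 'SET NAMES utf8;' | cat - data.sql > datautf.sql"""
    return b"SET NAMES utf8;\n" + data_sql


def convert_files(schema_path: str, data_path: str, out_data_path: str) -> None:
    """Rewrite schema_path in place and write the data dump, prefixed with
    SET NAMES utf8, to out_data_path."""
    # latin-1 maps every byte to one character, so the schema bytes pass through unchanged
    with open(schema_path, encoding="latin-1", newline="") as f:
        schema = f.read()
    with open(schema_path, "w", encoding="latin-1", newline="") as f:
        f.write(convert_schema(schema))
    # The data dump is copied as raw bytes: it already contains UTF-8, so it is not re-encoded.
    with open(data_path, "rb") as f:
        data = f.read()
    with open(out_data_path, "wb") as f:
        f.write(prepend_set_names(data))
```

Calling convert_files("schema.sql", "data.sql", "datautf.sql") would replace the editor and echo steps; the two mysql commands from the how-to then load the result. Because of the lookahead in VARCHAR_255, running convert_schema twice leaves the schema unchanged.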