EPrints Technical Mailing List Archive

Message: #00270



[EP-tech] Problems with characters


Hi

 

We have a number of problems with Swedish characters in our EPrints installation (3.3.8, Ubuntu 10.04) that we think are related. We exported the data as XML from a 3.3.7 installation and imported it into 3.3.8 with the eprint commands. All files and the database in the installation are UTF-8.

We have made a temporary fix in query.pm in the /perl_lib/URI directory that solves some of the problems, but not all:

sub query_form { ...

    # $val =~ s/([;\/?:@&=+,\$%])/$URI::Escape::escapes{$1}/g;
    $val = URI::Escape::uri_escape_utf8($val);

    ...

}
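To illustrate what this change affects, here is a minimal standalone sketch (not EPrints code) of how URI::Escape encodes "ö" with and without the UTF-8 variant, and how a strict UTF-8 decoder reacts to each form. The variable names are only for illustration, and the assumption that the failing URLs carry the single-byte form is based on the %F6 seen in the feed problem.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_escape_utf8 uri_unescape);
use Encode qw(decode);

my $o_umlaut = "\x{F6}";    # the character "ö" (U+00F6)

# Single-byte (Latin-1 style) escaping, the form seen in the failing feed URL.
my $latin1_form = uri_escape($o_umlaut);         # "%F6"

# UTF-8 escaping: the character becomes two escaped bytes.
my $utf8_form = uri_escape_utf8($o_umlaut);      # "%C3%B6"

# Unescape each form to raw bytes and try to decode them strictly as UTF-8.
for my $escaped ($latin1_form, $utf8_form) {
    my $bytes = uri_unescape($escaped);
    my $ok = eval { decode('UTF-8', $bytes, Encode::FB_CROAK); 1 };
    printf "%-7s %s\n", $escaped, $ok ? "valid UTF-8" : "not valid UTF-8";
}
```

A lone 0xF6 byte is not a valid UTF-8 sequence, so any code that expects UTF-8 input will reject a URL containing %F6, which would be consistent with the "Malformed UTF-8 character" messages in our error log.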

The problems:

 

- If I subscribe to a feed that has a Swedish character in the URL, I get an internal server error. The error log says:

[error] Malformed UTF-8 character (fatal) at /usr/share/eprints3/perl_lib/EPrints/Utils.pm line 315.\n

If I replace the URL-encoded Swedish character with the character itself typed from my keyboard (in this case, %F6 with ö), the feed works.

 

- If I refine a search by creator in the advanced search and the creator name uses Swedish characters, I get an internal server error. The error log says:

[error] Malformed UTF-8 character (fatal) at /usr/share/eprints3/perl_lib/EPrints/XHTML.pm line 333.\n

 

- If I search in the advanced search for creators whose names include the Swedish uppercase letter Å, I get 0 hits, although I know there are several names that begin with an uppercase Å. The other Swedish characters work fine.

 

- In the section part of the creators view, the Swedish characters should be sorted at the end of the alphabet, but they are sorted after the letter A (http://pub.epsilon.slu.se/view/creators/).

In the author section of the year view, the Swedish characters are sorted correctly (http://pub.epsilon.slu.se/view/year/2004.html).
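To make the sorting difference concrete, here is a small standalone sketch using Unicode::Collate, which ships with recent Perl versions. It compares the default Unicode collation with the Swedish tailoring. The sample names are made up, and I do not know whether EPrints uses these modules for its views, so this only shows what the two orderings look like.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample names, for illustration only.
my @names = ('Åberg', 'Andersson', 'Öhman', 'Zetterberg', 'Ärlig');

# Default Unicode collation: Å, Ä and Ö are treated as accented A and O.
my @default_order = Unicode::Collate->new->sort(@names);

# Swedish tailoring: Å, Ä and Ö are separate letters placed after Z.
my @swedish_order = Unicode::Collate::Locale->new(locale => 'sv')->sort(@names);

print "default: @default_order\n";
print "swedish: @swedish_order\n";
```

With the default collation the names beginning with Å and Ä end up among the A names, which looks like what we see in the creators view, while the Swedish tailoring gives the order we expect.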

 

It seems that non-English characters, in our case Swedish, sometimes work fine and sometimes do not. Is this a known issue?

 

Regards

 

Carl Johan Syrén

Swedish University of Agricultural Sciences