EPrints Technical Mailing List Archive

Message: #05103



[EP-tech] Re: RIS plugin problems (utf8 and journal title)




On 13/11/15 13:29, George Mamalakis wrote:
OK! Which means that it is only available for the roman alphabet? That's
odd!

From the documentation:

The characters allowed in the reference ID fields can be in the set "0" through "9," or "A" through "Z." The characters allowed in all other fields can be in the set from "space" (character 32) to character 255 in the ANSI Character Set. Note, however, that the asterisk (character 42) is not allowed in the author, keywords or periodical name fields.

Also note:

Each tag and its contents must be on a separate line, preceded by a "carriage return/line feed" (ANSI 13 10).
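
For anyone who wants to test values against the character rule quoted above, here is a minimal sketch in Perl. It is based only on the quoted text, not on the RIS plugin; the function name `ris_value_ok` and the field labels `author`, `keywords` and `periodical` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical field labels for the fields where the asterisk is not allowed.
my %no_asterisk = map { $_ => 1 } qw( author keywords periodical );

# Returns 1 if $value (a decoded character string) satisfies the rule quoted
# above for non-ID fields, 0 otherwise.
sub ris_value_ok {
    my ( $field, $value ) = @_;

    # Every character must lie between space (32) and character 255.
    return 0 if $value =~ /[^\x20-\xFF]/;

    # The asterisk (character 42) is excluded from some fields.
    return 0 if $no_asterisk{$field} && $value =~ /\*/;

    return 1;
}
```

Under this rule a Greek author name is rejected because its characters have code points above 255, while an accented Latin-1 character such as "\xE9" is accepted.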



Anyway, since most sites allow RIS exports for utf8 encoded characters,
it wouldn't harm if the import plugin supported them as well.

Thanks for the info though!

On 13/11/2015 02:13 PM, Ian Stuart wrote:
RIS format is, by specification, ASCII not UTF8


On 13/11/15 10:12, George Mamalakis wrote:
Hello everybody,

I tried to use the RIS import plugin from
http://files.eprints.org/741/. The plugin would not accept the
publication field from entries exported by Google Scholar, nor would it
import UTF-8 encoded strings (I saw both problems when using the web
import function). I tried to resolve them myself, and the following
changes seem to solve both problems.


diff -r d5f969263300 perl_lib/EPrints/Plugin/Import/RIS.pm
--- a/perl_lib/EPrints/Plugin/Import/RIS.pm     Fri Nov 06 11:22:06 2015 +0200
+++ b/perl_lib/EPrints/Plugin/Import/RIS.pm     Fri Nov 13 11:57:54 2015 +0200
@@ -34,7 +34,6 @@
         my( $plugin, %opts ) = @_;
         my @ids;
         my $fh = $opts{fh}; # File handle
+  binmode( $fh, ":utf8" );
         my @file = <$fh>;
         my ( %record, @records ) = ();
         my $lastkey = undef;
@@ -237,9 +236,6 @@
         # Publication title
         &_join_multiple_field_data($epdata, $entry, ['T2', 'JF'], 'publication', ', ');
+  &_join_multiple_field_data($epdata, $entry, ['T2', 'JO'], 'publication', ', ');
         # Series title
         &_join_field_data($epdata, $entry, 'T3', 'series', ', ');

What I did was change the binmode of the file handle (borrowed from the
BibTeX import plugin) to accept UTF-8 encoded strings, and add one
more entry for the publication field (the journal title, if I'm not
mistaken) based on JO rather than JF (JO is the tag Google Scholar uses).
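
The two changes described above can be sketched outside EPrints as a small standalone script. This is only an illustration under stated assumptions: `parse_ris` and `publication_title` are hypothetical helpers, not functions of the RIS plugin, and the JF, JO, T2 lookup order is my own guess at a sensible priority. It uses `:encoding(UTF-8)` rather than `:utf8`, since that layer also checks that the input is well-formed.

```perl
use strict;
use warnings;

# Hypothetical helper: read RIS text from a filehandle and return a list of
# records, each a hash reference of tag => [ values ].
sub parse_ris {
    my ($fh) = @_;

    # Decode the incoming bytes as UTF-8 before reading any lines.
    binmode( $fh, ':encoding(UTF-8)' );

    my ( @records, %record );
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;

        # A RIS line is a two-character tag, two spaces, a hyphen, a value.
        next unless $line =~ /^([A-Z][A-Z0-9])  -\s?(.*)$/;
        my ( $tag, $value ) = ( $1, $2 );

        if ( $tag eq 'ER' ) {    # end of record
            push @records, {%record} if %record;
            %record = ();
            next;
        }
        push @{ $record{$tag} }, $value;
    }
    return @records;
}

# Hypothetical helper: take the journal title from the first tag present.
# The JF, JO, T2 order is an assumption, not taken from the plugin.
sub publication_title {
    my ($record) = @_;
    for my $tag (qw( JF JO T2 )) {
        return $record->{$tag}[0] if $record->{$tag};
    }
    return;
}
```

With helpers like these, a Google Scholar export that carries only a JO line still yields a publication title, and a Greek author name comes back as characters rather than raw bytes.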

I am sending these changes to:

a) help anyone having the same problems with the specific plugin,
b) ask if these corrections are correct :), and
c) ask what the proper procedure is for reporting these "bugs" so
they get fixed permanently (e.g. contact the maintainer directly,
or indirectly?).

Thanks all for your answers in advance,

George.





--

Ian Stuart.
Developer: ORI, RJ-Broker, and OpenDepot.org
Bibliographics and Multimedia Service Delivery team,
EDINA,
The University of Edinburgh.

http://edina.ac.uk/
