[EP-tech] Re: UTF-8 issues on BibTeX import?

The truth is, I'm not sure where this should really go - the issue seems
to be in the standard BibTeX importer in perl_lib so ideally I'd like to
extend this to sanitise these kinds of characters out of the data.
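
A minimal sketch of what that sanitising step might look like, under some loudly stated assumptions: the helper names below are hypothetical and not part of EPrints, the BibTeX text is assumed to have already been decoded into Perl characters (for example with Encode::decode), and the sample key is made up for illustration. It only rewrites the citation key in each entry header and leaves field values such as author names alone.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);   # core module

# Hypothetical helper (not an EPrints function): reduce one citation key
# to printable ASCII.
sub sanitise_bibtex_key
{
    my( $key ) = @_;

    $key = NFKD( $key );           # split accented letters into base + combining mark
    $key =~ s/\p{Mn}+//g;          # drop the combining marks
    $key =~ s/[^\x21-\x7E]/_/g;    # anything still outside printable ASCII becomes "_"
    return $key;
}

# Hypothetical helper: rewrite only the key in headers like "@article{key,".
# This is a plain regex pass, not a real BibTeX parser.
sub sanitise_bibtex_keys
{
    my( $bibtex ) = @_;

    $bibtex =~ s{^(\s*\@\w+\s*\{\s*)([^,\s]+)}{
        my( $head, $key ) = ( $1, $2 );   # copy captures before calling the helper
        $head . sanitise_bibtex_key( $key );
    }gmex;
    return $bibtex;
}

# Illustrative input only; \x{FC} is "u" with a diaeresis.
my $entry = "\@article{M\x{FC}ller2014a,\n  author = {M\x{FC}ller, A.},\n}\n";

binmode( STDOUT, ":encoding(UTF-8)" );
print sanitise_bibtex_keys( $entry );
```

If something like this were run over the uploaded text before it reaches the importer, the accented key would be flattened while the author field stays untouched. Two different keys could collapse to the same ASCII form, so a real fix would probably need to check for collisions.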

On 10/03/2014 15:45, "Ian Stuart" <Ian.Stuart at ed.ac.uk> wrote:

>Reading strings?
>Have you tried
>   $count = utf8::upgrade($name)
>see http://perldoc.perl.org/utf8.html
>(I tried all sorts of things over the years... and I don't think I've
>been consistent)
>On 10/03/14 15:31, Andrew Beeken wrote:
>> Interesting!
>> Looking into this a bit further, the issue seems to be around the keys
>> that records take with them out of, say, a Scopus export. For example, a
>> record may be given a key of P?ron20141; note the accent - this is the
>> part that's causing the issue and is probably understandable if the key
>> conforming to specific standards. With this in mind, is there a
>> On 10/03/2014 11:24, "Ian Stuart" <Ian.Stuart at ed.ac.uk> wrote:
>>> On 10/03/14 11:02, Andrew Beeken wrote:
>>>> Me again!
>>>> Another issue that has been flagged up by our admin users is that a
>>>> BibTeX import will fall over when it encounters accented characters
>>>> in an author name. I've already flagged a problem with UTF-8 encoding
>>>> in output in another email and I'm wondering if there is a similar
>>>> fix here?
>>> Something to consider (I fell over this) is that web servers have a
>>> tendency to not actually send UTF-8, even when you ask them to....
>>> I have a script that wouldn't render the name of some Dutch university
>>> correctly..... but when I added in the name of a Chinese one, it was
>>> It was a blinkin' NIGHTMARE to figure out.... and in the end I bypassed
>>> the EPrints output, and just "printed" directly, with the line
>>>     binmode(STDOUT, ":utf8");
>>> in my code.
>Ian Stuart.
>Developer: ORI, RJ-Broker, and OpenDepot.org
>Bibliographics and Multimedia Service Delivery team,
>The University of Edinburgh.