EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #10223



Re: [EP-tech] Import from DOI (via CrossRef)


CAUTION: This e-mail originated outside the University of Southampton.
I realised the conversation with David had dropped off the list, so for everyone’s future reference:

There was an issue with our codebase in the LibXML blob David shared previously which I corrected, but it didn’t entirely solve our issue.

It appears that our infrastructure setup means LWP calls need to go through our web proxy. I have made this work for us by hard-coding our proxy details (so we can get on with importing some items), but ideally it would pull the proxy details from the config. Once I have that figured out in good form, I’ll submit the changes for consideration to the main EPrints Git repository.
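
For anyone reading this later, pulling the proxy from configuration rather than hard-coding it might look roughly like the sketch below. It is only an illustration: the helper name build_user_agent and the config key "proxy_url" are hypothetical, not existing EPrints settings, and a plain hash stands in for the repository config. Depending on the LWP version, https via a proxy may need extra setup.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent that goes through a proxy when one is configured.
# $conf is a plain hash ref standing in for the repository config; the
# key name "proxy_url" is hypothetical, not an actual EPrints setting.
sub build_user_agent
{
    my( $conf ) = @_;

    my $ua = LWP::UserAgent->new( timeout => 30 );

    if( defined $conf->{proxy_url} && length $conf->{proxy_url} )
    {
        # Route both http and https requests through the configured proxy.
        $ua->proxy( [ 'http', 'https' ], $conf->{proxy_url} );
    }
    else
    {
        # Otherwise fall back to http_proxy / https_proxy / no_proxy
        # from the environment, if they are set.
        $ua->env_proxy;
    }

    return $ua;
}

# Example with a placeholder proxy address.
my $ua = build_user_agent( { proxy_url => 'http://proxy.example.org:3128' } );
print $ua->proxy( 'https' ), "\n";
```

Falling back to env_proxy means a site without an explicit setting still honours whatever proxy variables the web server environment provides.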

Alan

From: David R Newman <drn@ecs.soton.ac.uk>
Date: Wednesday, 27 August 2025 at 10:38
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>, Alan.Stiles [He/Him/They] <alan.stiles@open.ac.uk>
Subject: Re: [EP-tech] Import from DOI (via CrossRef)

Hi Alan,

I think a couple of other people have picked up on this recently. It could be one of two issues.

1. The base_url for Import::DOI is still http rather than https. This was fixed before 3.4.5 was released.

2. EPrints::XML::LibXML::_parse_url does not check whether the URL is https and, if it is, use LWP::Simple to read the content in first and parse it as a string rather than parsing the URL directly as a file. This was also fixed before 3.4.5 was released, but looking at your debug, EPrints still seems to be calling XML::LibXML::parse_file rather than XML::LibXML::parse_string, which is what I would expect with an https URL.
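
To illustrate the second point, the general fetch-then-parse_string technique can be sketched as below. This is not the EPrints _parse_url code: the helper name parse_xml_from_url is hypothetical, and it uses LWP::UserAgent rather than LWP::Simple purely so that a caller could pass in an agent with its own settings.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use XML::LibXML;

# Hypothetical helper for illustration; this is not the EPrints _parse_url code.
sub parse_xml_from_url
{
    my( $url, $ua ) = @_;

    # Any object with a get() method returning an HTTP::Response will do.
    $ua ||= LWP::UserAgent->new( timeout => 30 );

    my $response = $ua->get( $url );
    die "Could not fetch $url: " . $response->status_line . "\n"
        unless $response->is_success;

    # Undo any content encoding (e.g. gzip) but leave character decoding
    # to libxml2, which reads the encoding from the XML declaration.
    my $xml = $response->decoded_content( charset => 'none' );

    # Parse the body as a string, so libxml2 never tries to open the URL itself.
    return XML::LibXML->new->parse_string( $xml );
}
```

Because the HTTP request is made by LWP and only the response body reaches XML::LibXML, the "Could not create file parser context" failure from parse_file cannot occur on this path.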

Please check to see whether perl_lib/EPrints/XML/LibXML.pm, or any version of this file that overrides the one in perl_lib, has a _parse_url function that looks like the following:

https://github.com/eprints/eprints3.4/blob/e08862948a02c4c3fde5acba50af1976b0031d5f/perl_lib/EPrints/XML/LibXML.pm#L104-L113

If it does, it may be worth adding some debug output at the start of this function to print the URL to the error log, so that you can see whether an http or https URL is being sent to be parsed.
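
A minimal sketch of that kind of debug line follows. The helper name url_debug_message is hypothetical, and it assumes that STDERR from the web server process ends up in the error log, which is the usual Apache behaviour.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: build a one-line debug message for a URL about to be parsed.
sub url_debug_message
{
    my( $url ) = @_;

    # Stringify first, since the caller may pass a URI object rather than a string.
    my $scheme = URI->new( "$url" )->scheme // 'none';

    return "_parse_url called with scheme=$scheme url=$url";
}

# warn() writes to STDERR, which under Apache normally ends up in the error log.
warn url_debug_message( 'https://doi.crossref.org/openurl?noredirect=true' ) . "\n";
```

Note that the full URL includes the pid, so the log line may need redacting before it is shared.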

Regards

David Newman

P.S. I assume that the "XXXXXX" is just you hiding the pid?

On 27/08/2025 10:25, Alan.Stiles [He/Him/They] wrote:
Hi all,
Import from DOI (via crossref) appears to have stopped working for us since about the end of July. (Customised eprints 3.4.5 on RHEL 9)
I’ve tried amending the target URL to use https and when I try to import a new DOI I just get

Unhandled warning in Import::DOI: http error : Unknown IO error

Chucking some debug in on Dev gets me the following error (edited for pid details, bold stuff between '§§’ marks is my debug tracing, assuming that comes through on the mailing list).

Unhandled warning in Import::DOI: §§EPrints::XML::LibXML::_parse_url pub_lib/Plugins/Import/DOI input_text_fh:§§Could not create file parser context for file "https://doi.crossref.org/openurl?noredirect=true&pid=XXXXXX&format=unixref&id=10.4324%2F9781003333999-2": No such file or directory at /usr/lib64/perl5/vendor_perl/XML/LibXML.pm line 938, <$fh> line 2.
XML::LibXML::parse_file(XML::Li[...]d=XXXXXX"...) called at /opt/eprints/perl_lib/EPrints/XML/LibXML.pm line 110
EPrints::XML::_parse_url(URI::https=SCALAR(0x55a6f53b96b0)) called at /opt/eprints/perl_lib/EPrints/XML.pm line 158
EPrints::XML::parse_url() called at /opt/eprints/flavours/pub_lib/plugins/EPrints/Plugin/Import/DOI.pm line 109
eval {...} called at /opt/eprints/flavours/pub_lib/plugins/EPrints/Plugin/Import/DOI.pm line 108
EPrints::Plugin::Import::DOI::input_text_fh(EPrints::Plugin::Import::OrcidDOI=HASH(0x55a6f52bbf78), "fh", File::Temp=GLOB(0x55a6f52eef20), "action ...


It looks like it’s trying to use the LibXML file parser but can’t deal with the URI for some reason?
Any clues? 
Thanks
Alan