[EP-tech] Antwort: Re: Antwort: Antwort: Re: Antwort: Re: fail to import PubMedID



Hi Justin,

It looks like your endpoint is returning HTML (entity-escaped XML embedded
in a <pre> tag), and the DOCTYPE is HTML, so an XML parser will not accept
the response as it stands.
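
A minimal Perl sketch of how such a response could be recognised and
unwrapped, for diagnosis only. The helper names is_html_response and
xml_from_pre are hypothetical and not part of EPrints or the plugin. The
sketch assumes the XML sits in the first <pre> element and is escaped with
the basic entities only.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the body starts with an HTML DOCTYPE.
sub is_html_response {
    my ($body) = @_;
    return $body =~ /\A\s*<!DOCTYPE\s+html/i ? 1 : 0;
}

# Hypothetical helper: return the XML carried inside the first <pre>
# element, or nothing if there is no <pre> element at all.
sub xml_from_pre {
    my ($body) = @_;
    return unless $body =~ m{<pre\b[^>]*>(.*?)</pre>}is;
    my $xml = $1;
    # Undo one level of escaping. &amp; goes last so that an escaped
    # "&amp;lt;" ends up as "&lt;" and not as "<".
    $xml =~ s/&lt;/</g;
    $xml =~ s/&gt;/>/g;
    $xml =~ s/&quot;/"/g;
    $xml =~ s/&amp;/&/g;
    return $xml;
}
```

An endpoint that returns XML directly avoids this unwrapping step
altogether.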

Best regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich

mail: martin.braendle at id.uzh.ch
phone: +41 44 63 56705
fax: +41 44 63 54505
http://www.zi.uzh.ch



From: Justin Bradley <jb4 at ecs.soton.ac.uk>
To: eprints-tech at ecs.soton.ac.uk
Date: 08/11/2016 13:03
Subject: Re: [EP-tech] Antwort: Antwort: Re: Antwort: Re: fail to import PubMedID
Sent by: eprints-tech-bounces at ecs.soton.ac.uk



Thanks Martin.
I was just starting to look into this too, but I'll look to use yours
instead.

Just to double check while we are looking: should we still be using the
same endpoint, or should we move over to something more like:
https://www.ncbi.nlm.nih.gov/pubmed/?term=(26686599
[PMID])&report=xml&format=xml

Regards,
Justin

      On 8 Nov 2016, at 11:52, martin.braendle at id.uzh.ch wrote:



      I have published our version of the PubMedID Import plugin to

      https://github.com/eprintsug/PubMedID-Import

      It has been updated to cope with the https protocol that NCBI uses
      and also contains some code that does a duplicate check in the
      EPrints repo. See also attached phrases files (English and German).

      Feel free to use from this code whatever you think is useful for your
      implementation.

      Best regards,

      Martin


      From: jens.vieler at id.uzh.ch
      To: eprints-tech at ecs.soton.ac.uk
      Date: 07/11/2016 16:05
      Subject: [EP-tech] Antwort: Re: Antwort: Re: fail to import PubMedID
      Sent by: eprints-tech-bounces at ecs.soton.ac.uk





      ...I think the problem is more general if XML::LibXML can't deal with
      https. So the cause is here: perl_lib/EPrints/XML/LibXML.pm (line 69),
      and 'XML::LibXML->new();' is the wrong parser for our needs.

      What would you suggest? Changing Import/PubMedID.pm and
      bin/metadata_update from something like

      EPrints::XML::parse_url( $url );

      to something like

      - using LWP to retrieve the document
      - then using LibXML to parse it as XML

      or creating a new, more general EPrints::XML module?

      Workarounds or other quick-and-dirty fixes are also welcome.
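
The two steps listed above (retrieve with LWP, then parse with LibXML) can be
sketched roughly as follows. This is an illustration under assumptions, not
the code that ended up in any plugin: fetch_bytes and parse_xml_bytes are
hypothetical helper names, the "&id=" parameter is assumed from the
E-utilities efetch interface, and https support in LWP requires
LWP::Protocol::https to be installed.

```perl
use strict;
use warnings;
use LWP::UserAgent;    # https URLs additionally need LWP::Protocol::https
use XML::LibXML;

# Hypothetical helper: download the raw response body with LWP.
sub fetch_bytes {
    my ($url) = @_;
    my $ua       = LWP::UserAgent->new( timeout => 30 );
    my $response = $ua->get($url);
    die "Could not fetch $url: " . $response->status_line . "\n"
        unless $response->is_success;
    # charset => 'none' undoes any Content-Encoding (e.g. gzip) but leaves
    # the bytes undecoded, so LibXML can apply the encoding named in the
    # XML declaration itself.
    return $response->decoded_content( charset => 'none' );
}

# Hypothetical helper: parse XML that is already in memory, instead of
# asking LibXML to open the URL on its own.
sub parse_xml_bytes {
    my ($bytes) = @_;
    return XML::LibXML->load_xml( string => $bytes );
}

# Runs only when a PubMed ID is passed on the command line.
# Assumption: efetch takes the record identifier in an "id" parameter.
if (@ARGV) {
    my $url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
            . '?db=pubmed&retmode=xml&rettype=full&id=' . $ARGV[0];
    my $doc = parse_xml_bytes( fetch_bytes($url) );
    print $doc->documentElement->nodeName, "\n";
}
```

In the plugin, the document returned by parse_xml_bytes would stand in for
the result of EPrints::XML::parse_url( $url ); whether it can be dropped in
unchanged has not been checked here.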

      Jens



      --
      Jens Vieler
      Zentrale Informatik
      Universität Zürich
      Stampfenbachstrasse 73
      CH-8006 Zürich

      mail:  jens.vieler at id.uzh.ch
      phone: +41 44 63 56777
      http://www.id.uzh.ch

      From: Adam Field <Adam.Field at jisc.ac.uk>
      To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
      Date: 07.11.2016 14:39
      Subject: Re: [EP-tech] Antwort: Re:  fail to import PubMedID
      Sent by: eprints-tech-bounces at ecs.soton.ac.uk



      ...on, incidentally, it's this line:

      https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Plugin/Import/PubMedID.pm#L58





      Adam Field
      SHERPA services analyst developer




      From: Adam Field <Adam.Field at jisc.ac.uk>
      Date: Monday, 7 November 2016 13:32
      To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
      Subject: Re: [EP-tech] Antwort: Re: fail to import PubMedID

      I can confirm this: I can also download the metadata via https using
      curl.

      Jens' suggestions are good.  We should be able to respond to this
      kind of thing as a community; it's a non-core, simple bug.  I'm
      happy to offer advice, code review and testing if anyone wants to
      give it a stab.  Alternatively, is there anyone out there who can
      offer me the same if I take a stab?

      Best



      Adam Field
      SHERPA services analyst developer




      From: <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of
      "jens.vieler at id.uzh.ch" <jens.vieler at id.uzh.ch>
      Reply-To: "eprints-tech at ecs.soton.ac.uk"
      <eprints-tech at ecs.soton.ac.uk>
      Date: Monday, 7 November 2016 10:45
      To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
      Subject: [EP-tech] Antwort: Re: fail to import PubMedID



      Dear Adam, Hiroshi, List

      We have been seeing the same thing since this morning #-) ...they
      changed to https this weekend.

      wget'ing over https works fine, but we cannot simply change the
      protocol in our script, because it seems LibXML can't handle it. So
      what about fetching the https URL outside the script and changing
      parse_url to parse_file on the resulting local file? Or switching to
      LWP::Protocol::https?
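
The local-file workaround could look roughly like the sketch below. It is an
illustration only: download_to_tempfile and parse_local_file are hypothetical
helper names, XML::LibXML's load_xml( location => ... ) is used here as a
stand-in for the parse_file call mentioned above, and the download step again
relies on LWP with LWP::Protocol::https installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use LWP::UserAgent;    # https URLs additionally need LWP::Protocol::https
use XML::LibXML;

# Hypothetical helper: save the response body of $url to a temporary file
# and return the file name. The file is removed when the program exits.
sub download_to_tempfile {
    my ($url) = @_;
    my ( $fh, $path ) = tempfile( SUFFIX => '.xml', UNLINK => 1 );
    close $fh;
    # ':content_file' makes LWP write the body straight to $path.
    my $response = LWP::UserAgent->new( timeout => 30 )
        ->get( $url, ':content_file' => $path );
    die "Could not fetch $url: " . $response->status_line . "\n"
        unless $response->is_success;
    return $path;
}

# Hypothetical helper: parse a local file, so LibXML never has to speak
# https itself.
sub parse_local_file {
    my ($path) = @_;
    return XML::LibXML->load_xml( location => $path );
}
```

A caller would chain the two, for example
parse_local_file( download_to_tempfile($url) ).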

      Jens



      From: Adam Field <Adam.Field at jisc.ac.uk>
      To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
      Date: 07.11.2016 11:30
      Subject: Re: [EP-tech] fail to import PubMedID
      Sent by: eprints-tech-bounces at ecs.soton.ac.uk






      Visiting the URL, I get:

      <eFetchResult>
      <ERROR>WebEnv parameter is required</ERROR>
      </eFetchResult>

      If I add a dummy WebEnv parameter, I get:

      <eFetchResult>
      <ERROR>query_key parameter is required</ERROR>
      </eFetchResult>

      ...it looks like the API the plugin is using has changed :(  It's
      unlikely to be a local problem.




      Adam Field
      SHERPA services analyst developer




      From: <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of Hiroshi
      Watabe <hwatabe at m.tohoku.ac.jp>
      Organization: CYRIC
      Reply-To: "eprints-tech at ecs.soton.ac.uk"
      <eprints-tech at ecs.soton.ac.uk>
      Date: Monday, 7 November 2016 01:27
      To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
      Subject: [EP-tech] fail to import PubMedID

      Dear all,

      It seems PubMed only accepts https now, and I can no longer import
      by PubMed ID. I got the following warning message:
      Unhandled warning in Import::PubMedID: http error : Unknown IO error

      I modified PubMedID.pm as follows, but without success.
      27c27
      <       $self->{EFETCH_URL} = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&rettype=full';
      ---
      >       $self->{EFETCH_URL} = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&rettype=full';

      The error message is as follows:
      Unhandled exception in Import::PubMedID: Could not create file parser
      context for file

      Could you help me?

      Hiroshi
      *** Options:
      http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
      *** Archive: http://www.eprints.org/tech.php/
      *** EPrints community wiki: http://wiki.eprints.org/
      *** EPrints developers Forum: http://forum.eprints.org/






--
Justin Bradley
Strategy & Technical Lead
EPrints Services
University of Southampton
