EPrints Technical Mailing List Archive

Message: #01604



[EP-tech] Re: ISI Citation Data Import Script


On Wed, 2013-02-20 at 12:30 +0000, Tian, Jia wrote:
> Dear Tim,
> 
> I have developed a SOAP client against the Lite Service based on the
>  CPAN module "SOAP::WSDL".
>  http://search.cpan.org/~mkutter/SOAP-WSDL-2.00.10/lib/SOAP/WSDL.pm
> 
> The main difference I see between premium and Lite is the data
>  encapsulation. The search results returned by the Lite service are in
>  plain XML, while the results from premium are encapsulated in
>  CDATA. Parsing the CDATA is very painful, though I succeeded. Also,
>  two key metadata fields are missing from the new Lite Search
>  compared with the old WoS service (as I understand it, there was no
>  Premium service before):

Here's a sample from premium:
http://users.ecs.soton.ac.uk/tdb2/eprints/records.xml
Note: I've stripped r_id_disclaimer (ISI repeat it, which breaks the XML)
and pretty-printed the result.

These are embedded in the SOAP response as escaped text (not CDATA
sections).
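
To illustrate why the distinction matters less than it might seem, here
is a minimal Python sketch (not the EPrints import script, and not the
real WoS response format). The element names "return", "records", "REC"
and "title" are placeholders I have assumed for illustration. A
conforming XML parser hands back the same inner string whether the
payload was entity-escaped or wrapped in a CDATA section, so the inner
document can be parsed in a second pass either way.

```python
import xml.etree.ElementTree as ET

# Hypothetical payloads: the element names are assumptions, not the
# actual WoS schema. Both carry the same inner XML document as text.
ESCAPED = (
    "<return><records>"
    "&lt;records&gt;&lt;REC&gt;&lt;title&gt;Example&lt;/title&gt;"
    "&lt;/REC&gt;&lt;/records&gt;"
    "</records></return>"
)
CDATA = (
    "<return><records><![CDATA["
    "<records><REC><title>Example</title></REC></records>"
    "]]></records></return>"
)

def inner_records(payload):
    """Parse the outer envelope, then parse the embedded XML string."""
    outer = ET.fromstring(payload)
    # The parser has already unescaped entities / unwrapped CDATA here.
    inner_text = outer.findtext("records")
    return ET.fromstring(inner_text)

def titles(payload):
    """Return the title of every REC element in the embedded document."""
    return [rec.findtext("title") for rec in inner_records(payload).iter("REC")]
```

Calling titles(ESCAPED) and titles(CDATA) gives the same list, so a
two-pass parse avoids any hand-written CDATA handling.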

> 1, The record type is missing in Lite service. So all the records are
>  imported as "Article". Our editors need to sort them out by hand once
>  records are imported.  
>
> 2, Authors' email addresses are missing in Lite service. So editors
>  need to add them by hand.

It doesn't look like premium gives email addresses either. I searched on
the field "email_addr" (which restricts results to records containing
it), and the returned record itself doesn't contain an email address.

> Of course, there are more metadata provided in the premium service,
>  such as physical addresses of authors, sponsors of projects, abstract,
>  etc. However, our repository is not very interested in those. We are
>  now fighting with WoS about the record type metadata, as its removal
>  is effectively a downgrade from the service level we had before.

Hope that helps.

/Tim.
