Tech List


eprints_tech messages


[EP-tech] Re: EPrints and attributes in XML

From: "Helge Knuettel" <Helge.Knuettel AT bibliothek.uni-regensburg.de>
Date: Tue, 11 Nov 2008 13:48:19 +0100



http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** EPrints community wiki - http://wiki.eprints.org/
Hi Tim,

sorry for the delayed answer.

In PubMedXML.pm I added the following code to xml_to_epdata:


	# DOI
	my $pubmeddata = $xml->getElementsByTagName("PubmedData")->item(0);
	if ( defined $pubmeddata )
	{
		my $articleidlist = $pubmeddata->getElementsByTagName("ArticleIdList")->item(0);
		if ( defined $articleidlist )
		{
			foreach my $articleid ( $articleidlist->getElementsByTagName("ArticleId") )
			{
				if ( defined $articleid )
				{
					if ( $articleid->getAttribute( "IdType" ) eq "doi" )
					{
						my $doi = {};
						$doi->{type} = "doi";
						$doi->{name} = $plugin->xml_to_text( $articleid );
						push @{ $epdata->{id_number} }, $doi;
					}
				}
			}
		}
	}


The code trying to access attributes never returned a result.
Therefore, I looked at the XML arriving in xml_to_epdata:

sub xml_to_epdata
{
	# $xml is the PubmedArticle element
	my( $plugin, $dataset, $xml ) = @_;

	# For debugging: Check what XML is arriving here:
	print STDERR $xml->toString();
	...

I found that no attributes were present in the output. OK, this might
also be an artifact of how toString renders the document.

Currently I am using the following workaround:



	# DOI
	my $pubmeddata = $xml->getElementsByTagName("PubmedData")->item(0);
	if ( defined $pubmeddata )
	{
		my $articleidlist = $pubmeddata->getElementsByTagName("ArticleIdList")->item(0);
		if ( defined $articleidlist )
		{
			foreach my $articleid ( $articleidlist->getElementsByTagName("ArticleId") )
			{
				if ( defined $articleid )
				{
					# So far no attributes are available in the XML document at this stage.
					# They were probably lost when parsing the document.
					# Therefore we use a workaround and treat any ArticleId as a DOI
					# when it starts with "10.". This is not always true!
					my $value = $plugin->xml_to_text( $articleid );
					if ( $value =~ m/^10\./ )
					{
						push @{ $epdata->{id_number} }, { 'type' => 'doi', 'name' => $value };
					}
				}
			}
		}
	}
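
For comparison, here is a minimal standalone sketch of the attribute-based
check outside EPrints. It assumes XML::LibXML rather than the GDOME objects
that xml_to_epdata receives, and the sample record and the extract_dois
helper are hypothetical, made up for illustration. It only shows what the
IdType test is meant to select when attributes survive parsing; it does not
explain why they are missing in the plugin.

```perl
use strict;
use warnings;
use XML::LibXML;    # assumption: XML::LibXML, not the GDOME parser EPrints uses

# Hypothetical sample record, reduced to the elements discussed in this thread.
my $sample = <<'XML';
<PubmedArticle>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pubmed">12345678</ArticleId>
      <ArticleId IdType="doi">10.1000/example.doi</ArticleId>
    </ArticleIdList>
  </PubmedData>
</PubmedArticle>
XML

# Hypothetical helper: return the text of every ArticleId element
# whose IdType attribute is "doi".
sub extract_dois
{
	my( $xml_string ) = @_;

	my $doc = XML::LibXML->load_xml( string => $xml_string );
	my @dois;
	foreach my $articleid ( $doc->getElementsByTagName( "ArticleId" ) )
	{
		# getAttribute returns undef when the attribute is absent,
		# so check definedness before the string comparison.
		my $idtype = $articleid->getAttribute( "IdType" );
		next unless defined $idtype && $idtype eq "doi";
		push @dois, $articleid->textContent;
	}
	return @dois;
}

print join( "\n", extract_dois( $sample ) ), "\n";
```

Unlike the "starts with 10." test, this selects on the declared ID type, so
a non-DOI identifier that happens to begin with "10." would not be picked up.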

Helge




-- 

----
Dr. rer. nat. Helge Knüttel
Fachreferat Medizin, Informationsvermittlung Medizin
Universitätsbibliothek Regensburg
D-93042 Regensburg, Germany
phone: ++49 941 944-5937; fax: ++49 941 944-5938
email: helge.knuettel AT bibliothek.uni-regensburg.de
WWW: http://www.bibliothek.uni-regensburg.de/tb/medizin/start.htm
-----------------------------------------------------------
>>> Tim Brody <tdb01r AT ecs.soton.ac.uk> wrote on 29.10.2008 at 17:29
in message <49088F53.9000200 AT ecs.soton.ac.uk>:
> Helge Knuettel wrote:
>> Hi,
>>
>> To me it looks as if EPrints ignores attributes in XML. Is it possible
>> to change that without major work?
>>
>> Background: I am enhancing the PubMed import plugin for our purposes.
>> DOI import would be nice. In PubMed's XML the type of an ID (e.g.
>> PubMed-ID, DOI and several others) is coded in an attribute. However,
>> when dealing with the GDOME XML document in
>> EPrints::Plugin::Import::PubMedXML no attributes are there. I assume
>> they were lost when parsing the XML string.
>>   
> Could you give me a code example? I don't think anything should be 
> stripping XML attributes.
> 
> Tim.
> 


