[EP-tech] non working DataciteDoi plugin (Recollect installed)



Hi Yuri,

the two plugins work quite differently.

The FIXED eprints branch supports only DataCite Metadata Schema 2.2, see
https://github.com/eprints/datacite/blob/fixed/cfg/cfg.d/z_datacitedoi.pl
It operates on a fixed set of fields.

The EprintsUG plugin supports DataCite Metadata Schema 4.0. It supports
any set of fields, but must be adapted to each specific repo. The fields in a
repo are mapped to methods that must be specified in
lib/cfg.d/z_datacite_mapping.pl (
https://github.com/eprintsug/DataCiteDoi/blob/master/lib/cfg.d/z_datacite_mapping.pl
 )
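
The field-to-method mapping described above can be pictured roughly as follows. This is only a sketch of the idea, written in Python for brevity (the plugin itself is Perl); `FIELD_MAPPING`, `map_title`, `map_publisher` and `export_by_fields` are hypothetical names, not the plugin's actual API.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-repository mapping: repository field name -> function
# that builds the matching DataCite element. Not the plugin's real API.
def map_title(value):
    titles = ET.Element("titles")
    ET.SubElement(titles, "title").text = value
    return titles

def map_publisher(value):
    publisher = ET.Element("publisher")
    publisher.text = value
    return publisher

FIELD_MAPPING = {"title": map_title, "publisher": map_publisher}

def export_by_fields(record):
    """Walk every repository field; emit an element where a mapping exists."""
    resource = ET.Element("resource")
    for field_name, value in record.items():
        mapper = FIELD_MAPPING.get(field_name)
        if mapper is None or value in (None, ""):
            continue  # unmapped or empty field: nothing to emit
        resource.append(mapper(value))
    return resource

# Example record with one unmapped field ("abstract").
record = {"abstract": "n/a", "title": "Sample dataset", "publisher": "Example University"}
print(ET.tostring(export_by_fields(record), encoding="unicode"))
```

Adapting such a setup to another repo means editing only the mapping table, not the export loop.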

This approach has its pros and cons:
- pro: very versatile and modular; the Export plugin itself does not need to be
modified. Instead, a config file can be adapted.
- con: The loop over all fields is inefficient (
https://github.com/eprintsug/DataCiteDoi/blob/64956b6d4b461159ac2ae35df14105cff4a86171/lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm#L62-L68
 ). The DataCite Metadata Schema has 19 elements, whereas a large repo may
have 100 fields or more, so about 80% of the loop iterations are pure
overhead. Imagine paying that 80% overhead when exporting a repo of
100'000 records.
- con: dependencies between fields are not considered; workarounds must be
implemented in the mapping methods. This gets especially problematic if
two or more fields map to the same DataCite element but their
values have to be inserted in nested sub-elements.
- con: only eprint table fields are mapped, but several document
table fields (format, size, license) must also be used for DataCite
Metadata Schema 4.x and need additional calls within DataCiteXML.pm
- as you pointed out, there is no order of fields. While the description of
the DataCite Metadata Schema (
https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf
 ) lists them in a specific order, the schema itself (
https://schema.datacite.org/meta/kernel-4.1/metadata.xsd ) requires no
specific order. However, for readability of the output, I would have
preferred the order as outlined in the PDF.
- there are no checks for mandatory fields
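
One way to address the loop overhead, the output order and the missing mandatory-field check together is to iterate over the DataCite elements rather than over the repository fields. A minimal sketch, again in Python for brevity rather than the plugin's Perl; `DATACITE_ORDER`, `MANDATORY` and `order_and_check` are hypothetical names, the element list follows the property order of the 4.1 documentation, and the input is assumed to be a dictionary of already-built element values.

```python
# Top-level elements in the order used by the DataCite 4.1 documentation.
DATACITE_ORDER = [
    "identifier", "creators", "titles", "publisher", "publicationYear",
    "subjects", "contributors", "dates", "language", "resourceType",
    "alternateIdentifiers", "relatedIdentifiers", "sizes", "formats",
    "version", "rightsList", "descriptions", "geoLocations",
    "fundingReferences",
]

# Properties that DataCite 4.x marks as mandatory.
MANDATORY = {
    "identifier", "creators", "titles", "publisher",
    "publicationYear", "resourceType",
}

def order_and_check(built):
    """Return (elements in documentation order, missing mandatory names).

    `built` maps an element name to whatever the repo-specific mapping
    produced for it (assumption: one entry per top-level element).
    """
    ordered = [(name, built[name]) for name in DATACITE_ORDER if name in built]
    missing = [name for name in DATACITE_ORDER
               if name in MANDATORY and name not in built]
    return ordered, missing

ordered, missing = order_and_check(
    {"titles": "Sample dataset", "identifier": "10.1234/example"}
)
print([name for name, _ in ordered])
print(missing)
```

The loop runs 19 times per record regardless of how many fields the repo defines, and a non-empty `missing` list can be reported before anything is sent to DataCite.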

I have just recently implemented the EprintsUG DataCite XML export plugin
in our repository and also adapted it to DataCite Metadata Schema 4.1. It
should cover all DataCite elements. We need it to produce SIPs for a
long-term archive project (DLCM) and, in part, to export
funding data to OpenAIRE (currently, funding information is not displayed
on the production system, but that will come soon).
Example output:
http://www.zora.uzh.ch/cgi/export/eprint/150598/DataCiteXML/zora-eprint-150598.xml

Not sure if my code will help: some parts are highly specific to our
repository implementation (e.g. document types, language codes
(ISO 639-3), a funder data model that is already DataCite compatible, and
more).

Best regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich

mail: martin.braendle at id.uzh.ch
phone: +41 44 63 56705
fax: +41 44 63 54505
http://www.zi.uzh.ch



From:	Yuri <yurj at alfa.it>
To:	<eprints-tech at ecs.soton.ac.uk>
Date:	03.04.2018 11:06
Subject:	Re: [EP-tech] non working DataciteDoi plugin (Recollect
            installed)
Sent by:	eprints-tech-bounces at ecs.soton.ac.uk



Hi!

this version should work:

https://github.com/eprints/datacite/blob/fixed/lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm

(note the FIXED name of the branch ;-) )

while this:

https://github.com/eprintsug/DataCiteDoi/blob/master/lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm


leads to malformed DataCite XML (though it aims to support all the fields;
maybe it uses exactly the DataCite metadata?)

Please, can someone fix the Bazaar by including the working DOI plugin? I think
the eprintsug one works for a particular site, while the fixed branch of
eprints/datacite should work on almost every EPrints 3.3.15 site.

Do you agree?


On 30/03/2018 11:41, Yuri wrote:
> Hi!
>
>   I'm using the Recollect plugin together with DataCiteDoi, both from
> the Bazaar, on EPrints 3.3.15.
>
>   I'm wondering how this plugin can work. It totally misses, for
> example, contributors which are *mandatory* for Datacite MDS v4. When
> calling the MDS api, I get:
>
> The content of element 'resource' is not complete. One of
> '{"http://datacite.org/schema/kernel-4":publisher,
> "http://datacite.org/schema/kernel-4":contributors,
> "http://datacite.org/schema/kernel-4":dates,
> "http://datacite.org/schema/kernel-4":language,
> "http://datacite.org/schema/kernel-4":alternateIdentifiers,
> "http://datacite.org/schema/kernel-4":relatedIdentifiers,
> "http://datacite.org/schema/kernel-4":sizes,
> "http://datacite.org/schema/kernel-4":formats,
> "http://datacite.org/schema/kernel-4":version,
> "http://datacite.org/schema/kernel-4":fundingReferences}' is expected.
>
> I also had to comment out the "type" mapping in
> lib/cfg.d/z_datacite_mapping.pl because it got inserted at the top (*).
> Luckily, Recollect has a "data_type" field which gets correctly mapped to
> resourceType:
>
> <resourceType resourceTypeGeneral="Dataset">Dataset</resourceType>
>
>
> So, I need some help if you've been able to make it work, I'm a little
> confused.
>
>
> (*) another issue is the order of the fields, given by:
>
> foreach my $field ( $dataobj->{dataset}->get_fields) in
> lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm
>
> Since it is a standard with known fields, shouldn't it be
> a fixed and ordered list of fields, to avoid errors?
>
>
> (**) Also, isn't the test server test.mds.datacite.org rather than
> test.datacite.org/mds? If so, can z_datacitedoi.pl be updated with
> the correct url?
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/

