EPrints Technical Mailing List Archive

Message: #07253



Re: [EP-tech] non working DataciteDoi plugin (Recollect installed)


Hi Yuri,

the two plugins work quite differently.

The FIXED eprints branch supports only DataCite Metadata Schema 2.2, see https://github.com/eprints/datacite/blob/fixed/cfg/cfg.d/z_datacitedoi.pl
It operates on a fixed set of fields.

The EprintsUG plugin supports DataCite Metadata Schema 4.0. It supports any set of fields and must be adapted to the specific repo. The fields in a repo are mapped to methods that must be specified in lib/cfg.d/z_datacite_mapping.pl (https://github.com/eprintsug/DataCiteDoi/blob/master/lib/cfg.d/z_datacite_mapping.pl ).
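
To illustrate the mechanism, here is a rough sketch in Python (not the plugin's actual Perl code; FIELD_MAPPING, the mapper functions and the sample record are hypothetical names made up for this example). It loops over every repository field and emits XML only for fields that have a mapping method:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping methods: each turns one repository field value
# into a DataCite element. In the plugin these live in a config file.
def map_title(value):
    titles = ET.Element("titles")
    ET.SubElement(titles, "title").text = value
    return titles

def map_publisher(value):
    publisher = ET.Element("publisher")
    publisher.text = value
    return publisher

# Hypothetical table: repository field name -> mapping method.
FIELD_MAPPING = {"title": map_title, "publisher": map_publisher}

def export_field_driven(record, repo_fields):
    """Loop over all repository fields; emit XML only where a mapper exists."""
    resource = ET.Element("resource")
    skipped = 0
    for name in repo_fields:
        mapper = FIELD_MAPPING.get(name)
        value = record.get(name)
        if mapper is None or value is None:
            skipped += 1  # this iteration produces no output
            continue
        resource.append(mapper(value))
    return resource, skipped

# Made-up field list and record, for illustration only.
repo_fields = ["eprintid", "rev_number", "publisher", "abstract", "title", "keywords"]
record = {"eprintid": 1, "title": "Sample dataset", "publisher": "Example University"}

resource, skipped = export_field_driven(record, repo_fields)
print(ET.tostring(resource, encoding="unicode"))
print("iterations without output:", skipped)
```

In this sketch the element order in the output simply follows the order of the repository's field list, and every unmapped field still costs one loop iteration.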

This approach has its pros and cons:
- pro: very versatile and modular; the Export plugin itself does not need to be modified. Instead, a config file can be adapted.
- con: The loop over all fields is inefficient (https://github.com/eprintsug/DataCiteDoi/blob/64956b6d4b461159ac2ae35df14105cff4a86171/lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm#L62-L68 ). The DataCite Metadata Schema has 19 elements, while a large repo may have 100 fields or more, so about 80% of the loop iterations are pure overhead. Imagine carrying that 80% overhead when exporting a repo of 100'000 records.
- con: dependencies between fields are not considered; workarounds must be implemented in the mapping methods. This becomes especially problematic if two or more fields map to the same DataCite element but the field values have to be inserted in nested sub-elements.
- con: only eprint table fields are mapped, but several document table fields (format, size, license) must also be used for DataCite Metadata Schema 4.x and need additional calls within DataCiteXML.pm.
- con: as you pointed out, the fields have no defined order. While the description of the DataCite Metadata Schema (https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf ) lists the elements in a specific order, the schema itself (https://schema.datacite.org/meta/kernel-4.1/metadata.xsd ) requires no specific order. However, for readability of the output, I would have preferred the order as outlined in the PDF.
- con: there are no checks for mandatory fields.
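
One possible alternative, sketched here in Python purely for illustration (it is not code from either plugin, and the builders table and sample record are hypothetical), is to drive the loop from the 19 DataCite elements in the order of the documentation. Each builder may read several repository fields, and mandatory elements that come out empty are reported:

```python
# The 19 top-level elements of DataCite Metadata Schema 4.1,
# in the order used by the schema documentation.
DATACITE_ELEMENTS = [
    "identifier", "creators", "titles", "publisher", "publicationYear",
    "subjects", "contributors", "dates", "language", "resourceType",
    "alternateIdentifiers", "relatedIdentifiers", "sizes", "formats",
    "version", "rightsList", "descriptions", "geoLocations",
    "fundingReferences",
]

# Elements that the schema marks as mandatory.
MANDATORY = {"identifier", "creators", "titles", "publisher",
             "publicationYear", "resourceType"}

def build_in_order(builders, record):
    """Call one builder per DataCite element, in documented order.

    builders is a hypothetical dict: element name -> function(record).
    Returns the (element, value) pairs that were produced and the
    names of mandatory elements that are missing.
    """
    output, missing = [], []
    for name in DATACITE_ELEMENTS:
        builder = builders.get(name)
        value = builder(record) if builder else None
        if value:
            output.append((name, value))
        elif name in MANDATORY:
            missing.append(name)
    return output, missing

# Made-up builders: "creators" combines two repository fields, which is
# the kind of dependency a per-field mapping cannot express directly.
builders = {
    "titles": lambda r: r.get("title"),
    "publisher": lambda r: r.get("publisher"),
    "creators": lambda r: (r.get("creators") or []) + (r.get("corp_creators") or []),
}
record = {
    "title": "Sample dataset",
    "creators": ["Doe, Jane"],
    "corp_creators": ["Example Lab"],
}

output, missing = build_in_order(builders, record)
print(output)
print("missing mandatory elements:", missing)
```

The loop here always runs 19 times regardless of how many fields the repository defines, and the output order is fixed by the element list rather than by the field list.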

I have just recently implemented the EprintsUG DataCite XML export plugin in our repository and also adapted it to DataCite Metadata Schema 4.1. It should cover all DataCite elements. We need it to produce SIPs for a long-term archive project (DLCM), and parts of it for exporting funding data to OpenAIRE (currently, funding information is not displayed on the production system, but that will come soon).
Example output: http://www.zora.uzh.ch/cgi/export/eprint/150598/DataCiteXML/zora-eprint-150598.xml

Not sure if my code will help: some parts are highly specific to our repository implementation (e.g. document types, ISO 639-3 language codes, a funder data model that is already DataCite compatible, and more).

Best regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich

mail: martin.braendle@id.uzh.ch
phone: +41 44 63 56705
fax: +41 44 63 54505
http://www.zi.uzh.ch


From: Yuri <yurj@alfa.it>
To: <eprints-tech@ecs.soton.ac.uk>
Date: 03.04.2018 11:06
Subject: Re: [EP-tech] non working DataciteDoi plugin (Recollect installed)
Sent by: eprints-tech-bounces@ecs.soton.ac.uk





Hi!

 this version should work:

https://github.com/eprints/datacite/blob/fixed/lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm 
(note the FIXED name of the branch ;-) )

while this:

https://github.com/eprintsug/DataCiteDoi/blob/master/lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm

leads to malformed DataCite XML (but aims to support all the fields; maybe
they use exactly the DataCite metadata?)

Please, can someone fix the Bazaar by including the working DOI plugin? I think
the eprintsug version works for a particular site, while the fixed branch of
eprints/datacite should work on almost every EPrints 3.3.15 site.

Do you agree?


On 30/03/2018 11:41, Yuri wrote:
> Hi!
>
>    I'm using Recollect Plugin together with DataCiteDoi, both from
> bazaar, Eprints 3.3.15
>
>    I'm wondering how this plugin can work. It totally misses, for
> example, contributors which are *mandatory* for Datacite MDS v4. When
> calling the MDS api, I get:
>
> The content of element 'resource' is not complete. One of
> '{"http://datacite.org/schema/kernel-4":publisher,
> "http://datacite.org/schema/kernel-4":contributors,
> "http://datacite.org/schema/kernel-4":dates,
> "http://datacite.org/schema/kernel-4":language,
> "http://datacite.org/schema/kernel-4":alternateIdentifiers,
> "http://datacite.org/schema/kernel-4":relatedIdentifiers,
> "http://datacite.org/schema/kernel-4":sizes,
> "http://datacite.org/schema/kernel-4":formats,
> "http://datacite.org/schema/kernel-4":version,
> "http://datacite.org/schema/kernel-4":fundingReferences}' is expected.
>
> I also had to comment out the "type" mapping in
> lib/cfg.d/z_datacite_mapping.pl because it got inserted at the top (*).
> Luckily, Recollect has a "data_type" which gets correctly mapped to
> resourceType:
>
> <resourceType resourceTypeGeneral="Dataset">Dataset</resourceType>
>
>
> So, I need some help if you've been able to make it work, I'm a little
> confused.
>
>
> (*) another issue is the order of the fields, given by:
>
> foreach my $field ( $dataobj->{dataset}->get_fields) in
> lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm
>
> Since it is a standard with known fields, shouldn't it be a fixed and
> ordered list of fields, to avoid errors?
>
>
> (**) Also, isn't the test server test.mds.datacite.org and not
> test.datacite.org/mds? If so, can z_datacitedoi.pl be updated with
> the correct URL?
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/

