EPrints Technical Mailing List Archive

Message: #07256



Re: [EP-tech] non working DataciteDoi plugin (Recollect installed)


The "fixed" branch works OOTB, while the Bazaar plugin does not. My point was to make it work OOTB, or to add a disclaimer or guide on how to make it work on a site.

Maybe the problem was not the order, but that z_datacite_mapping.pl has this:

$c->{datacite_mapping_data_type} = sub {

    my($xml, $dataobj, $repo, $value) = @_;

    return $xml->create_data_element("resourceType", $value, resourceTypeGeneral=>$value);
};

and also this:

$c->{datacite_mapping_type} = sub {

    my($xml, $dataobj, $repo, $value) = @_;

    my $pub_resourceType = $repo->get_conf("datacitedoi", "typemap", $value);
    if (defined $pub_resourceType) {
        return $xml->create_data_element("resourceType", $pub_resourceType->{'v'}, resourceTypeGeneral=>$pub_resourceType->{'a'});
    }

    return undef;
};

So both the "type" and "data_type" fields (which always exist in EPrints) produce a resourceType element, and the duplicated resourceType makes the XML fail validation. Also, contributors is mandatory, but z_datacite_mapping.pl has no $c->{datacite_mapping_contributors} entry.

I think it should not be difficult to fix (remove one of the type mappings and implement the mandatory fields) so that it works OOTB and people can just install the plugin from the Bazaar and use it.

Thanks for your explanations, they are very useful to me for implementing it on my site. Some other ideas below:
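
A minimal sketch of such a fix, as a local config override. Everything here is an assumption rather than the plugin's documented behaviour: the file name is hypothetical, it assumes the export plugin skips a field when no datacite_mapping_<fieldname> entry exists, that $xml offers create_element alongside the create_data_element call used above, and that the repository has a compound "contributors" field with a name (family, given) sub-field. The fixed contributorType "Other" is only a placeholder.

```perl
# Hypothetical local override, e.g. cfg.d/zz_datacite_mapping_local.pl,
# assumed to be loaded after z_datacite_mapping.pl.

# Keep a single source for <resourceType>: drop the "type" mapping so that
# only "data_type" produces the element (assumes unmapped fields are skipped).
delete $c->{datacite_mapping_type};

# Sketch of a contributors mapping (field layout is an assumption).
$c->{datacite_mapping_contributors} = sub {

    my($xml, $dataobj, $repo, $value) = @_;

    my $contributors = $xml->create_element("contributors");
    foreach my $entry (@{ $value || [] }) {
        my $name = $entry->{name};
        next unless defined $name && defined $name->{family};

        # DataCite expects "Family, Given" in contributorName.
        my $full = $name->{family};
        $full .= ", " . $name->{given} if defined $name->{given};

        # "Other" is a placeholder; map your own contributor types here.
        my $contributor = $xml->create_element("contributor", contributorType => "Other");
        $contributor->appendChild($xml->create_data_element("contributorName", $full));
        $contributors->appendChild($contributor);
    }

    # Return nothing rather than an empty <contributors/> element.
    return $contributors->hasChildNodes ? $contributors : undef;
};
```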

On 04/04/2018 13:25, martin.braendle@id.uzh.ch wrote:

Hi Yuri,

the two plugins work quite differently.

The FIXED eprints branch supports only DataCite Metadata Schema 2.2, see https://github.com/eprints/datacite/blob/fixed/cfg/cfg.d/z_datacitedoi.pl
It operates on a fixed set of fields.

The EprintsUG plugin supports DataCite Metadata Schema 4.0. It supports any set of fields and must be adapted to a specific repo. The fields in a repo are mapped to methods that must be specified in lib/cfg.d/z_datacite_mapping.pl (https://github.com/eprintsug/DataCiteDoi/blob/master/lib/cfg.d/z_datacite_mapping.pl)

This algorithm has its pros and cons:
- pro: very versatile and modular; the Export plugin itself need not be modified. Instead, a config file can be adapted.
- con: The loop over all fields is inefficient (https://github.com/eprintsug/DataCiteDoi/blob/64956b6d4b461159ac2ae35df14105cff4a86171/lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm#L62-L68). The DataCite Metadata Schema has 19 elements. A large repo may have 100 fields or more, so 80% of the loop iterations are just overhead. Imagine having to spend 80% overhead when exporting a repo of 100'000 records.


I agree, but instead of looping over all the fields, just put the list of DataCite metadata fields in a config too. I think it is wrong to start from the EPrints fields; it is better to have a fixed list of DataCite fields and loop over them, reading the values from the EPrints fields. Also, in EPrints, exports can be generated as text on save and cached in the filesystem, I think.

- con: dependencies between fields are not considered; workarounds must be implemented in the mapping methods. This gets especially problematic if two or more fields map to the same DataCite element, but the field values have to be inserted in nested sub-elements.


The same applies here: start from the DataCite list of metadata fields.

- as you pointed out, there is no order of fields. While the description of the DataCite Metadata Schema (https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf) lists them in a specific order, the schema itself (https://schema.datacite.org/meta/kernel-4.1/metadata.xsd) requires no specific order. However, for readability of the output, I would have preferred the order as outlined in the PDF.


The same here: starting from the DataCite list of metadata fields makes it possible to decide the exact order.
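
A minimal, self-contained sketch of that idea, outside EPrints: the loop runs over a fixed, ordered list of DataCite elements and looks up the repository field for each one, so the output order is decided by the list and unmapped repository fields cost nothing. The element list is only a subset, and the field names, the mapping table and the plain hash standing in for an eprint are all hypothetical.

```perl
use strict;
use warnings;

# Ordered subset of DataCite top-level elements (illustrative, not complete).
my @datacite_elements = qw(identifier creators titles publisher publicationYear resourceType);

# Hypothetical mapping: DataCite element => repository field name.
my %source_field = (
    identifier      => 'id_number',
    creators        => 'creators',
    titles          => 'title',
    publisher       => 'publisher',
    publicationYear => 'date',
    resourceType    => 'data_type',
);

# Returns [element, value] pairs in DataCite order for the fields that are set.
# $record is a plain hash standing in for an eprint data object.
sub export_elements {
    my ($record) = @_;
    my @out;
    foreach my $element (@datacite_elements) {
        my $field = $source_field{$element};
        next unless defined $field;
        my $value = $record->{$field};
        next unless defined $value && $value ne '';
        push @out, [ $element, $value ];
    }
    return @out;
}

# Usage: the unmapped "abstract" field is never visited.
my %record = (
    data_type => 'Dataset',
    title     => 'Example dataset',
    publisher => 'Example University',
    abstract  => 'not mapped',
);
print "$_->[0]: $_->[1]\n" for export_elements(\%record);
```

With one entry per DataCite element, two repository fields can no longer both emit the same element by accident; if two fields need to feed one element, that has to be decided explicitly in the mapping.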

- there are no checks against mandatory fields


Also, there is no check on whether coining the DOI succeeded. I think I'll send an email report to the user and the repository manager when a DOI is created or if there is an error (DataCite not answering, invalid metadata, etc.).
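
A check for mandatory fields could run before the metadata is sent to DataCite. The following is a self-contained sketch, not part of either plugin: the function name is hypothetical, and the list of mandatory elements is one reading of the 4.x schema, so adjust it (for example to add contributors) if your setup requires more. It also reports elements emitted more than once, which would catch the double resourceType described above.

```perl
use strict;
use warnings;

# Mandatory top-level elements (an assumption; adjust to your schema version).
my @mandatory = qw(identifier creators titles publisher publicationYear resourceType);

# Takes the names of the top-level elements that were actually generated.
# Returns two array refs: missing mandatory elements, and duplicated elements.
sub check_elements {
    my (@present) = @_;
    my %count;
    $count{$_}++ for @present;
    my @missing    = grep { !$count{$_} } @mandatory;
    my @duplicated = grep { $count{$_} > 1 } sort keys %count;
    return (\@missing, \@duplicated);
}

# Usage: an export with two resourceType elements and no publisher or year.
my ($missing, $duplicated) = check_elements(
    qw(identifier creators titles resourceType resourceType)
);
print "missing: @$missing\n"       if @$missing;
print "duplicated: @$duplicated\n" if @$duplicated;
```

The two lists could then go into the email report to the user and the repository manager instead of waiting for the MDS API to reject the record.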


I have just recently implemented the EprintsUG DataCite XML export plugin in our repository and also adapted it to DataCite Metadata Schema 4.1. It should cover all DataCite elements. We need it to produce SIPs for a long-term archive project (DLCM), as well as parts of it for exporting funding data to OpenAIRE (currently, funding information is not displayed on the production system, but that will come soon). Example output: http://www.zora.uzh.ch/cgi/export/eprint/150598/DataCiteXML/zora-eprint-150598.xml

Not sure if my code will help: some parts are highly specific to our repository implementation (e.g. document types, language codes (ISO 639-3), a funder data model which is already DataCite compatible, and more).

Best regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich

mail: martin.braendle@id.uzh.ch
phone: +41 44 63 56705
fax: +41 44 63 54505
http://www.zi.uzh.ch


From: Yuri <yurj@alfa.it>
To: <eprints-tech@ecs.soton.ac.uk>
Date: 03.04.2018 11:06
Subject: Re: [EP-tech] non working DataciteDoi plugin (Recollect installed)
Sent by: eprints-tech-bounces@ecs.soton.ac.uk

------------------------------------------------------------------------



Hi!

This version should work:

https://github.com/eprints/datacite/blob/fixed/lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm
(note the FIXED name of the branch ;-) )

while this:

https://github.com/eprintsug/DataCiteDoi/blob/master/lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm

leads to malformed DataCite XML (but aims to support all the fields; maybe
they use exactly the DataCite metadata?)

Please, can someone fix the Bazaar by including the working DOI plugin? I think
the eprintsug version works for a particular site, while the fixed branch of
eprints/datacite should work on almost every EPrints 3.3.15 site.

Do you agree?


On 30/03/2018 11:41, Yuri wrote:
> Hi!
>
>    I'm using Recollect Plugin together with DataCiteDoi, both from
> bazaar, Eprints 3.3.15
>
>    I'm wondering how this plugin can work. It completely lacks, for
> example, contributors, which are *mandatory* for DataCite MDS v4. When
> calling the MDS API, I get:
>
> The content of element 'resource' is not complete. One of
> '{"http://datacite.org/schema/kernel-4":publisher,
> "http://datacite.org/schema/kernel-4":contributors,
> "http://datacite.org/schema/kernel-4":dates,
> "http://datacite.org/schema/kernel-4":language,
> "http://datacite.org/schema/kernel-4":alternateIdentifiers,
> "http://datacite.org/schema/kernel-4":relatedIdentifiers,
> "http://datacite.org/schema/kernel-4":sizes,
> "http://datacite.org/schema/kernel-4":formats,
> "http://datacite.org/schema/kernel-4":version,
> "http://datacite.org/schema/kernel-4":fundingReferences}' is expected.
>
> I also had to comment out the "type" mapping in
> lib/cfg.d/z_datacite_mapping.pl because it got inserted at the top (*).
> Luckily, Recollect has a "data_type" field which gets correctly mapped to
> resourceType:
>
> |<resourceType resourceTypeGeneral="Dataset">Dataset</resourceType>|
>
>
> So, I need some help if you've been able to make it work, I'm a little
> confused.
>
>
> (*) Another issue is the order of the fields, given by:
>
> foreach my $field ( $dataobj->{dataset}->get_fields) in
> lib/plugins/EPrints/Plugin/Export/DataCiteXML.pm
>
> Since this is a standard with known fields, shouldn't it be a fixed,
> ordered list of fields, to avoid errors?
>
>
> (**) Also, isn't the test server test.mds.datacite.org and not
> test.datacite.org/mds? If so, can z_datacitedoi.pl be updated with
> the correct URL?
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/

