EPrints Technical Mailing List Archive

Message: #07655



Re: [EP-tech] Thesis Bulk Upload/Import


Hi James,

In answer to your questions:

1) I think there are various reasons why institutions have separate
theses repositories:

- They want to showcase their theses with a different branding/theme from
their normal repository.  Although different branding can be applied to
different archives within one repository, it requires more advanced
knowledge of configuring EPrints.  It can also still lead to issues if
you change a core element without realising that the change will be
inherited by the thesis branding/theme.

- Some institutions restrict access to their main repository so that
only staff can submit.  As theses may be submitted by the students who
have written them, having a separate repository can facilitate this.  It
also makes it easier to run a different review process, as I have
observed in several repositories.

- Sometimes I think this just comes down to a political decision by
the institution to keep theses separate.  In the UK, for example,
although it should not be a problem, I can imagine that in some people's
eyes it simplifies the REF (Research Excellence Framework) process, as
theses are not generally REF-returnable.

2+3) The best way to import as much metadata as possible, with the
highest accuracy, is to use EPrints XML import.  This allows you to
import multiple publications at once.  It will also import documents
referenced by URL in the metadata, as long as those URLs are freely
accessible.  The problem with using EPrints XML import (assuming you are
exporting from one EPrints repository to import into another) is that
you may lose metadata, or even have trouble importing, if fields either
do not exist in the importing repository or are of a different type
(e.g. a free-text field vs a multiple-value field).  EPrints XML import
provides a facility to test whether an import would be successful
without actually importing.  Also, the XML schema for a repository can
be found at /cgi/schema if you want to craft your own EPrints XML from
the metadata source you already have.
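If you do craft your own EPrints XML from an existing metadata source, a script along these lines may help.  It is a minimal sketch, assuming Python 3.8+, that turns flat thesis records into one `<eprints>` file and, before import, lists any top-level fields the target schema does not declare (the field-mismatch problem described above).  The record dict layout, the `buffer` status, the `thesis_type` value and the element names (`creators`/`item`/`name`, `documents`/`files`/`file`/`url`) are assumptions modelled on a typical EPrints 3 export, and the sketch also assumes /cgi/schema is served as a W3C XML Schema.  Check all of these against your own repository before relying on the output.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace and element names follow a typical EPrints 3 XML
# export; check them against your own repository's /cgi/schema.
EP_NS = "http://eprints.org/ep2/data/2.0"
XS_NS = "http://www.w3.org/2001/XMLSchema"


def _sub(parent, tag, text=None):
    """Append a child element in the EPrints namespace, optionally with text."""
    el = ET.SubElement(parent, f"{{{EP_NS}}}{tag}")
    if text is not None:
        el.text = str(text)
    return el


def thesis_to_eprint(rec):
    """Map one hypothetical flat thesis record (a dict) to an <eprint>."""
    ep = ET.Element(f"{{{EP_NS}}}eprint")
    _sub(ep, "eprint_status", "buffer")  # assumed: land in review, not live
    _sub(ep, "type", "thesis")
    for field in ("title", "date", "institution", "thesis_type"):
        if rec.get(field):
            _sub(ep, field, rec[field])
    if rec.get("creators"):
        # Multiple-value name field: one <item> per (family, given) pair.
        creators = _sub(ep, "creators")
        for family, given in rec["creators"]:
            name = _sub(_sub(creators, "item"), "name")
            _sub(name, "family", family)
            _sub(name, "given", given)
    if rec.get("pdf_url"):
        # Full text by URL: the importer must be able to fetch it freely.
        filename = rec["pdf_url"].rsplit("/", 1)[-1]
        doc = _sub(_sub(ep, "documents"), "document")
        _sub(doc, "format", "application/pdf")
        _sub(doc, "main", filename)
        file_el = _sub(_sub(doc, "files"), "file")
        _sub(file_el, "filename", filename)
        _sub(file_el, "url", rec["pdf_url"])
    return ep


def build_eprints_xml(records):
    """Wrap several records in one <eprints> root so they import together."""
    ET.register_namespace("", EP_NS)
    root = ET.Element(f"{{{EP_NS}}}eprints")
    for rec in records:
        root.append(thesis_to_eprint(rec))
    return root


def undeclared_fields(root, schema_text):
    """List top-level <eprint> fields that the target XSD never declares."""
    schema = ET.fromstring(schema_text)
    declared = {el.get("name") for el in schema.iter(f"{{{XS_NS}}}element")}
    used = {child.tag.split("}")[-1] for ep in root for child in ep}
    return sorted(used - declared)


if __name__ == "__main__":
    root = build_eprints_xml([{
        "title": "An Example Thesis",
        "date": "1998",
        "thesis_type": "phd",
        "creators": [("Smith", "Jane")],
        "pdf_url": "https://example.org/scans/smith1998.pdf",
    }])
    print(ET.tostring(root, encoding="unicode"))
```

Writing the tree out with `ET.tostring(root, encoding="utf-8", xml_declaration=True)` gives a file you can first run through the import's test facility, and any names returned by `undeclared_fields` are candidates for metadata that would be lost or rejected.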

Formats other than EPrints XML may be better suited to your purpose if
you are importing from a non-EPrints source.  I do not have a huge
amount of experience with these other formats; maybe others could
advise.  You can use the EPrints CRUD API to import in any format if
you want to automate the process:

https://wiki.eprints.org/w/API:EPrints/Apache/CRUD

This also allows you to push documents rather than have them pulled
from an accessible URL.
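To automate pushing records and files, something like the following could sit on top of the CRUD API.  It is a sketch only: the endpoints (`/id/contents` to create a record, `/id/eprint/<id>/contents` to attach a file), the EPrints XML MIME type, HTTP Basic authentication and the `Content-Disposition` filename header are assumptions based on the CRUD wiki page linked above, so check them there first.  The functions only build `urllib.request.Request` objects; nothing is sent until you pass one to `urllib.request.urlopen`.

```python
import base64
import urllib.request

# ASSUMPTION: MIME type accepted by the EPrints XML import plugin.
EPRINTS_XML_MIME = "application/vnd.eprints.data+xml; charset=utf-8"


def _crud_request(url, body, user, password, headers):
    """Build an authenticated POST request (HTTP Basic auth assumed)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    all_headers = {"Authorization": f"Basic {token}", **headers}
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")


def create_eprints_request(base_url, xml_bytes, user, password):
    """POST an EPrints XML file to the repository's contents collection."""
    url = f"{base_url.rstrip('/')}/id/contents"
    return _crud_request(url, xml_bytes, user, password,
                         {"Content-Type": EPRINTS_XML_MIME})


def attach_file_request(base_url, eprint_id, filename, data, user, password,
                        mime="application/pdf"):
    """POST raw file bytes to an existing eprint (push rather than pull)."""
    url = f"{base_url.rstrip('/')}/id/eprint/{eprint_id}/contents"
    return _crud_request(url, data, user, password, {
        "Content-Type": mime,
        "Content-Disposition": f'attachment; filename="{filename}"',
    })
```

For example, `urllib.request.urlopen(attach_file_request("https://repo.example.ac.uk", 42, "smith1998.pdf", pdf_bytes, "importer", "secret"))` would upload a scanned thesis directly, so the file never needs to sit at a publicly reachable URL.  The host, eprint ID and credentials here are hypothetical.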


Regards

David Newman


On Thu, 2019-01-17 at 10:09 +0000, James Kerwin via Eprints-tech wrote:
> Hi All,
>
> The University I work at is currently exploring options for
> digitising our collection of theses, with an aim of them going into
> the institutional repository and I have some questions if anybody
> could lend me some of their experience and opinions.
>
> 1) I've noticed some organisations have a separate instance of
> EPrints for theses. We currently put each thesis into the
> institutional repository along with all other types of item. Is there
> a benefit to separating them out?
>
> 2) Does EPrints facilitate any sort of bulk upload of Documents and
> EPrint record creation? I've had a quick look around and found the
> following from Tomasz Neugebauer and Bin Han:
>
> https://www.researchgate.net/publication/291251891_Batch_Ingesting_into_EPrints_Digital_Repository_Software
>
> I'm curious to see if this is still relevant (it's very thorough) or
> if there are any other methods or potential pitfalls to avoid.
>
> 3) Following on from Q2, is there a preferred/ideal format of
> metadata? The article makes it clear that many different formats are
> supported, but again I'm wondering if there are any pros and cons to
> any particular format.
>
> The digitising won't be complete for some time so I'm taking the
> opportunity to get ahead of it and be ready.
>
> Thanks,
> James
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/