EPrints Technical Mailing List Archive

Message: #04571



[EP-tech] Re: duplicate detection in EPrints 3.3


Hi Rory,

 

Thank you! Issues look like what I should be using for this.

 

Based on the documentation of how Issues work, ticking “Live Archive” on the Item Status and leaving all other search options unchecked should display all issues of any type in the repository, right?

I always get “Search has no matches” on the Search for Issues interface, which is reassuring to some extent, but is there another way to double-check that issue tracking is working and that there are indeed no issues (such as title duplicates)?
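
One way to cross-check independently of the Issues search, sketched here under assumptions: export the eprint IDs and titles by whatever means is convenient, then group them by a normalized title. The function names and the (eprintid, title) input format below are hypothetical and not part of EPrints, and the normalization rules are only one reasonable choice.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces so "a-b" and "a b" compare equal.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def find_title_duplicates(records):
    """Group (eprintid, title) pairs by normalized title.

    `records` is an assumed export format: an iterable of
    (eprintid, title) tuples. Only groups with 2+ IDs are returned.
    """
    groups = defaultdict(list)
    for eprintid, title in records:
        key = normalize_title(title)
        if key:  # skip records whose title normalizes to nothing
            groups[key].append(eprintid)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

If this returns an empty dict for an export of the live archive, that agrees with the empty Issues search; any groups it does return are candidates to inspect by hand.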

 

Tomasz

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Rory McNicholl
Sent: August-20-15 1:37 PM
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: duplicate detection in EPrints 3.3

 

Hi Tomasz,

 

Any reason not to use Issues for this?

 

http://wiki.eprints.org/w/Issues

 

You can design your own issues to include authors too.

 

Cheers,

 

Rory

 

Rory McNicholl

Lead developer

Digital Archives & Research Technologies

University of London Computer Centre

Senate House

Malet Street

London

WC1E 7HU

 

t: +44 (0)20 7863 1344

 

The University of London is an exempt charity in England and Wales.

 


From: eprints-tech-bounces@ecs.soton.ac.uk <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>
Sent: 20 August 2015 17:58
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] duplicate detection in EPrints 3.3

 

I would like to run a script that will go through my repository (3.3.12) and report any likely duplicates based on title (and possibly author).

What is the best way of doing this?

 

I found the following two plugins in EPrints Files:

· Sebastien Francois’ EPrints 2 script: http://files.eprints.org/107/

· Jon Hallett’s EPrints 3>3.1 script: http://files.eprints.org/640/

 

In addition:

· There is a title_duplicates script in /cgi/users/lookup/: http://wiki.eprints.org/w/Cgi/users/lookup/

· Page 40 of this file (http://www.eprints.org/software/training/programming/api_techniques.pdf) refers to a duplicate detection script in the bin folder as an example. I couldn’t find this script; it is probably just an example of what could be done.

 

 

Is Jon Hallett’s script in EPrints Files the most up-to-date version available?

Has anyone created a Bazaar version for duplicate detection, or is there something more recent that I am missing?

 

Tomasz

________________________________________________

Tomasz Neugebauer
Digital Projects & Systems Development Librarian
Libraries / Bibliothèques
Concordia University / Université Concordia