
[EP-tech] Antwort: Re: Digital Preservation in EPrints



I asked the National Archives about the PRONOM risk scores for formats. Their use is documented, but commented out, in the code of the EPrints Preservation Plugin.
Although PRONOM has the potential to hold risk scores, they are all blank at the moment, and adding them is not something the National Archives is looking into.

Tomasz


From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of Tomasz Neugebauer
Sent: August-17-16 5:32 PM
To: eprints-tech at ecs.soton.ac.uk
Cc: Francisco Berrizbeitia <francisco.berrizbeitia at concordia.ca>
Subject: Re: [EP-tech] Antwort: Re: Digital Preservation in EPrints

I have been going through the installation of the DROID and Preservation Toolkit plugins over the last few days.
It was difficult to figure out, so I thought I would share a summary of what I learned about these plugins, and how I got them to work:

DROID
Bazaar: http://bazaar.eprints.org/143/
GitHub: https://github.com/eprintsug/droid
Prerequisites: Java 1.6 or higher

What it does / how I got it to work:

On activation, it is supposed to download the DROID 4 tar file from here:
http://freefr.dl.sourceforge.net/project/droid/droid/4.0.0/droid-4.0.0-linux.tar.gz
and then untar it into /lib/bin/DROID.
On my EPrints 3.3.12, all of this failed without any error message: the Bazaar package said it installed OK, but it didn't report that it had been unable to complete the required steps.
There is a message on the list about File::Move vs File::Copy::Recursive::rmove, but I couldn't get this to work (http://www.eprints.org/tech.php/thread-16264.html). Instead, I downloaded the tar file and untarred it manually from the command line into the /lib/bin/DROID/ folder.
The plugin also adds cron events for updating DROID_SignatureFile.xml and for running the scan; I think this part is working. I was also able to update the signature file from the command line:
java -jar /lib/bin/DROID/droid.jar -d /lib/bin/DROID/DROID_SignatureFile.xml
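The manual steps above can be scripted so that a failure is reported instead of passing silently. This is a minimal Python sketch, not the plugin's own code: the helper names install_droid and signature_update_command are hypothetical, it assumes the tarball has already been downloaded, and it assumes droid.jar sits at the top level of the archive (as the command above implies). It only builds the java command; it does not run it.

```python
import tarfile
from pathlib import Path


def install_droid(tarball: Path, dest: Path) -> Path:
    """Unpack a DROID 4 tarball into dest and fail loudly if droid.jar is missing.

    Hypothetical helper; assumes droid.jar is at the top level of the archive.
    """
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safe "data" extraction filter where this Python version offers it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
    jar = dest / "droid.jar"
    if not jar.is_file():
        raise FileNotFoundError(f"droid.jar not found in {dest} after extraction")
    return jar


def signature_update_command(droid_dir: Path) -> list[str]:
    """Build (but do not run) the signature-file update command quoted above."""
    return [
        "java", "-jar", str(droid_dir / "droid.jar"),
        "-d", str(droid_dir / "DROID_SignatureFile.xml"),
    ]
```

To actually run the update, the list can be passed to subprocess.run(..., check=True), which raises CalledProcessError on a non-zero exit rather than failing quietly.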

=======================

PRESERVATION Toolkit
Bazaar: http://bazaar.eprints.org/142/
Github: https://github.com/eprintsug/preservation_toolkit
Prerequisites: DROID
Some documentation: http://www.eprints.org/software/training/3.2/admin/filerisks_tutorial.php

What it does / how I got it to work:

It is supposed to provide Editors with a Format/Risks button that lists the format types in the repository and the number of documents of each type.
After installing the plugin, the button didn't show up on my repository, because the can_be_viewed permission on line 45 of FormatsRisks.pm didn't exist in my EPrints, so I changed line 45 of FormatsRisks.pm to return $self->allow( "config/view" ); That got me a button. At first, clicking on it said that I had no objects in the repository, and a new button, "Request File Type Recount", appeared. Either by pushing this button or on plugin activation (I'm not sure which), a cron event is added that goes through the repository and produces a results table with two categories:
1) High Risk Objects: these are all the UNKNOWN objects (DROID found no classification match)
2) Format Breakdown: a list of format types and how many objects there are of each.
It would be great if it provided a button to list which documents are high risk. There is mention of this in the docs (a "plus" button), but I didn't see it working. Has anyone figured out how to get the "plus" button, or something like it, so that I can quickly find out which documents belong to the "high risk" category?
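Until the "plus" button works, the two categories can be reproduced outside the UI. A minimal sketch, assuming you already have a mapping from EPrints document id to the format (PUID) DROID reported for it, with UNKNOWN for no match; how that mapping is pulled from the repository is not shown, and format_report is a hypothetical name. Whether the plugin's own table also counts UNKNOWN in the breakdown is not known; here it is kept separate.

```python
from collections import Counter

UNKNOWN = "UNKNOWN"  # the Toolkit's label for "DROID found no classification match"


def format_report(classified: dict[int, str]) -> tuple[list[int], Counter]:
    """Return (high-risk document ids, per-format counts) from DROID results.

    `classified` is an assumed input: document id -> format DROID reported.
    """
    # High Risk Objects: every document DROID could not classify.
    high_risk = sorted(doc for doc, fmt in classified.items() if fmt == UNKNOWN)
    # Format Breakdown: how many documents there are of each identified format.
    breakdown = Counter(fmt for fmt in classified.values() if fmt != UNKNOWN)
    return high_risk, breakdown
```

The first element of the result is exactly the list of documents the "plus" button was meant to show.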
The documentation and the code mention classification into "low", "medium" and "high" risk, but this is not working, for a number of reasons. First, the update_risk_scores() call on line 23 of Update_Pronom_File_Counts is commented out (as is the whole function). This is the function that uses SOAP::Lite to query PRONOM at the National Archives for the risk scores associated with each format. Since it is commented out in the plugin, I see no reason to install SOAP::Lite. Second, and this is the part I found most confusing: it looks to me like PRONOM still doesn't have risk scores associated with any format types in its database (is that correct?), so it may be pointless to try to activate this part of the plugin. PRONOM allows you to query for risk scores (see: http://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=new ), but if you search, you will see that all formats have a blank risk score. The plugin documentation talks about an "unstable" risk score retrieval setup used for testing at EPrints and for generating the screenshots for the docs and presentations.
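To illustrate why low/medium/high banding cannot work while the scores are blank, here is a sketch of a score-to-band mapping. The thresholds, the assumption that a higher score means a higher risk, and the name risk_band are all hypothetical, not taken from PRONOM or the plugin. With every PRONOM score blank, every format ends up "unclassified".

```python
from typing import Optional

# Hypothetical cut-offs: PRONOM publishes no risk scores today, so these
# numbers are placeholders, not values from PRONOM or the Toolkit.
HIGH_RISK_AT = 7
MEDIUM_RISK_AT = 4


def risk_band(score: Optional[int]) -> str:
    """Map a numeric risk score to a band; a blank score stays unclassified.

    Assumes (hypothetically) that a higher score means a higher risk.
    """
    if score is None:
        return "unclassified"
    if score >= HIGH_RISK_AT:
        return "high"
    if score >= MEDIUM_RISK_AT:
        return "medium"
    return "low"
```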

My apologies for the long message; if you have read all of this and want to correct something or add some information, I would very much appreciate it.

Best wishes,

Tomasz



________________________________________________
Tomasz Neugebauer
Digital Projects & Systems Development Librarian / Bibliothécaire des Projets Numériques & Développement de Systèmes
Library / Bibliothèque
Concordia University / Université Concordia
Tel. / Tél. 514-848-2424 ext. / poste 7738
Email / courriel: tomasz.neugebauer at concordia.ca
Mailing address / adresse postale: 1455 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8
Street address / adresse municipale: 1400 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8
library.concordia.ca
concordia.ca
Twitter: https://twitter.com/photomediathink



From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of martin.braendle at id.uzh.ch
Sent: May-31-16 7:49 AM
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Antwort: Re: Digital Preservation in EPrints


Hi Tomasz,

the command-line version of DROID 6.x does not support the FileCollection XML report that DROID 4 produced and that the Preservation Toolkit Bazaar package uses; see also the discussion at https://groups.google.com/forum/#!topic/droid-list/odOGT7ccn2I

I have the DROID 4 tar file somewhere on my disk. Please contact me off-list if you want it. It runs with Java 1.6 or higher; we run it with Java 1.8.

DROID 4 is indeed outdated. We noted that even with the most recent PRONOM signature files, it does not recognize the format of about 4% of our PDF files, while spot checks revealed that DROID 6 does recognize the format.

It is still on my todo list (in the course of the SUK P-2 project Digital Life Cycle Management) to make the preservation toolkit compatible with DROID 6.
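One possible route toward DROID 6 compatibility is to read a DROID 6 CSV export instead of the DROID 4 FileCollection XML. This is a sketch under assumptions: it expects FILE_PATH, TYPE and PUID columns (check the header of your own export), skips rows whose TYPE is "Folder", and records files without a PUID as UNKNOWN to match the Toolkit's category. The function name is hypothetical and this is not part of the Toolkit.

```python
import csv
import io


def puids_from_droid6_csv(text: str) -> dict[str, str]:
    """Map FILE_PATH -> PUID from a DROID 6 CSV export (assumed column names)."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Folder rows carry no format identification; skip them.
        if row.get("TYPE") == "Folder":
            continue
        # An empty PUID means DROID found no signature match.
        result[row["FILE_PATH"]] = row.get("PUID") or "UNKNOWN"
    return result
```

The resulting mapping has the same shape as the per-file results the DROID 4 XML provided, so only the parsing step would need to change.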

Best regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich

mail: martin.braendle at id.uzh.ch
phone: +41 44 63 56705
fax: +41 44 63 54505
http://www.zi.uzh.ch


From: Adam Field <Adam.Field at jisc.ac.uk>
To: eprints-tech at ecs.soton.ac.uk
Date: 31/05/2016 12:32
Subject: Re: [EP-tech] Digital Preservation in EPrints
Sent by: eprints-tech-bounces at ecs.soton.ac.uk

________________________________



The preservation toolkit is really quite old now. If it's important, perhaps there's some community effort that can be directed at it. I'm happy to assist as much as my current job allows.



Adam Field
SHERPA services analyst developer




From: eprints-tech-bounces at ecs.soton.ac.uk on behalf of Tomasz Neugebauer <Tomasz.Neugebauer at concordia.ca>
Reply-To: eprints-tech at ecs.soton.ac.uk
Date: Thursday, 26 May 2016 21:16
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Digital Preservation in EPrints

To use the Digital Preservation Toolkit (http://bazaar.eprints.org/142/), DROID (http://bazaar.eprints.org/143/) is required. DROID runs on Java.
The DROID Bazaar plugin mentions "DROID v.4".
Meanwhile, the current version of DROID is 6.2.1 (http://www.nationalarchives.gov.uk/information-management/manage-information/preserving-digital-records/droid/).
Given that we are running EPrints 3.3.12, what is the recommended setup for this?
Should we try to find DROID v4 or can we run the latest version of DROID?
If we need DROID 4, where do we get that?  Also, what version of Java does that require?
The current version of DROID requires a minimum of Java 6 Standard Edition (SE), and was built and tested on Java 1.6 update 30.
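Checking the Java prerequisite can be scripted. A minimal sketch with hypothetical function names: it parses the text that `java -version` prints (to stderr) and handles both the older 1.x numbering (1.6, 1.8) and the newer 9+ numbering (9, 11, 17). The default minimum of 6 reflects the Java 6 requirement quoted above.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if not match:
        raise ValueError("no version string found in java -version output")
    parts = match.group(1).split(".")
    major = int(re.match(r"\d+", parts[0]).group())
    # Legacy scheme: "1.6.0_30" means Java 6, "1.8.0_101" means Java 8.
    if major == 1 and len(parts) > 1:
        major = int(re.match(r"\d+", parts[1]).group())
    return major


def meets_droid_minimum(version_output: str, minimum: int = 6) -> bool:
    """True if the reported Java version is at least `minimum`."""
    return java_major_version(version_output) >= minimum
```

In practice the input would come from subprocess.run(["java", "-version"], capture_output=True, text=True).stderr.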

Tomasz



*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/