EPrints Technical Mailing List Archive

Message: #06796



Re: [EP-tech] Fixity Check and EPrints - Digital Preservation


Hi Tomasz,

I think we're looking into similar things at the moment :o)

 

I think there are similarities between 'fixity' and 'probity', so although EPrints has no built-in fixity checking, this might be useful info:

EPrints does support 'probity' files (http://www.probity.org/), which include a hash of the contents.

 

I don’t think these are generated by default, but calling the $doc->rehash method should generate them.

See the EPrints::Probity module, and the 'rehash' option of bin/epadmin.

 

Running [EPRINTS_ROOT]/bin/epadmin rehash [ARCHIVEID] [docid] will generate a file in the owning eprint's folder, e.g.

[EPRINTS_ROOT]/archives/[ARCHIVEID]/documents/disk0/00/00/00/01/1.2017-08-25T09=003a55=003a29Z.xsh

(for eprintid = 1 and docid = 1. Note the encoded ':'s (=003a) in the timestamp in the filename).
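If you need the real timestamp back from one of these filenames, here is a minimal Python sketch. It assumes the escaping is always '=' followed by a four-digit hex code point, which is what the =003a for ':' above suggests; I haven't checked that against the EPrints source. The function name is just for illustration.

```python
import re

def decode_escaped_name(name):
    """Turn '=XXXX' escapes back into characters.

    Assumption: each escape is '=' plus exactly four hex digits
    giving a Unicode code point (e.g. '=003a' for ':').
    """
    return re.sub(
        r"=([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        name,
    )

print(decode_escaped_name("1.2017-08-25T09=003a55=003a29Z.xsh"))
```

For the example filename above this gives 1.2017-08-25T09:55:29Z.xsh.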

 

The file has the following data:

<?xml version="1.0" encoding="UTF-8" ?>
<hashlist xmlns="http://probity.org/XMLprobity">
  <hash>
    <name>wreo.txt</name>
    <algorithm>MD5</algorithm>
    <value>17f861744d77c1d9754fd7ab6f403065</value>
    <date>2017-08-25T09:55:45Z</date>
  </hash>
</hashlist>

 

You can create multiple Probity files, but I don't think there's any way to compare one with another, or to check that the current checksum matches the most recently stored one (which is the main part of your question).
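In the meantime, an external script could do that comparison. Below is a rough Python sketch, not anything that ships with EPrints: it reads a Probity file shaped like the one above, recomputes each file's hash, and reports whether it still matches. The function names are my own, the directory holding the hashed files is passed in explicitly because I'm not certain where it sits relative to the .xsh file, and the handling of algorithm names other than MD5 is a guess.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace taken from the <hashlist> element in the example file.
NS = {"p": "http://probity.org/XMLprobity"}

def read_probity(xsh_path):
    """Return (name, algorithm, value) for each <hash> in a Probity file."""
    root = ET.parse(xsh_path).getroot()
    return [
        (
            h.findtext("p:name", namespaces=NS),
            h.findtext("p:algorithm", namespaces=NS),
            h.findtext("p:value", namespaces=NS),
        )
        for h in root.findall("p:hash", NS)
    ]

def file_digest(path, algorithm):
    """Hash a file in chunks so large documents are not loaded whole."""
    # Assumption: names such as 'MD5' or 'SHA-1' map onto hashlib names
    # once lower-cased and stripped of hyphens.
    digest = hashlib.new(algorithm.lower().replace("-", ""))
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_fixity(xsh_path, file_dir):
    """Map each file named in the Probity file to True (match) or False."""
    results = {}
    for name, algorithm, stored in read_probity(xsh_path):
        current = file_digest(Path(file_dir) / name, algorithm)
        results[name] = current == stored.strip().lower()
    return results
```

Called as check_fixity(path_to_xsh, directory_with_files), it returns something like {'wreo.txt': True} when nothing has changed. A cron job could walk the documents tree, run this against the newest .xsh in each folder, and email any False results.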

 

Cheers,

John

 

PS I'm also looking into DROID - as you were at some point. The Bazaar package needs an update or three…

 

 

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Tomasz Neugebauer
Sent: 24 August 2017 18:35
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Fixity Check and EPrints - Digital Preservation

 

I believe that EPrints stores a checksum value for each uploaded file, but as far as I understand, there is no way to monitor whether the checksums still match the current files, and thus no way of checking for bit rot.

DSpace has the following: https://wiki.duraspace.org/display/DSDOC6x/Validating+CheckSums+of+Bitstreams

 

A periodic fixity check is part of the lowest level of support for digital preservation, i.e., “bit-level”. Here are some examples of digital preservation policies, all of which have some variation on this as a requirement: “regularly audit checksums to ensure that no files have corrupted or changed in any way. This practice ensures the ability to provide an exact copy of original files over time”:

· https://www.sfu.ca/content/dam/sfu/archives/DigitalPreservation/FormatPolicyRegistry.pdf “Regularly perform fixity checks on AIPs”

· https://digital.library.yorku.ca/documentation/fixity-procedures “York University Library are committed to maintaining the integrity of objects in its care. This includes creating checksums for all archival format objects -- plus associated datastreams -- ingested into the repository, and regular fixity checking of those objects”

· https://researchworks.lib.washington.edu/policy-preservation.html “Maintains the authenticity of the bitstream through integrity checking”

 

I understand that EPrints is primarily an open access platform, but I think that we should be able to provide at least the lowest “bit-level” digital preservation support with it, and without a fixity check, I don’t think we can ensure that no files are corrupted or changed over time.

 

Preservation Metadata for Institutional Repositories, a 2007 report looking at EPrints and digital preservation, states the following about fixity checking: “Where is fixity check first performed? Not within EPrints currently, but a script that crawls the archive comparing files with checksums is possible”. It is now 10 years later, and I am wondering whether and how institutions running EPrints are implementing their fixity checks. Are you using an external tool like this: https://www.avpreserve.com/tools/fixity/? Are you using your own custom script? Did you develop something that is integrated with the EPrints Admin interface?

 


Best wishes,

Tomasz

 

 

 

 

________________________________________________

Tomasz Neugebauer
Digital Projects & Systems Development Librarian / Bibliothécaire des Projets Numériques & Développement de Systèmes
Library / Bibliothèque
Concordia University / Université Concordia

Tel. / Tél. 514-848-2424 ext. / poste 7738
Email / courriel:
tomasz.neugebauer@concordia.ca

Mailing address / adresse postale: 1455 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8
Street address / adresse municipale: 1400 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8

http://library.concordia.ca
http://www.concordia.ca/faculty/tomasz-neugebauer.html