
[EP-tech] Fixity Check and EPrints - Digital Preservation



Probity is a bit like blockchain, but distributed. (I'm not an expert on 
blockchain)

It never caught on, which is a pity, as the idea was sound.

I've some PHP code lying around for making a basic probity website.


On 25/08/2017 11:03, John Salter wrote:
>
> Hi Tomasz,
>
> I think we're looking into similar things at the moment :o)
>
> I think there are similarities between 'fixity' and 'probity' - so 
> although there isn't integration of fixity, this might be useful info:
>
> EPrints does support 'probity' files (http://www.probity.org/), which 
> include a hash of the contents.
>
> I don't think these are generated by default, but the $doc->rehash 
> command should generate them.
>
> See the EPrints::Probity module, and the 'rehash' option of bin/epadmin.
>
> Running [EPRINTS_ROOT]/bin/epadmin rehash [ARCHIVEID] [docid] will 
> generate a file in the owning eprint folder e.g.
>
> [EPRINTS_ROOT]/archives/[ARCHIVEID]/documents/disk0/00/00/00/01/1.2017-08-25T09=003a55=003a29Z.xsh
>
> (for eprintid = 1, and docid = 1. Note the encoded ':'s (=003a) in 
> the timestamp in the filename).
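
To read the timestamp back out of such a filename, the escaping has to be reversed. This is a minimal Python sketch, not part of EPrints. It assumes that each escaped character is written as '=' followed by four hex digits of its code point, which is only inferred from the single '=003a' example above. The function name decode_escaped_name is hypothetical.

```python
import re


def decode_escaped_name(name: str) -> str:
    """Replace each '=XXXX' (four hex digits) with the character it encodes.

    Assumption: the escape is always '=' plus exactly four hex digits,
    as in '=003a' for ':'.
    """
    return re.sub(
        r"=([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        name,
    )


print(decode_escaped_name("1.2017-08-25T09=003a55=003a29Z.xsh"))
# prints: 1.2017-08-25T09:55:29Z.xsh
```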
>
> The file has the following data:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <hashlist xmlns="http://probity.org/XMLprobity">
>   <hash>
>     <name>wreo.txt</name>
>     <algorithm>MD5</algorithm>
>     <value>17f861744d77c1d9754fd7ab6f403065</value>
>     <date>2017-08-25T09:55:45Z</date>
>   </hash>
> </hashlist>
>
> You can create multiple Probity files, but I don't think there's any 
> way to compare one with another, or to check that the current checksum 
> is equal to the most recently stored one (which is the main part of 
> your question).
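
A comparison of that kind could be done outside EPrints. The following is a rough Python sketch under stated assumptions, not an EPrints feature: it reads a Probity file of the shape shown above, recomputes the digest of each named file, and reports whether it still matches. The function names (read_hashlist, file_digest, check_fixity) are hypothetical, and file_dir is an assumption too: the directory that actually holds the document files has to be supplied by the caller, since only the location of the .xsh file is described here.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace taken from the hashlist element in the sample Probity file.
NS = {"p": "http://probity.org/XMLprobity"}


def read_hashlist(xsh_path):
    """Return the name, algorithm and stored value of every <hash> entry."""
    root = ET.parse(str(xsh_path)).getroot()
    return [
        {
            "name": h.findtext("p:name", default="", namespaces=NS),
            "algorithm": h.findtext("p:algorithm", default="", namespaces=NS),
            "value": h.findtext("p:value", default="", namespaces=NS),
        }
        for h in root.findall("p:hash", NS)
    ]


def file_digest(path, algorithm):
    """Hash a file in 1 MiB chunks so large documents are not loaded whole."""
    digest = hashlib.new(algorithm.lower())  # e.g. "MD5" -> "md5"
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(xsh_path, file_dir):
    """Compare stored hashes with the files in file_dir (caller-supplied).

    Returns a list of (name, status) where status is "ok", "mismatch"
    or "missing".
    """
    results = []
    for record in read_hashlist(xsh_path):
        target = Path(file_dir) / record["name"]
        if not target.is_file():
            results.append((record["name"], "missing"))
            continue
        current = file_digest(target, record["algorithm"])
        stored = record["value"].strip().lower()
        results.append((record["name"], "ok" if current == stored else "mismatch"))
    return results
```

Run periodically (from cron, say) against the most recent .xsh file for each document, anything other than "ok" would flag a file to investigate.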
>
> Cheers,
>
> John
>
> PS I'm also looking into DROID - as you were at some point. The Bazaar 
> package needs an update or three?
>
> *From:*eprints-tech-bounces at ecs.soton.ac.uk 
> [mailto:eprints-tech-bounces at ecs.soton.ac.uk] *On Behalf Of *Tomasz 
> Neugebauer
> *Sent:* 24 August 2017 18:35
> *To:* eprints-tech at ecs.soton.ac.uk
> *Subject:* [EP-tech] Fixity Check and EPrints - Digital Preservation
>
> I believe that EPrints stores a checksum value for each uploaded file, 
> but as far as I understand, there is no way to monitor whether the 
> checksums still match the current files, and thus no way of checking 
> for bit rot.
>
> DSpace has the following: 
> https://wiki.duraspace.org/display/DSDOC6x/Validating+CheckSums+of+Bitstreams
>
> A periodic fixity check is part of the lowest level of support for 
> digital preservation, i.e., "Bit-level". See some examples of digital 
> preservation policy, all of which have some variation on this as a 
> requirement: "regularly audit checksums to ensure that no files have 
> corrupted or changed in any way. This practice ensures the ability to 
> provide an exact copy of original files over time":
>
> https://www.sfu.ca/content/dam/sfu/archives/DigitalPreservation/FormatPolicyRegistry.pdf
> "Regularly perform fixity checks on AIPs"
>
> https://digital.library.yorku.ca/documentation/fixity-procedures
> "York University Library are committed to maintaining the integrity of 
> objects in its care. This includes creating checksums for all archival 
> format objects -- plus associated datastreams -- ingested into the 
> repository, and regular fixity checking of those objects"
>
> https://researchworks.lib.washington.edu/policy-preservation.html
> "Maintains the authenticity of the bitstream through integrity checking"
>
> I understand that EPrints is primarily an open access platform, but I 
> think that we should be able to provide at least the lowest 
> "bit-level" digital preservation support with it, and without a fixity 
> check, I don't think we can ensure that no files are corrupted or 
> changed over time.
>
> Preservation Metadata for Institutional Repositories 
> <http://preserv.eprints.org/papers/presmeta/pm-paper-draft.html>, a 
> report looking at EPrints and digital preservation dating back to 2007, 
> states the following about fixity checking: "Where is fixity check 
> first performed? Not within EPrints currently, but a script that 
> crawls the archive comparing files with checksums is possible". It is 
> now 10 years later, and I am wondering if and how institutions running 
> EPrints are implementing their fixity checks. Are you using an 
> external tool like this: https://www.avpreserve.com/tools/fixity/? Are 
> you using your own custom script? Did you develop something that is 
> integrated with the EPrints Admin interface?
>
>
> Best wishes,
>
> Tomasz
>
> ________________________________________________
>
> Tomasz Neugebauer
> Digital Projects & Systems Development Librarian / Bibliothécaire des 
> Projets Numériques & Développement de Systèmes
> Library / Bibliothèque
> Concordia University / Université Concordia
>
> Tel. / Tél. 514-848-2424 ext. / poste 7738
> Email / courriel: tomasz.neugebauer at concordia.ca 
> <mailto:tomasz.neugebauer at concordia.ca>
>
> Mailing address / adresse postale: 1455 De Maisonneuve Blvd. 
> W., LB-540-03, Montreal, Quebec H3G 1M8
> Street address / adresse municipale: 1400 De Maisonneuve Blvd. 
> W., LB-540-03, Montreal, Quebec H3G 1M8
>
> http://library.concordia.ca <http://library.concordia.ca/>
> http://www.concordia.ca/faculty/tomasz-neugebauer.html
>
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/

-- 
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read our Web & Data Innovation blog: http://blogs.ecs.soton.ac.uk/webteam/
