
[EP-tech] Re: Check uploaded document for PDF/A compatibility



Good point, John.  I was originally going to include a warning about how long it takes and slowing down the deposit process, but I got so excited by writing pseudocode that I forgot.

--
Adam Field
Business Relationship Manager and Community Lead
EPrints Services
+44 (0)23 8059 8814





On 13 Oct 2015, at 09:58, John Salter wrote:

> Hi,
> From a user-experience point of view, you might want to have two routes (depending on how fast VeraPDF can process things).
> 1. For small PDFs, process them in real time (with some AJAX feedback - no one likes a browser that looks like it's doing nothing!)
> 2. For large PDFs, queue an index job that will process the item.
> 
> If the first thing a user does is upload the file, the index job might have processed the item by the time they have added the rest of the metadata.
> If it hasn't, you could either prevent them from depositing the item until all documents have been checked, or you could use the indexer event to move the item back into their workspace with a message regarding the PDF.
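
A rough sketch of the split between the two routes above, assuming the decision is made purely on file size. The 5 MB limit and the names validation_route and $INLINE_LIMIT_BYTES are made up for illustration; nothing here is EPrints API, and the right cut-off would depend on how fast VeraPDF turns out to be.

```perl
use strict;
use warnings;

# Hypothetical cut-off: files at or below this size are checked while the user waits.
my $INLINE_LIMIT_BYTES = 5 * 1024 * 1024;

# Decide how a PDF should be validated, based only on its size on disk.
# Returns 'inline' for small files, 'queue' for large ones,
# and 'missing' if the file cannot be found.
sub validation_route
{
	my( $path, $limit ) = @_;
	$limit = $INLINE_LIMIT_BYTES unless defined $limit;

	my @st = stat( $path );       # empty list if the file does not exist
	return 'missing' unless @st;

	my $size = $st[7];            # element 7 of stat() is the size in bytes
	return $size <= $limit ? 'inline' : 'queue';
}
```

The 'queue' result is the point where an index job would be created, and 'inline' is where the check would run during the request.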
> 
> Cheers,
> John
> 
> -----Original Message-----
> From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of Field A.N.
> Sent: 13 October 2015 09:39
> To: eprints-tech at ecs.soton.ac.uk
> Subject: [EP-tech] Re: Check uploaded document for PDF/A compatibility
> 
> The cfg.d/document_validate.pl configuration file is probably what you want:
> 
> https://github.com/eprints/eprints/blob/3.3/lib/defaultcfg/cfg.d/document_validate.pl
> 
> 
> Don't forget you can use backticks to call command-line utilities if there isn't a Perl library to do the work.  Something like:
> 
> 
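> # get_main gives the name of the main file, local_path the directory it is stored in
> my $pdf_file_path = $document->local_path . '/' . $document->get_main;
> (my $quoted_path = $pdf_file_path) =~ s/'/'\\''/g; # escape single quotes for the shell
> my $cmd = '/usr/local/bin/thingy'; # placeholder for the real checker
> my $output = `$cmd --file='$quoted_path' --verbose`; # whatever you need to do to call the command
> if ($output && $output =~ m/this is a bad file/) # match output indicating failure
> {
> 	push @problems, $repository->html_phrase('validate:pdf_not_ideal');
> }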
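
If the file name might contain characters the shell would interpret, one alternative to backticks is the list form of open(), which hands the arguments directly to the program. This is only a sketch: run_checker and output_indicates_failure are made-up helper names, and the failure text is the same placeholder as above, not the output of any real tool.

```perl
use strict;
use warnings;

# Run an external checker and return everything it prints to standard output.
# The list form of open() bypasses the shell, so the file path
# is passed through exactly as given.
sub run_checker
{
	my( $cmd, @args ) = @_;

	open( my $fh, '-|', $cmd, @args ) or return undef;
	local $/;                     # slurp mode: read all output in one go
	my $output = <$fh>;
	close( $fh );
	return $output;
}

# True (1) if the output contains the placeholder failure message, otherwise 0.
sub output_indicates_failure
{
	my( $output ) = @_;
	return ( defined $output && $output =~ m/this is a bad file/ ) ? 1 : 0;
}
```

Inside document_validate.pl, the result of output_indicates_failure would decide whether to push a phrase onto @problems, just as in the backtick version.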
> 
> 
> --
> Adam Field
> 
> 
> 
> 
> 
> On 13 Oct 2015, at 09:20, Roth-Steiner, Roland wrote:
> 
>> Hi,
>> 
>> I would like to have it checked directly after the upload, so we can inform the user that we need valid PDF/A and point them to an FAQ with the right how-to.
>> 
>> Since there will be a huge number of uploads, this needs to be fully automated and, as mentioned, run immediately after the upload stage.
>> 
>> So where would be the best place to put the call to the PDF/A validation script so that it runs directly after document upload?
>> 
>> Thanks again
>> 
>> .......................................
>> Roland Roth-Steiner
>> M.Sc. Wirtsch.-Inf., Dipl.-Bibl.
>> . Univ.- und Landesbibliothek
>> ... Elektronische Informationsdienste
>> ... Leitung Digitalisierungszentrum
>> ... Fachreferat Wirtschaft
>> . Magdalenenstr. 8, 64289 Darmstadt
>> +49 (0)6151 16-76280
>> .......................................
>> 
>> ________________________________________
>> From: eprints-tech-bounces at ecs.soton.ac.uk [eprints-tech-bounces at ecs.soton.ac.uk] on behalf of Field A.N. [af05v at ecs.soton.ac.uk]
>> Sent: Monday, 12 October 2015 17:37
>> To: eprints-tech at ecs.soton.ac.uk
>> Subject: [EP-tech] Re: Check uploaded document for PDF/A compatibility
>> 
>> You could also define a new issue and have it run by the issues infrastructure.
>> 
>> --
>> Adam Field
>> 
>> 
>> 
>> 
>> 
>> On 12 Oct 2015, at 15:53, Roth-Steiner, Roland wrote:
>> 
>>> Hello,
>>> 
>>> VeraPDF looks really promising and highly configurable.
>>> 
>>> Where would be the best place to hook in the PDF checker script?
>>> 
>>> In document_fields_automatic.pl, document_upload.pl or document_validate.pl?
>>> 
>>> In eprint_validate.pl or eprint_warnings.pl?
>>> 
>>> Or maybe in the deposit stage?
>>> 
>>> 
>>> Thanks
>>> 
>>> Roland Roth-Steiner
>>> 
>>> ________________________________________
>>> From: eprints-tech-bounces at ecs.soton.ac.uk [eprints-tech-bounces at ecs.soton.ac.uk] on behalf of John Salter [J.Salter at leeds.ac.uk]
>>> Sent: Thursday, 8 October 2015 14:14
>>> To: 'eprints-tech at ecs.soton.ac.uk'
>>> Subject: [EP-tech] Re: Check uploaded document for PDF/A compatibility
>>> 
>>> Hi,
>>> I haven't, but I am keeping track of the development of VeraPDF: http://verapdf.org/home/ - which you may be interested in.
>>> 
>>> Their roadmap is here: http://verapdf.org/roadmap/ - it suggests December 2016 for the first release, but if anyone wants to get to grips with one of the beta versions, do (and let us know how it goes!).
>>> 
>>> Cheers,
>>> John
>>> 
>>> 
>>> -----Original Message-----
>>> From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of Roth-Steiner, Roland
>>> Sent: 08 October 2015 12:23
>>> To: Eprints Tech Mailing List
>>> Subject: [EP-tech] Check uploaded document for PDF/A compatibility
>>> 
>>> Hi list,
>>> 
>>> I would like to check a document for PDF/A compatibility directly after upload.
>>> 
>>> Has anyone already done this?
>>> 
>>> Thanks
>>> 
>>> Roland
>>> 
>>> 
>>> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>>> *** Archive: http://www.eprints.org/tech.php/
>>> *** EPrints community wiki: http://wiki.eprints.org/
>>> *** EPrints developers Forum: http://forum.eprints.org/
>>> 