EPrints Technical Mailing List Archive

Message: #03496


[EP-tech] Re: Coversheets - anyone involved with the development still around?

Hi John,

I think the point at which the metadata gets lost is when the cover-sheet PDF is stitched to the original PDF to generate the covered version. The latest coversheet Bazaar package uses Ghostscript (gs) as the stitching program, and gs does not preserve the metadata.

Pdftk (https://www.pdflabs.com/docs/install-pdftk-on-redhat-or-centos/), a more comprehensive program that may have an option to preserve the metadata, was used previously, until a serious bug caused it to loop and never finish stitching for some types of PDF files. The bug is believed to be fixed in a more recent version of pdftk, but that version has not been fully tested with EPrints yet.

It is probably worth installing the latest version of pdftk and configuring EPrints to use it as the stitching program.
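
As a rough sketch of what "preserving the metadata" could look like with pdftk: its dump_data operation writes a PDF's document information to a text file, and update_info applies such a file to another PDF. The helper below only builds the three argument lists and prints them; it does not run pdftk. The function name pdftk_metadata_commands and all file names are hypothetical, and this approach has not been tested against the coversheet plugin.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): returns three pdftk
# invocations as array refs, in the order they would need to run.
sub pdftk_metadata_commands
{
        my( $pdftk, $cover, $original, $workdir, $output ) = @_;

        my $info     = "$workdir/info.txt";      # metadata dumped from the original
        my $stitched = "$workdir/stitched.pdf";  # cover + original, metadata not yet restored

        return (
                # 1. save the original PDF's metadata to a text file
                [ $pdftk, $original, "dump_data", "output", $info ],
                # 2. stitch the cover sheet in front of the original
                [ $pdftk, $cover, $original, "cat", "output", $stitched ],
                # 3. apply the saved metadata to the stitched file
                [ $pdftk, $stitched, "update_info", $info, "output", $output ],
        );
}

# Print the commands rather than running them.
foreach my $cmd ( pdftk_metadata_commands( "/usr/bin/pdftk", "cover.pdf", "paper.pdf", "/tmp/work", "covered.pdf" ) )
{
        print join( " ", @$cmd ), "\n";
}
```

Each array ref could be passed to system() as a list. Note that the dump may also contain bookmark entries whose page numbers refer to the original file, so they may need adjusting once a cover page is added in front.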

To switch to pdftk in EPrints, edit
/opt/eprints3/lib/plugins/EPrints/Plugin/Convert/AddCoversheet.pm at line 215:
        my $temp_output_dir = File::Temp->newdir( "ep-coversheet-finishedXXXX", TMPDIR => 1 );
        my $temp_output_file = $temp_dir.'/temp.pdf';

## switch back to using pdftk
        my $pdftk = $plugin->get_repository->get_conf( "executables", "pdftk" );
        system( $pdftk, @input_files, "cat", "output", $temp_output_file );

        copy($temp_output_file, $output_file);

        # check it worked: the output file must exist and not be zero length
        unless( -e $output_file && -s $output_file )
        {
                $repository->log("[Convert::AddCoversheet] pdftk could not create '$output_file'. Check the PDF is not password-protected.");
        }

=begin GHOST
        # EPrints Services/pjw Modification to use Ghostscript rather than pdftk
        my $gs_cmd = $plugin->get_repository->get_conf( "gs_pdf_stich_cmd" );
        # add the output file
        $gs_cmd .= $temp_output_file;
        # add the input files
        foreach my $input_file (@input_files)
        {
                $gs_cmd .= " '$input_file'";
        }

        my $sys_call_status = system($gs_cmd);
        # check it worked
        if (0 == $sys_call_status)
        {
                copy($temp_output_file, $output_file);
        }
        else
        {
                my $eprint = $doc->get_eprint;
#                       $repository->mail_administrator( 'Plugin/Screen/Coversheet:email_subject',
#                                                 'Plugin/Screen/Coversheet:email_body',
#                                                 eprintid => $eprint->render_value("eprintid"),
#                                                 docid => $doc->render_value("docid") );

                $repository->log("[Convert::AddCoversheet] Ghostscript could not create '$output_file'. Check the PDF is not password-protected.");
        }
=end GHOST

=cut

        EPrints::Utils::chown_for_eprints( $output_file );

Then, in archive/[repoid]/cfg/cfg.d/z_coversheet.pl at line 30:
## Add the pdftk executable path:
$c->{executables}->{pdftk} = "/usr/bin/pdftk";
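
The pdftk call above only checks afterwards that the output file exists. A minimal sketch of a stricter check follows, assuming you also want to confirm that the configured path is executable and inspect the exit status of system(). The helper name run_stitch is hypothetical and not part of EPrints, and it does not guard against a run that never finishes.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list and return
# ( success flag, message ).
sub run_stitch
{
        my( $exe, @args ) = @_;

        # the configured path must point at an executable file
        return ( 0, "'$exe' is not an executable file" ) unless -f $exe && -x $exe;

        # list form of system() avoids the shell, so file names need no quoting
        my $rc = system( $exe, @args );
        return ( 0, "failed to start '$exe': $!" ) if $rc == -1;
        return ( 0, "'$exe' was killed by signal " . ( $rc & 127 ) ) if $rc & 127;
        return ( 0, "'$exe' exited with status " . ( $rc >> 8 ) ) if $rc >> 8;
        return ( 1, "ok" );
}

# Example call with placeholder file names; it reports an error message
# rather than dying if pdftk is not installed at this path.
my( $ok, $msg ) = run_stitch( "/usr/bin/pdftk", "cover.pdf", "paper.pdf", "cat", "output", "covered.pdf" );
print "stitch: $msg\n";
```

In the plugin, the message could be passed to $repository->log() in place of the fixed "could not create" text, which would distinguish a missing executable from a failed run.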


On 23/09/2014 12:25, John Salter wrote:
Does anyone around here have anything to do with the 'Coverpages' bazaar package?
In particular I'm looking at the metadata associated with the resulting (covered) PDF. On the UKCoRR mailing list, there was a claim that some work had been done so that metadata that existed in the original PDF wasn't affected?



Jiadi Yao
EPrints Services
3081, Building 32
University of Southampton