[EP-tech] Problem depositing larger documents via SWORD 2.0



Hi Willem,
I'm not using eprints_wrapper as such, but a similar homemade process in PHP using base64_encode and the PHP cURL library, to push files to the SWORD 2.0 portal on EPrints.  I just tested with a 5 MB zip file and the encoding and upload took about 4s.  I don't know offhand the spec of the virtual server it is running on, but I think it has 2 GB RAM, running SUSE Linux.  Likewise I'm unsure of the spec at the EPrints end, but it's also a VM.

However, it crashed on a 26 MB file.  I tried again with 3 x 8 MB files and it worked fine, in about 10s.

Not sure if this helps, but it does suggest that base64 processing is not a problem in itself, time-wise, with average hardware at either end.  The only obvious difference I can spot is that mine uses chunk_split to break up the base64 into lines, but I can't remember how I arrived at that.  Might be worth a try; it works for me.
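
To illustrate what chunk_split does here, a minimal sketch follows. It is not part of my script: the helper name base64_wrapped_length is made up for the example, and it assumes chunk_split's defaults (76 characters per line, "\r\n" line endings). It shows that the wrapped text decodes back to the original bytes, and estimates how large the encoded text gets (roughly a third larger than the file, plus the line breaks).

```php
<?php
// Hypothetical helper: predicted length of chunk_split(base64_encode($data))
// for a non-empty input of $bytes bytes.
function base64_wrapped_length(int $bytes, int $lineLen = 76): int
{
    $encoded = 4 * intdiv($bytes + 2, 3);                  // 4 output chars per 3 input bytes, padded
    $lines   = intdiv($encoded + $lineLen - 1, $lineLen);  // number of lines after wrapping
    return $encoded + 2 * $lines;                          // chunk_split appends "\r\n" to every line
}

$raw     = str_repeat('x', 1000);
$wrapped = chunk_split(base64_encode($raw));               // defaults: 76 chars per line, "\r\n"

// The inserted line breaks are skipped when decoding.
$roundTripOk = (base64_decode($wrapped) === $raw);
// The prediction matches the real length.
$lengthOk    = (strlen($wrapped) === base64_wrapped_length(1000));

// Estimated size of the encoded text for a 26 MB file.
$size26mb = base64_wrapped_length(26 * 1024 * 1024);
```

So the line wrapping costs only a few percent on top of the base64 overhead, and it does not change what the receiving end decodes.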


Andy

======================= Base64 encoding fragment ===========================

while ($f = mysql_fetch_array($files_result)) { # build file metadata and base64 data
    $filenum++;
    $filename     = $f['file_oaManuscript'];
    $filenamesafe = htmlspecialchars($filename);  # Who puts ampersands in filenames!!
    $mimetype     = $f['file_oaManuscript_mimetype'];

    $maintype = $mimetype;
    $mainfile = $filenamesafe;
    if (FALSE === ($STUFF = file_get_contents($filebase.$filename))) {
        die("\n\nfailed to get file: $filebase$filename");
    }
    $base64        = chunk_split(base64_encode($STUFF));
    $hash          = md5($base64);
    $filesize      = strlen($STUFF);
    $file_modified = $f['modified_oaManuscript'];



$filesXML = "
    <file>
        <datasetid>document</datasetid>
        <filename>$filenamesafe</filename>
        <mime_type>$mimetype</mime_type>
        <hash>$hash</hash>
        <hash_type>MD5</hash_type>
        <filesize>$filesize </filesize>
        <mtime>$file_modified</mtime>
        <data encoding='base64'>";

$filesXML .= $base64;

$filesXML .= "</data>
    </file>";
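
The fragment above holds the raw file, the base64 text and the growing XML string in memory at the same time. I have not confirmed that memory was what killed the 26 MB upload, so treat this as a guess. Under that assumption, here is a minimal sketch of encoding a file piece by piece into an open stream instead. The function name base64_stream_file and the piece size are made up for the example. The piece size must be a multiple of 57 bytes, because 57 bytes encode to exactly one 76-character line, so the output is identical to chunk_split(base64_encode(...)) over the whole file.

```php
<?php
/**
 * Sketch: write the line-wrapped base64 of a file to $out one piece at a time.
 * Returns the MD5 of the encoded text, i.e. the same value as md5($base64)
 * in the fragment above. Name and default piece size are illustrative only.
 */
function base64_stream_file(string $path, $out, int $pieceBytes = 57 * 1024): string
{
    if ($pieceBytes <= 0 || $pieceBytes % 57 !== 0) {
        throw new InvalidArgumentException('pieceBytes must be a positive multiple of 57');
    }
    $in = fopen($path, 'rb');
    if ($in === false) {
        throw new RuntimeException("failed to open file: $path");
    }
    $md5 = hash_init('md5');
    while (!feof($in)) {
        $piece = stream_get_contents($in, $pieceBytes);
        if ($piece === false || $piece === '') {
            break;
        }
        $encoded = chunk_split(base64_encode($piece)); // whole lines only, except in the last piece
        hash_update($md5, $encoded);
        fwrite($out, $encoded);
    }
    fclose($in);
    return hash_final($md5);
}

// Usage: compare against the all-in-memory version on a 200000-byte test file.
$tmp = tempnam(sys_get_temp_dir(), 'b64');
file_put_contents($tmp, str_repeat('abcdefghij', 20000));
$out  = fopen('php://temp', 'w+b');
$hash = base64_stream_file($tmp, $out);
rewind($out);
$streamed = stream_get_contents($out);
$inMemory = chunk_split(base64_encode(file_get_contents($tmp)));
fclose($out);
unlink($tmp);
```

This only reduces memory use during encoding. The complete XML still has to be assembled before it is handed to CURLOPT_POSTFIELDS as a string.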

==========CURL FRAGMENT=========================================================================================================


curl_setopt($ch, CURLOPT_URL, "http://researchonline.lshtm.ac.uk/id/contents");
curl_setopt($ch, CURLOPT_HEADER, 1);



$pkgheader=Array('X-Packaging: http://eprints.org/ep2/data/2.0',
                 'Content-Type: text/xml',
                 'Metadata-Relevant: true',
                 'X-Verbose: true' ,
                 'In-Progress: false'); # TRUE => user inbox;  FALSE => review
curl_setopt($ch,CURLOPT_HTTPHEADER,$pkgheader);


$html_in="http://pubdb.lshtm.ac.uk/publications/OAmgr/OAmgr_upload/eprints_xml.php?filter=oaPub_ID&value=$oaPub_ID";  #fetches eprints XML
$data=file_get_contents($html_in);
curl_setopt($ch, CURLOPT_POST,1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);

curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

($result=curl_exec($ch) )|| die( "curl_exec failed: ". curl_error($ch));
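
Because CURLOPT_HEADER is switched on, $result contains the response headers followed by the body. A minimal sketch of pulling them apart is below. The helper names split_response and find_header are made up for the example, and the canned response is invented test data, not real output from our server.

```php
<?php
// Hypothetical helper: split a raw cURL response into [headers, body].
function split_response(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

// Hypothetical helper: case-insensitive lookup of one header value, or null.
function find_header(string $headers, string $name): ?string
{
    $pattern = '/^' . preg_quote($name, '/') . ':[ \t]*(.*?)[ \t]*\r?$/mi';
    return preg_match($pattern, $headers, $m) ? $m[1] : null;
}

// With a live handle, after curl_exec($ch), you would use:
//   $status     = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // SWORD 2.0 specifies 201 Created for a new deposit
//   $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);

// Invented example response, so the sketch runs without a network connection.
$raw = "HTTP/1.1 201 Created\r\n"
     . "Location: http://example.org/id/eprint/123\r\n"
     . "Content-Type: application/atom+xml\r\n"
     . "\r\n"
     . "<entry/>";
$headerSize = strpos($raw, "\r\n\r\n") + 4;   // stand-in for CURLINFO_HEADER_SIZE

[$headers, $body] = split_response($raw, $headerSize);
$location = find_header($headers, 'Location');
```

Checking the status code as well as the curl_exec return value is worthwhile, since curl_exec succeeds whenever the server answers at all, even with an error status.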






From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of John Salter
Sent: 15 September 2016 11:25
To: eprints-tech at ecs.soton.ac.uk
Subject: Re: [EP-tech] Problem depositing larger documents via SWORD 2.0

Hi Willem,
I've had a quick look at the PHP code.
It's base64 encoding the file, and adding it to the EPrintsXML it generates in a <document> element.

The encoding (and decoding at the other end) takes some time, and is probably not the correct process for larger files.

This is the process that I think *should* be used in this scenario:
http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#protocoloperations_creatingresource_multipart
but I'm not sure if the EPrintsWrapper class can do this?

Others on this list have more SWORD experience than me; hopefully someone will be able to provide a bit more advice.

Cheers,
John


From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of W. Struiksma
Sent: 14 September 2016 14:13
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Problem depositing larger documents via SWORD 2.0

Hi all,

I'm currently having problems depositing larger documents (> 5 MB) via SWORD 2.0. I'm using a PHP script that uses EPrintsWrapper.php. In this script the EPrints XML (including document) is posted via cURL.

https://github.com/davidfkane/eprintsDepositHelper/blob/master/EPrintsWrapper.php

The deposit takes a very long time (8 minutes for 26 MB) and the Apache process goes to 100% processor capacity.

Has anyone experienced the same behaviour before? What can I do about it?

We use EPrints 3.3.13.

Thanks in advance!

Sincerely,
Willem Struiksma
University of Groningen