EPrints Technical Mailing List Archive

Message: #05933


Re: [EP-tech] Problem depositing larger documents via SWORD 2.0

Hi Willem,

I’m not using eprints_wrapper as such, but a similar homemade process in PHP, using base64_encode and the PHP cURL library to push files to the SWORD 2.0 endpoint on EPrints.  I just tested with a 5 MB zip file, and the encoding and upload took about 4s.  I don’t know offhand the spec of the virtual server it is running on, but I think it has 2 GB RAM and runs SUSE Linux.  Likewise, I’m unsure of the spec at the EPrints end, but it’s also a VM.


However, it crashed on a 26 MB file.  I tried again with 3 x 8 MB files and it worked fine, in about 10s.


Not sure if this helps, but it does suggest that base64 processing is not a problem in itself, time-wise, with average hardware at either end.  The only obvious difference I can spot is that mine uses chunk_split to break up the base64 into lines, though I can’t remember how I arrived at that.  Might be worth a try; it works for me.
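To make the chunk_split step concrete, here is a minimal sketch of wrapping base64 output into lines. The helper name encode_wrapped and the sample data are hypothetical and not taken from the script above; the 76-character width is simply chunk_split's default, and "\n" is an assumed line ending.

```php
<?php
// Hypothetical helper: base64-encode raw bytes and break the result
// into fixed-width lines, as chunk_split does in the script described above.
function encode_wrapped(string $bytes, int $width = 76): string
{
    // chunk_split appends the separator after every chunk, including the last.
    return chunk_split(base64_encode($bytes), $width, "\n");
}

// Deterministic 200-byte sample standing in for a file's contents.
$raw = str_repeat("\x00\xffAB", 50);

$wrapped = encode_wrapped($raw);
$lines = explode("\n", rtrim($wrapped, "\n"));

// In its default (non-strict) mode, base64_decode skips the line breaks,
// so the wrapped text decodes back to the original bytes.
$roundTrip = base64_decode($wrapped);
```

Whether line-wrapped base64 is actually what makes the difference for the EPrints XML parser is not established in this thread; the sketch only shows what the wrapping does.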





======================= Base64 encoding fragment ===========================


while ($f = mysql_fetch_array($files_result)) { # build file metadata and base64 data

    $filename = $f['file_oaManuscript'];
    $filenamesafe = htmlspecialchars($filename);  # Who puts ampersands in filenames!!
    $mimetype = $f['file_oaManuscript_mimetype'];

    if (FALSE === ($STUFF = file_get_contents($filebase.$filename))) {
        die("\n\nfailed to get file: $filebase$filename");
    }

    $filesize = strlen($STUFF);
    $file_modified = $f['modified_oaManuscript'];




$filesXML = "










                                                                <filesize>$filesize </filesize>



                                                                <data encoding='base64'>";


$filesXML .= $base64;


$filesXML .= "</data>



==========CURL FRAGMENT=========================================================================================================



curl_setopt($ch, CURLOPT_URL, "http://researchonline.lshtm.ac.uk/id/contents");

curl_setopt($ch, CURLOPT_HEADER, 1);




$pkgheader = array('X-Packaging: http://eprints.org/ep2/data/2.0',
                   'Content-Type: text/xml',
                   'Metadata-Relevant: true',
                   'X-Verbose: true',
                   'In-Progress: false'); # TRUE => user inbox;  FALSE => review




$html_in="http://pubdb.lshtm.ac.uk/publications/OAmgr/OAmgr_upload/eprints_xml.php?filter=oaPub_ID&value=$oaPub_ID";  #fetches eprints XML


curl_setopt($ch, CURLOPT_POST,1);

curl_setopt($ch, CURLOPT_POSTFIELDS, $data);




($result=curl_exec($ch) )|| die( "curl_exec failed: ". curl_error($ch));
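For anyone trying to reproduce this, the fragment above leaves out the handle setup and how the headers get attached. The following is a rough sketch of how the pieces might fit together, not the actual script: the function names sword_headers and post_eprints_xml are hypothetical, passing the headers via CURLOPT_HTTPHEADER is an assumption (the fragment does not show that call), and authentication is omitted because the original does not show it.

```php
<?php
// Hypothetical helper: the header list from the fragment above,
// with the In-Progress flag made a parameter.
function sword_headers(bool $inProgress = false): array
{
    return [
        'X-Packaging: http://eprints.org/ep2/data/2.0',
        'Content-Type: text/xml',
        'Metadata-Relevant: true',
        'X-Verbose: true',
        // true => user inbox; false => review (per the comment in the fragment)
        'In-Progress: ' . ($inProgress ? 'true' : 'false'),
    ];
}

// Hypothetical helper: POST an EPrints XML string to a SWORD endpoint
// and return the raw response (headers plus body).
function post_eprints_xml(string $url, string $xml, array $headers): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('curl_init failed');
    }
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $xml,      // a string body is sent as-is
        CURLOPT_HTTPHEADER     => $headers,  // assumption: not shown in the fragment
        CURLOPT_HEADER         => true,      // include response headers in the result
        CURLOPT_RETURNTRANSFER => true,      // return the response instead of printing it
    ]);
    $result = curl_exec($ch);
    if ($result === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl_exec failed: $error");
    }
    curl_close($ch);
    return $result;
}
```

A call would look something like post_eprints_xml($url, $data, sword_headers()), where $url is the repository's contents URL and $data is the generated EPrints XML.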







From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of John Salter
Sent: 15 September 2016 11:25
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] Problem depositing larger documents via SWORD 2.0


Hi Willem,

I’ve had a quick look at the php code.

It’s base64 encoding the file, and adding it to the EPrintsXML it generates in a <document> element.


The encoding (and decoding at the other end) takes some time – and is probably not the correct process for larger files.
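As a rough illustration of the cost, base64 turns every 3 bytes of input into 4 characters of output, so the encoded document is about a third larger than the file, and a script that builds the XML as a single string holds the raw bytes and the encoded copy in memory together. A small sketch of that arithmetic follows; the function names are hypothetical and not part of EPrintsWrapper.

```php
<?php
// Hypothetical helper: length of the base64 encoding of $bytes bytes
// (4 output characters per 3 input bytes, rounded up to a full group).
function base64_length(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3);
}

// Hypothetical helper: length after chunk_split with its default
// arguments (76-character lines, "\r\n" appended after every line).
function chunked_base64_length(int $bytes, int $lineLen = 76, int $eolLen = 2): int
{
    $encoded = base64_length($bytes);
    $lineCount = intdiv($encoded + $lineLen - 1, $lineLen);
    return $encoded + $lineCount * $eolLen;
}

// Estimated encoded size for a 26 MB file, as in the original report.
$estimate = base64_length(26 * 1024 * 1024);
```

This only accounts for the size of the payload; it does not by itself explain where the server spends its time.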


This is the process that I think *should* be used in this scenario:


but I’m not sure if the EPrintsWrapper class can do this…


Others on this list have more SWORD experience than me – hopefully someone will be able to provide a bit more advice.






From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of W. Struiksma
Sent: 14 September 2016 14:13
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Problem depositing larger documents via SWORD 2.0


Hi all,


I'm currently having problems depositing larger documents (> 5 MB) via SWORD 2.0. I'm using a PHP script that uses EPrintsWrapper.php. In this script the EPrints XML (including document) is posted via cURL.



The deposit takes a very long time (8 minutes for 26 MB) and the Apache process goes to 100% CPU usage.


Has anyone experienced the same behaviour before? What can I do about it?


We use EPrints 3.3.13.

Thanks in advance!

Willem Struiksma
University of Groningen