
[EP-tech] Problem depositing larger documents via SWORD 2.0



Hello Andy and John,

We are working on a solution for our problem and I want to thank you for 
your tips!

Sincerely,
Willem Struiksma

------ Original Message ------
From: "Andy Reid" <Andy.REID at lshtm.ac.uk>
To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
Sent: 15-9-2016 14:51:33
Subject: Re: [EP-tech] Problem depositing larger documents via SWORD 2.0

>Hi Willem,
>
>I'm not using eprints_wrapper as such, but a similar homemade process 
>in PHP using base64_encode and the PHPcurl library, to push files to 
>the SWORD 2.0 portal on eprints.  I just tested with a 5MB zip file and 
>the encoding and upload took about 4s.  I don't know offhand the spec 
>of the virtual server it is running on, but I think it has 2GB RAM, 
>running SUSE linux.  Likewise I'm unsure of the spec at the eprints 
>end, but it's also a VM.
>
>
>
>However it crashed on a 26MB file.  I tried again with 3 x 8MB files 
>and it worked fine, in about 10s.
>
>
>
>Not sure if this helps, but it does suggest that base64 processing is 
>not a problem in itself, time-wise, with average hardware at either 
>end.  The only obvious difference I can spot is that mine uses 
>chunk_split to break up the base64 into lines, but how I arrived at 
>that I can't remember.  Might be worth a try, works for me.
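
For reference, a minimal sketch of what chunk_split does to base64 output. The 100-byte sample string is made up for illustration; only base64_encode, chunk_split and base64_decode come from PHP itself. With its default arguments chunk_split inserts "\r\n" after every 76 characters, and base64_decode (in its default non-strict mode) skips those line breaks again.

```php
<?php
// Illustrative only: 100 bytes of sample data stand in for a real file.
$raw = str_repeat('A', 100);

$plain   = base64_encode($raw);  // one unbroken line of base64
$chunked = chunk_split($plain);  // defaults: 76 characters per line, "\r\n" after each

// Split the chunked text back into its individual lines.
$lines = explode("\r\n", rtrim($chunked, "\r\n"));

// Non-strict decoding discards the inserted line breaks,
// so the original bytes come back unchanged.
$roundTrip = base64_decode($chunked);
```

Whether the line breaks matter to the receiving end is not established in this thread; the sketch only shows that they do not alter the decoded data.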
>
>
>
>
>
>Andy
>
>
>
>======================= Base64 encoding fragment ===========================
>
>
>
>while ($f = mysql_fetch_array($files_result)) { # build file metadata and base64 data
>    $filenum++;
>    $filename = $f['file_oaManuscript'];
>    $filenamesafe = htmlspecialchars($filename); # Who puts ampersands in filenames!!
>    $mimetype = $f['file_oaManuscript_mimetype'];
>
>    $maintype = $mimetype;
>    $mainfile = $filenamesafe;
>    if (FALSE === ($STUFF = file_get_contents($filebase.$filename))) {
>        die("\n\nfailed to get file: $filebase$filename");
>    }
>    $base64 = chunk_split(base64_encode($STUFF));
>    $hash = md5($base64);
>    $filesize = strlen($STUFF);
>    $file_modified = $f['modified_oaManuscript'];
>
>    $filesXML = "
>        <file>
>            <datasetid>document</datasetid>
>            <filename>$filenamesafe</filename>
>            <mime_type>$mimetype</mime_type>
>            <hash>$hash</hash>
>            <hash_type>MD5</hash_type>
>            <filesize>$filesize </filesize>
>            <mtime>$file_modified</mtime>
>            <data encoding='base64'>";
>
>    $filesXML .= $base64;
>
>    $filesXML .= "</data>
>        </file>";
>
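
The fragment above keeps the raw file, its base64 copy and the resulting XML string in memory at the same time. The thread does not say why the 26MB upload crashed; if it was a memory limit (an assumption), one option is to encode while reading, using PHP's built-in convert.base64-encode stream filter. A minimal sketch follows; encodeFileToBase64 and its arguments are hypothetical names, not part of Andy's script or of EPrintsWrapper.

```php
<?php
// Hypothetical helper: base64-encode $sourcePath into $targetPath
// without loading the whole file into memory.
function encodeFileToBase64(string $sourcePath, string $targetPath): int
{
    $in = fopen($sourcePath, 'rb');
    if ($in === false) {
        throw new RuntimeException("cannot open $sourcePath");
    }
    $out = fopen($targetPath, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("cannot open $targetPath");
    }

    // Encode on read. 76-character lines separated by "\r\n"
    // mirror the defaults of chunk_split().
    stream_filter_append($in, 'convert.base64-encode', STREAM_FILTER_READ, [
        'line-length'      => 76,
        'line-break-chars' => "\r\n",
    ]);

    // Copy the encoded stream to the target in small buffered chunks.
    $written = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);

    if ($written === false) {
        throw new RuntimeException('copy failed');
    }
    return $written; // number of encoded bytes written
}
```

This only covers the encoding step. The encoded data still has to be placed inside the <data> element and sent, so it is not a drop-in replacement for the fragment above.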
>
>
>======================= CURL fragment ===========================
>
>
>
>
>
>curl_setopt($ch, CURLOPT_URL, "http://researchonline.lshtm.ac.uk/id/contents");
>curl_setopt($ch, CURLOPT_HEADER, 1);
>
>$pkgheader = Array('X-Packaging: http://eprints.org/ep2/data/2.0',
>                   'Content-Type: text/xml',
>                   'Metadata-Relevant: true',
>                   'X-Verbose: true',
>                   'In-Progress: false'); # TRUE => user inbox;  FALSE => review
>curl_setopt($ch, CURLOPT_HTTPHEADER, $pkgheader);
>
>$html_in = "http://pubdb.lshtm.ac.uk/publications/OAmgr/OAmgr_upload/eprints_xml.php?filter=oaPub_ID&value=$oaPub_ID"; # fetches eprints XML
>$data = file_get_contents($html_in);
>curl_setopt($ch, CURLOPT_POST, 1);
>curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
>
>curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
>
>($result = curl_exec($ch)) || die("curl_exec failed: " . curl_error($ch));
>
>
>
>
>
>
>
>
>
>
>
>
>
>From:eprints-tech-bounces at ecs.soton.ac.uk 
>[mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of John Salter
>Sent: 15 September 2016 11:25
>To:eprints-tech at ecs.soton.ac.uk
>Subject: Re: [EP-tech] Problem depositing larger documents via SWORD 
>2.0
>
>
>
>Hi Willem,
>
>I've had a quick look at the php code.
>
>It's base64 encoding the file, and adding it to the EPrintsXML it 
>generates in a <document> element.
>
>The encoding (and decoding at the other end) takes some time - and is 
>probably not the correct process for larger files.
>
>This is the process that I think *should* be used in this scenario:
>
>http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#protocoloperations_creatingresource_multipart
>
>but I'm not sure if the EPrintsWrapper class can do this?
>
>Others on this list have more SWORD experience than me - hopefully 
>someone will be able to provide a bit more advice.
>
>
>
>Cheers,
>
>John
>
>
>
>
>
>From:eprints-tech-bounces at ecs.soton.ac.uk 
>[mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of W. Struiksma
>Sent: 14 September 2016 14:13
>To:eprints-tech at ecs.soton.ac.uk
>Subject: [EP-tech] Problem depositing larger documents via SWORD 2.0
>
>
>
>Hi all,
>
>
>
>I'm currently having problems depositing larger documents (> 5 MB) via 
>SWORD 2.0. I'm using a PHP script that uses EPrintsWrapper.php. In this 
>script the EPrints XML (including document) is posted via cURL.
>
>
>
>https://github.com/davidfkane/eprintsDepositHelper/blob/master/EPrintsWrapper.php
>
>
>
>The deposit takes a very long time (8 minutes for 26 MB) and the Apache 
>process goes to a 100% processor capacity.
>
>
>
>Has anyone experienced the same behaviour before? What can I do about 
>it?
>
>
>
>We use EPrints 3.3.13.
>
>Thanks in advance!
>
>Sincerely,
>Willem Struiksma
>University of Groningen
>