EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #09658



RE: [EP-tech] Sword 2.0 API upload times


CAUTION: This e-mail originated outside the University of Southampton.

Martin,
Apache normally stores incoming file uploads on the VM somewhere like /var/tmp/.

Are there any space/performance issues with this location on your setup? Can you check IO metrics of the device whilst a large upload is taking place?
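Live IO metrics during an upload are best watched with a tool such as iostat. As a complementary one-off check of the spool location itself, the sketch below reports the free space in a directory and a rough sequential write rate. It is a minimal sketch that assumes /var/tmp as in the message above; the directory Apache or the upload handler actually uses depends on the local configuration, and the function name `probe_upload_dir` is hypothetical.

```python
import os
import shutil
import tempfile
import time


def probe_upload_dir(directory: str = "/var/tmp", test_mib: int = 64) -> dict:
    """Report free space and a rough sequential write rate for `directory`.

    Writes `test_mib` MiB of random data to a temporary file, fsyncs it so the
    page cache cannot hide slow storage, then deletes the file again.
    """
    usage = shutil.disk_usage(directory)
    chunk = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as fh:
            for _ in range(test_mib):
                fh.write(chunk)
            fh.flush()
            os.fsync(fh.fileno())  # force the data onto the device before timing stops
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    return {"free_gib": usage.free / 2**30, "write_mib_per_s": test_mib / elapsed}
```

Calling `probe_upload_dir("/var/tmp")` on the server, once when idle and once while a large deposit is in progress, gives a quick indication of whether the temporary storage is short of space or unusually slow.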

 

Cheers,

John

 

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of David R Newman
Sent: Friday, March 1, 2024 4:30 PM
To: eprints-tech@ecs.soton.ac.uk; Martin Brändle <martin.braendle@uzh.ch>
Subject: Re: [EP-tech] Sword 2.0 API upload times

 

CAUTION: External Message. Use caution opening links and attachments.

Hi Martin,

I just tried uploading a 100MB file using the CRUD API, and it took only a few seconds (to my dev VM running EPrints 3.4 GitHub HEAD):

time curl -X POST -i -u USERNAME:PASSWORD --data-binary "@100MB.txt" -H 'Content-Disposition: attachment; filename="100MB.txt"' -H "Content-Type: text/plain" https://eprints.example.org/id/eprint/1234/contents
real    0m6.119s
user    0m0.279s
sys     0m0.278s


I checked that the file had uploaded successfully and downloaded it again to confirm it was the expected size.
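To time files of several sizes in exactly the same way, the curl measurement can be scripted. The sketch below uses only the Python standard library and mirrors the curl command above: Basic auth, a raw request body, and a Content-Disposition header carrying the filename. The URL, credentials and file names remain placeholders, as in the original command, and the helper names are hypothetical.

```python
import base64
import time
import urllib.request
from pathlib import Path


def build_upload_request(url, path, user, password, content_type="text/plain"):
    """Build a POST equivalent to the curl command: raw body plus Basic auth."""
    body = Path(path).read_bytes()  # whole file in memory; fine for a ~100MB test
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": content_type,
        "Content-Disposition": f'attachment; filename="{Path(path).name}"',
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def timed_upload(req):
    """Send the request and return (HTTP status, wall-clock seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
        status = resp.status
    return status, time.perf_counter() - start
```

For example, looping `timed_upload(build_upload_request(url, name, USERNAME, PASSWORD))` over a few test files of increasing size shows whether the CRUD endpoint's time grows in proportion to the file size. Reading each file fully into memory keeps the sketch short; a streamed body would be preferable for much larger files.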

I am not sure whether the SWORD API does anything beyond what this curl request does. Is the uploaded file a zip that needs to be unpacked?

Regards

David Newman

On 01/03/2024 3:48 pm, Martin Brändle wrote:


Dear all,

 

Since very recently, one faculty of our university has been depositing its dissertations via the SWORD 2.0 API. Each deposit is EP3XML with the PDF embedded.
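Assuming the PDF is carried base64-encoded inside the EP3XML (the usual way binary content is embedded in XML; this is an assumption, not something stated in the thread), the request body is about a third larger than the PDF, and the server must parse and decode all of it. A minimal sketch of the size arithmetic for unwrapped, padded base64:

```python
def base64_size(n_bytes: int) -> int:
    """Length of padded base64 output (no line breaks) for n_bytes of input.

    Every 3 input bytes become 4 output characters; a trailing partial group
    is padded up to a full 4 characters.
    """
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    for size_mb in (3.8, 16.5, 22.6):
        raw = int(size_mb * 1_000_000)
        print(f"{size_mb:5.1f} MB PDF -> {base64_size(raw) / 1_000_000:5.1f} MB of base64 text")
```

Because this overhead is a constant factor of roughly 4/3 (slightly more if the encoder inserts line breaks), it makes large deposits bigger but would not by itself make the processing time grow faster than the file size.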

 

Everything works fine; however, the faculty observes that the time until they receive feedback that the process has finished grows disproportionately with the size of the PDF:

 

  1. 3.8 MB: 7 s
  2. 16.5 MB: 2 min 30 s
  3. 22.6 MB: 4 min
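One way to put a number on "disproportionately" is to fit the three measurements above on a log-log scale: a slope near 1 means time grows linearly with size, a slope near 2 means roughly quadratically. This is a minimal sketch using only the three reported data points, which are too few for more than a rough indication.

```python
import math

# (PDF size in MB, seconds until feedback), taken from the list above.
SAMPLES = [(3.8, 7.0), (16.5, 150.0), (22.6, 240.0)]


def throughput(samples):
    """Effective rate in MB/s for each (size_mb, seconds) pair."""
    return [(size, size / secs) for size, secs in samples]


def scaling_exponent(samples):
    """Least-squares slope of log(seconds) against log(size_mb).

    1.0 means time is proportional to size; 2.0 means it grows with the square.
    """
    xs = [math.log(size) for size, _ in samples]
    ys = [math.log(secs) for _, secs in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    for size, rate in throughput(SAMPLES):
        print(f"{size:5.1f} MB -> {rate:.3f} MB/s")
    print(f"fitted exponent: {scaling_exponent(SAMPLES):.2f}")
```

With these numbers the effective rate falls from about 0.54 MB/s to about 0.09 MB/s and the fitted exponent is roughly 2. That points to a step whose cost grows faster than linearly with the payload; a plain transfer or disk bottleneck would be expected to give an exponent close to 1.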

 

Is this behaviour known to you? Are there any settings we could adjust?

 

We run some checks, such as virus scanning and format identification with DROID. The former runs immediately in document_validate.pl; the latter is triggered after the document has been uploaded. So I don’t see a bottleneck in either of these processes.
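To back up the observation that the checks are not the bottleneck, each check can be timed in isolation, outside the SWORD request, on PDFs of different sizes. A minimal sketch, assuming the virus scan is an external command such as ClamAV's `clamscan`; the helper name is hypothetical, and the command and file names below are placeholders for whatever document_validate.pl actually invokes.

```python
import os
import subprocess
import time


def time_check(cmd_prefix, files):
    """Run `cmd_prefix + [file]` for each file and time it.

    Returns (file, size_mb, seconds, exit_code) tuples so the time per MB can
    be compared between small and large PDFs.
    """
    results = []
    for path in files:
        size_mb = os.path.getsize(path) / 1_000_000
        start = time.perf_counter()
        proc = subprocess.run([*cmd_prefix, path], capture_output=True)
        results.append((path, size_mb, time.perf_counter() - start, proc.returncode))
    return results
```

For example, `time_check(["clamscan", "--no-summary"], ["small.pdf", "large.pdf"])` would show whether the scan time per MB stays roughly constant as the files get larger.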

 

Kind regards,

 

Martin

 

--

Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Pfingstweidstrasse 60B
CH-8005 Zürich


 



*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List
*** Archive: https://www.eprints.org/tech.php/
*** EPrints community wiki: https://wiki.eprints.org/