EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #09659



Re: [EP-tech] Sword 2.0 API upload times


CAUTION: This e-mail originated outside the University of Southampton.

Maybe PDF indexing or PDF cover page generation is the problem? But that processing should be asynchronous via the queue, right?

On 01/03/24 17:29, David R Newman wrote:
Hi Martin,

I just tried uploading a 100MB file using the CRUD API, and this seemed to take only a few seconds (to my dev VM running EPrints 3.4 GitHub HEAD):

time curl -X POST -i -u USERNAME:PASSWORD --data-binary "@100MB.txt" -H 'Content-Disposition: attachment; filename="100MB.txt"' -H "Content-Type: text/plain" https://eprints.example.org/id/eprint/1234/contents
real    0m6.119s
user    0m0.279s
sys     0m0.278s


I confirmed that the file had uploaded successfully and downloaded it to confirm it was of the expected size.

I am not sure whether the SWORD API does anything beyond what this curl request does. Is the uploaded file a zip that needs to be unpacked?

Regards

David Newman

On 01/03/2024 3:48 pm, Martin Brändle wrote:

Dear all,

 

Since very recently, one faculty of our university has been depositing its dissertations via the SWORD 2.0 API. Each deposit is an EP3XML file with the PDF embedded.

 

Everything works fine; however, the faculty observes that the time until they receive feedback that processing has finished grows disproportionately with the size of the PDF:

 

  • 3.8 MB: 7 s
  • 16.5 MB: 2 min 30 s
  • 22.6 MB: 4 min
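
To make the non-linear growth explicit, the three timings above can be turned into effective throughput figures. The following is only a small arithmetic sketch in Python; the variable and function names are illustrative and not part of EPrints or SWORD, and the sizes are taken to be the PDF sizes as reported, not the size of the EP3XML actually sent.

```python
# Reported SWORD 2.0 deposit times from the list above:
# (PDF size in MB, time until feedback in seconds).
timings = [(3.8, 7), (16.5, 150), (22.6, 240)]


def throughput(size_mb, seconds):
    """Effective throughput in MB/s for a single deposit."""
    return size_mb / seconds


rates = [throughput(size_mb, seconds) for size_mb, seconds in timings]

base_size, base_time = timings[0]
for (size_mb, seconds), rate in zip(timings, rates):
    # Compare each deposit against the smallest one: if processing time
    # were linear in file size, both factors would be roughly equal.
    size_factor = size_mb / base_size
    time_factor = seconds / base_time
    print(f"{size_mb:5.1f} MB in {seconds:3d} s: {rate:.3f} MB/s "
          f"(size x{size_factor:.1f}, time x{time_factor:.1f})")
```

The effective rate drops from roughly 0.54 MB/s for the smallest file to roughly 0.11 MB/s and 0.09 MB/s for the two larger ones, so the time grows much faster than the file size.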

 

Is this behaviour known to you? Are there any settings we could adjust?

 

We do some checks such as scanning for viruses or format determination using Droid. The former is done immediately in the document_validate.pl, the latter is being triggered after the document has been uploaded. So I don’t see any bottleneck in these processes.

 

Kind regards,

 

Martin

 

--

Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Pfingstweidstrasse 60B
CH-8005 Zürich

 


*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List
*** Archive: https://www.eprints.org/tech.php/
*** EPrints community wiki: https://wiki.eprints.org/


