EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #09656



RE: [EP-tech] 0 byte file uploads


Hi all,

 

I’ve been tracking this issue for some time and have concluded that the drag and drop support is the cause of the zero-byte uploads: I checked all of the reported instances on a number of services and verified that drag and drop was used in every case.

 

The current drag and drop code makes a sequence of staged requests to the EPrints REST API:

 

  1. Create the file
  2. Upload the file content 1MB at a time as separate API requests
  3. Mark the file upload as complete

 

I reproduced the zero-byte file issue by stopping the process after step 1. On the one occasion I managed to reproduce this myself in Google Chrome, I saw a console message reporting that the form submission had been cancelled because the form had been removed from the DOM. Since the above steps are chained recursively, each one running only on the success of the previous one, this seems to me to be a possible root cause.
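To illustrate the failure mode described above, here is a minimal sketch of a three-stage upload in which each stage runs only from the success callback of the previous one. It is not the EPrints code: the in-memory `makeFakeServer`, the `stagedUpload` function and the `isCancelled` hook are all hypothetical names, and the tiny chunk size stands in for the 1MB chunks. It only shows that if the chain stops after step 1, the server is left with a file record of size 0.

```javascript
// Hypothetical in-memory stand-in for the server; NOT the EPrints REST API.
function makeFakeServer() {
  const files = {};
  return {
    files: files,
    // Step 1: create the file record (size starts at 0).
    createFile: function (name, done) {
      files[name] = { size: 0, complete: false };
      done();
    },
    // Step 2: append one chunk of content.
    appendChunk: function (name, chunk, done) {
      files[name].size += chunk.length;
      done();
    },
    // Step 3: mark the upload as complete.
    markComplete: function (name, done) {
      files[name].complete = true;
      done();
    }
  };
}

// Each stage is started from the success callback of the previous one.
// `isCancelled` is an assumed hook modelling the form disappearing mid-upload.
function stagedUpload(server, name, content, chunkSize, isCancelled) {
  server.createFile(name, function () {
    let offset = 0;
    (function sendNext() {
      if (isCancelled()) return; // the chain silently stops here
      if (offset >= content.length) {
        server.markComplete(name, function () {});
        return;
      }
      const chunk = content.slice(offset, offset + chunkSize);
      offset += chunkSize;
      server.appendChunk(name, chunk, sendNext);
    })();
  });
}

// Normal run: all three stages complete.
const ok = makeFakeServer();
stagedUpload(ok, "a.pdf", "0123456789", 4, function () { return false; });

// Interrupted run: the record exists but no content ever arrives.
const broken = makeFakeServer();
stagedUpload(broken, "b.pdf", "0123456789", 4, function () { return true; });
```

In the interrupted run the record for "b.pdf" still exists with a size of 0 and is never marked complete, which matches the symptom reported in this thread.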

 

I fixed the issue by rewriting the drag and drop code entirely to use XMLHttpRequest, which is the same method that the regular workflow update uses.

 

This is tracked in https://github.com/eprints/eprints3.5/issues/43 which involves replacing one file with the following:

 

https://github.com/eprints/eprints3.5/blob/3921c954385aee9a1954141bdc8e0bf595d55477/lib/static/javascript/auto/88_uploadmethod_file.js

 

This change has two added benefits: it removes the ‘fakepath’ label that gets shown, and it uses the modern XMLHttpRequest upload progress events to track progress client-side, so it no longer keeps requesting upload progress from the EPrints server.
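As a rough illustration of client-side progress tracking with XMLHttpRequest, here is a minimal sketch. It is not the code from the linked file: `uploadWithProgress`, its `url` and `onProgress` parameters, and the optional `createXhr` factory (added only so the function can be exercised outside a browser) are all assumptions. The `upload.onprogress` handler and the `lengthComputable`, `loaded` and `total` event fields are standard XMLHttpRequest features.

```javascript
// Sketch only: upload `body` in a single request and report progress locally.
// `url` is a hypothetical endpoint; `createXhr` is an assumed test hook that
// defaults to the browser's XMLHttpRequest.
function uploadWithProgress(url, body, onProgress, createXhr) {
  const xhr = createXhr ? createXhr() : new XMLHttpRequest();
  return new Promise(function (resolve, reject) {
    // Progress comes from the browser, so no extra requests to the server.
    xhr.upload.onprogress = function (event) {
      if (event.lengthComputable) {
        onProgress(event.loaded / event.total);
      }
    };
    xhr.onload = function () {
      if (xhr.status >= 200 && xhr.status < 300) {
        resolve(xhr.status);
      } else {
        reject(new Error("Upload failed with status " + xhr.status));
      }
    };
    xhr.onerror = function () {
      reject(new Error("Network error during upload"));
    };
    xhr.open("POST", url);
    xhr.send(body);
  });
}
```

In a browser, `body` would typically be a FormData object or a File, and `onProgress` would update a progress bar with the fraction it receives.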

 

Regards,

Don.

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of Will Fyson
Sent: Friday, March 1, 2024 3:09 PM
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] 0 byte file uploads

 

CAUTION: This e-mail originated outside the University of Southampton.


Hi all,

 

Just to confirm what David said in an earlier email, I think this can occur when depositors use the drag and drop interface, specifically where the upload process gets a bit out of sync with the document record creation process when uploading multiple files, and as a result some files drift apart from their respective document record and fail to upload correctly.

 

As such I applied a fix in the eprintsug core at https://github.com/eprintsug/ulcc-core/commit/c28c11ad3e5ffac72fe7b4f9d0874b3e93627770 which makes the drag and drop process recursive, ensuring we don't try and upload the next file until we've definitely finished creating the previous document. 
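The idea of not starting the next file until the previous document has been created can be sketched as below. This is an illustration under assumptions and not the code from the linked commit: `uploadSequentially`, `createDocument` and `uploadFile` are hypothetical names, and the callbacks here run synchronously purely to keep the example short.

```javascript
// Sketch only: process files one at a time. The next file is not touched
// until the previous document has been created AND its file uploaded.
// `createDocument` and `uploadFile` are hypothetical callback-style helpers.
function uploadSequentially(files, createDocument, uploadFile, done) {
  const results = [];
  function next(index) {
    if (index >= files.length) {
      done(results);
      return;
    }
    const file = files[index];
    createDocument(file, function (docId) {
      uploadFile(docId, file, function () {
        results.push({ docId: docId, name: file.name });
        next(index + 1); // recurse only after this file is fully handled
      });
    });
  }
  next(0);
}

// Usage with stand-in helpers that just record the order of operations.
const order = [];
uploadSequentially(
  [{ name: "a.pdf" }, { name: "b.pdf" }],
  function (file, cb) { order.push("create " + file.name); cb("doc-" + file.name); },
  function (docId, file, cb) { order.push("upload " + file.name + " to " + docId); cb(); },
  function (results) { order.push("done " + results.length); }
);
```

Because each step is started from the previous step's callback, the files cannot drift apart from their document records, at the cost of uploading one file at a time.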

 

This was a fix to the JavaScript drag and drop uploader and it sounds like a new more up to date approach is imminent in the next release, but I just thought I'd bring it to people's attention in case it's useful!

 

Many thanks,

 

Will

 

Will Fyson

Development & Support Analyst, Research Technologies

CoSector, University of London

Senate House

Malet Street

London

WC1E 7HU

 

t: +44 (0)20 7863 1341

e: will.fyson@cosector.com

w: https://cosector.com/digital-research/

 

The University of London is an exempt charity in England and Wales.

 


From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of Martin Brändle <martin.braendle@uzh.ch>
Sent: 01 March 2024 14:55
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] 0 byte file uploads

 


Dear all,

 

Thank you. I backported David’s fix (https://github.com/eprints/eprints3.4/commit/f03b80da02b319d59705144ecccdc933b91c99e5) to our EPrints version, and it works like a charm.

 

Kind regards,

 

Martin

 

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Date: Friday, 1 March 2024 at 12:01
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: RE: [EP-tech] 0 byte file uploads


Hi,

My check (in document_validate.pl) looks at the filesize stored in the DB, which from my experience has been accurate:

 

foreach my $file (@{($document->get_value( "files" ))})
{
    if( $file->get_value( 'filesize' ) == 0 ){
        push @problems, $repository->html_phrase(
            "validate:document:zero_length_file",
            filename=> $file->render_value( "filename" ),
            fieldname => $xml->create_element( "span", class=>"ep_problem_field:documents" ),
        );
    }
}

 

Cheers,

John

 

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of Liam Green-Hughes
Sent: Friday, March 1, 2024 9:41 AM
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] 0 byte file uploads

 

CAUTION: External Message. Use caution opening links and attachments.


Hi everyone,

 

Currently we have this check in document_validate.pl:

 

my %files = $document->files;
foreach my $file (keys %files) {
    my $source = $document->local_path."/".$file;
    # If file is not on the filesystem it could be a potential zero length file
    if ( ! -e $source )
    {
        push @problems, $session->html_phrase("validate:possible_zero_length_file", filename=>$session->make_text($file));
        next;
    }
}

 

.. and this is the phrase:

<epp:phrase id="validate:possible_zero_length_file">Please check the file: <epc:pin name="filename"/>. Files that have a size of zero cannot be uploaded to KAR.</epp:phrase>

 

Not perfect but catches zero length uploads at least.

 

Thanks

Liam


From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>
Sent: 01 March 2024 01:32
To: Liam Green-Hughes <L.E.Green-Hughes@kent.ac.uk>; eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] 0 byte file uploads

 

CAUTION: This email originated from outside of the organisation. Do not click links or open attachments unless you recognise the sender and know the content is safe.

 


Hi everyone,

 

We discovered a few of these as well when we were processing everything we had for digital preservation with Archivematica.  On my to-do list for the Archivematica plugin is to add an error about this, on export, because Archivematica isn't necessarily dealing well with these 0 size files either.  Ideally, though, the error would be flagged immediately to the uploader/depositor.  In summary, I am following this thread with much interest. 

 

Tomasz

 

 


From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of Martin Brändle <martin.braendle@uzh.ch>
Sent: Wednesday, February 28, 2024 6:54 AM
To: Liam Green-Hughes <L.E.Green-Hughes@kent.ac.uk>; eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] 0 byte file uploads

 

Attention This email originates from outside the concordia.ca domain. // Ce courriel provient de l'extérieur du domaine de concordia.ca

 

 


Dear all,

 

Thanks for your pointers (John's hypothesis on cloud storage is interesting). I’ll follow them up and implement them.

Another hypothesis that came up during our Scrum meeting was that the upload problems have occurred more frequently since our university switched to another VPN software.

We run EPrints 3.3.16. We disabled upload by URL years ago because of known problems.

Kind regards,

 

Martin

 

--

Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Pfingstweidstrasse 60B
CH-8005 Zürich

 

 

From: Liam Green-Hughes <L.E.Green-Hughes@kent.ac.uk>
Date: Wednesday, 28 February 2024 at 10:59
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>, Martin Brändle <martin.braendle@uzh.ch>
Subject: Re: [EP-tech] 0 byte file uploads

Hi all,

 

We have seen this issue too. I believe it is to do with a file entry being created in the database, but nothing copied to the actual document filesystem. If you create a file of zero length (e.g. with the touch command) and try to upload it to an EPrints instance you should be able to see this in action (from memory). In our repository I added a warning message into document_validate.pl. Not sure how people end up with zero length files; it could be something to do with PDF file generation or EPrints getting upset at invalid characters in filenames (it doesn't like apostrophes much).

 

Thanks

Liam

 

 


From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Sent: 28 February 2024 09:20
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>; Martin Brändle <martin.braendle@uzh.ch>
Subject: RE: [EP-tech] 0 byte file uploads

 


Hi Martin, David,

I have observed this too in 3.3.16 – and written a cronjob to alert me to new cases in case there was a pattern.

 

My hunch was that it related to people uploading from cloud storage – where a file appears as though it’s local to the user’s computer, but the files aren’t actually cached locally. As yet, I haven’t managed to get a proper failing test case.

 

I have put some warnings in place as part of my document_validate to catch these – although it sounds like these will not be needed when we upgrade.

 

Cheers,

John

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of David R Newman
Sent: Wednesday, February 28, 2024 9:06 AM
To: eprints-tech@ecs.soton.ac.uk; Martin Brändle <martin.braendle@uzh.ch>
Subject: Re: [EP-tech] 0 byte file uploads

 


Hi Martin,

I am aware of this issue and we believe that in many cases it is down to how the JavaScript works in the uploader, mainly with drag-n-drop. My colleague has rewritten this, as modern web browsers no longer need the JavaScript currently used, and we intend to add it to the next major release of EPrints.

I did implement something to warn if there is a file that reports as zero bytes (i.e. the document file's filesize is 0). It is in the second commit for:

https://github.com/eprints/eprints3.4/issues/189 (changeset: https://github.com/eprints/eprints3.4/commit/f03b80da02b319d59705144ecccdc933b91c99e5)

This GitHub issue was admittedly originally focussed on what I believed was another reason behind zero-byte files: that a user would try to upload from a URL they had access to but the EPrints repository did not (e.g. a private IP or a site that required a password or similar authentication). However, the second commit was solely focussed on putting up a warning message after the upload if this failed to complete successfully. This was implemented in EPrints 3.4.4. Which version of EPrints are you running? It would be useful to know if it works as expected for you, as this is such an intermittent issue that it has been difficult to test. However, it should warn if the filesize for one of a document's files is 0. Unfortunately, it may not do this as soon as the upload fails, but at the very least it should appear in the same place as non-field-specific warnings (e.g. a bespoke validation that requires field A or field B to be set), so it should be picked up before the user clicks deposit or otherwise during the review process.

Regards

David Newman

On 28/02/2024 8:40 am, Martin Brändle wrote:


Dear all,

 

in our repository, we have found a few PDFs that are 0 bytes long (actually, it’s a 0.05 per mille problem).

We are not sure how this has happened – we don’t think that there are problems with the drive (it’s mirrored), rather we think that the problem originates from the user’s side, e.g. that something happened at upload or the file was already faulty on the user’s drive.

 

Indeed, it’s possible to upload a file with 0 bytes length to EPrints without any problem as we had tested.

 

However, I think this should be checked by the file uploader and a warning should be issued to the user. This seems not to be implemented yet.
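A check of this kind in the uploader could look roughly like the sketch below. The function name `findEmptyFiles` is hypothetical and this is not part of EPrints; it relies only on the standard `name` and `size` properties of browser File objects. Note that it would only catch files that are already empty on the user's drive, not uploads that are interrupted part-way.

```javascript
// Sketch only: return the names of selected files whose size is 0 bytes,
// so the uploader can warn the user before sending anything.
// Accepts a FileList or any array of objects with `name` and `size`.
function findEmptyFiles(files) {
  return Array.from(files)
    .filter(function (file) { return file.size === 0; })
    .map(function (file) { return file.name; });
}

// Usage with plain objects standing in for browser File objects.
const empty = findEmptyFiles([
  { name: "thesis.pdf", size: 1048576 },
  { name: "broken.pdf", size: 0 }
]);
```

In a browser the argument would typically be `input.files` from a file input or `event.dataTransfer.files` from a drop event.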

 

Kind regards,

 

Martin

 

--

Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Pfingstweidstrasse 60B
CH-8005 Zürich

 

 

*** EPrints community wiki: https://wiki.eprints.org/