EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #09653


Re: [EP-tech] 0 byte file uploads


CAUTION: This e-mail originated outside the University of Southampton.

Dear all,

 

Thank you. I backported David’s fix (https://github.com/eprints/eprints3.4/commit/f03b80da02b319d59705144ecccdc933b91c99e5) to our EPrints version, and it works like a charm.

 

Kind regards,

 

Martin

 

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Date: Friday, 1 March 2024 at 12:01
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: RE: [EP-tech] 0 byte file uploads

Hi,

My check (in document_validate.pl) looks at the filesize stored in the DB, which from my experience has been accurate:

 

foreach my $file ( @{ $document->get_value( "files" ) } )
{
    if( $file->get_value( 'filesize' ) == 0 )
    {
        push @problems, $repository->html_phrase(
            "validate:document:zero_length_file",
            filename => $file->render_value( "filename" ),
            fieldname => $xml->create_element( "span", class => "ep_problem_field:documents" ),
        );
    }
}

 

Cheers,

John

 

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of Liam Green-Hughes
Sent: Friday, March 1, 2024 9:41 AM
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] 0 byte file uploads

 

Hi everyone,

 

Currently we have this check in document_validate.pl:

 

my %files = $document->files;
foreach my $file (keys %files) {
    my $source = $document->local_path."/".$file;
    # If file is not on the filesystem it could be a potential zero length file
    if ( ! -e $source )
    {
        push @problems, $session->html_phrase("validate:possible_zero_length_file", filename=>$session->make_text($file));
        next;
    }
}

 

.. and this is the phrase:

<epp:phrase id="validate:possible_zero_length_file">Please check the file: <epc:pin name="filename"/>. Files that have a size of zero cannot be uploaded to KAR.</epp:phrase>

 

Not perfect but catches zero length uploads at least.
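
The existence test above flags files that are missing from the filesystem. A minimal sketch of a variant that also flags files present on disk but empty is shown here. `suspect_files` is a hypothetical helper, not an EPrints function, and it takes a plain directory path and filenames rather than EPrints document objects.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): returns the names of files that
# are either missing from $dir or present there with a size of zero bytes.
sub suspect_files
{
    my( $dir, @filenames ) = @_;

    my @suspects;
    foreach my $name ( @filenames )
    {
        my $path = "$dir/$name";
        # -e is true if the path exists; -z is true if it exists and is empty
        if( ( !-e $path ) || ( -z $path ) )
        {
            push @suspects, $name;
        }
    }
    return @suspects;
}
```

Inside document_validate.pl the directory would presumably come from `$document->local_path` and the names from `keys %files`, as in the check above.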

 

Thanks

Liam


From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>
Sent: 01 March 2024 01:32
To: Liam Green-Hughes <L.E.Green-Hughes@kent.ac.uk>; eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] 0 byte file uploads

 

Hi everyone,

 

We discovered a few of these as well when we were processing everything we had for digital preservation with Archivematica.  On my to-do list for the Archivematica plugin is to add an error about this, on export, because Archivematica isn't necessarily dealing well with these 0 size files either.  Ideally, though, the error would be flagged immediately to the uploader/depositor.  In summary, I am following this thread with much interest. 

 

Tomasz

 

 


From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of Martin Brändle <martin.braendle@uzh.ch>
Sent: Wednesday, February 28, 2024 6:54 AM
To: Liam Green-Hughes <L.E.Green-Hughes@kent.ac.uk>; eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] 0 byte file uploads

 

Dear all,

 

Thanks for your pointers (John's hypothesis about cloud storage is interesting). I’ll follow them up and implement them.

Another hypothesis that came up during our Scrum meeting was that the upload problems have occurred more frequently since our university switched to another VPN software.

We run EPrints 3.3.16. We disabled upload by URL years ago because of known problems.

Kind regards,

 

Martin

 

--

Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Pfingstweidstrasse 60B
CH-8005 Zürich

 

 

From: Liam Green-Hughes <L.E.Green-Hughes@kent.ac.uk>
Date: Wednesday, 28 February 2024 at 10:59
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>, Martin Brändle <martin.braendle@uzh.ch>
Subject: Re: [EP-tech] 0 byte file uploads

Hi all,

 

We have seen this issue too. I believe it is to do with a file entry being created in the database, but nothing being copied to the actual document filesystem. If you create a file of zero length (e.g. with the touch command) and try to upload it to an EPrints instance, you should be able to see this in action (from memory). In our repository I added a warning message to document_validate.pl. I am not sure how people end up with zero length files; it could be something to do with PDF file generation or EPrints getting upset at invalid characters in filenames (it doesn't like apostrophes much).

 

Thanks

Liam

 

 


From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Sent: 28 February 2024 09:20
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>; Martin Brändle <martin.braendle@uzh.ch>
Subject: RE: [EP-tech] 0 byte file uploads

 

Hi Martin, David,

I have observed this too in 3.3.16 – and written a cronjob to alert me to new cases in case there was a pattern.
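
A cron job of this kind could be sketched roughly as follows. This is not the script referred to above: it assumes the check is done by walking the document storage directory on disk (the path is passed as an argument) rather than by querying the database, and `find_zero_byte_files` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk $root recursively and return the paths of all
# regular files that are zero bytes long, sorted by path.
sub find_zero_byte_files
{
    my( $root ) = @_;

    my @found;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path of each entry
            wanted   => sub {
                # -f: regular file; -z _: empty, reusing the previous stat
                push @found, $File::Find::name if -f $_ && -z _;
            },
        },
        $root
    );

    my @sorted = sort @found;
    return @sorted;
}

# When a directory is given on the command line, print one hit per line.
if( @ARGV )
{
    print "$_\n" for find_zero_byte_files( $ARGV[0] );
}
```

Most cron implementations mail any output to the job owner, so a run that finds nothing stays silent.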

 

My hunch was that it related to people uploading from cloud storage – where a file appears as though it’s local to the user’s computer, but the files aren’t actually cached locally. As yet, I haven’t managed to get a proper failing test case.

 

I have put some warnings in place as part of my document_validate to catch these – although it sounds like these will not be needed when we upgrade.

 

Cheers,

John

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of David R Newman
Sent: Wednesday, February 28, 2024 9:06 AM
To: eprints-tech@ecs.soton.ac.uk; Martin Brändle <martin.braendle@uzh.ch>
Subject: Re: [EP-tech] 0 byte file uploads

 

Hi Martin,

I am aware of this issue, and we believe that in many cases it is down to how the JavaScript works in the uploader, mainly with drag-and-drop. My colleague has rewritten this, as modern web browsers no longer need the JavaScript currently used, and we intend to add the rewrite to the next major release of EPrints.

I did implement something to warn if there is a file that reports as zero bytes (i.e. the document file's filesize is 0). It is in the second commit for:

https://github.com/eprints/eprints3.4/issues/189 (changeset: https://github.com/eprints/eprints3.4/commit/f03b80da02b319d59705144ecccdc933b91c99e5)

This GitHub issue was admittedly originally focussed on what I believed was another reason behind zero-byte files: a user trying to upload from a URL that they had access to but the EPrints repository did not (e.g. a private IP or a site that required a password or similar authentication). However, the second commit was solely focussed on putting up a warning message after the upload if it failed to complete successfully. This was implemented in EPrints 3.4.4. Which version of EPrints are you running? It would be useful to know if it works as expected for you, as this is such an intermittent issue that it has been difficult to test. However, it should warn if the filesize for one of a document's files is 0. Unfortunately, it may not do this as soon as the upload fails, but at the very least the warning should appear in the same place as non-field-specific warnings (e.g. a bespoke validation that requires field A or field B to be set), so it should be picked up before the user clicks deposit or otherwise during the review process.

Regards

David Newman

On 28/02/2024 8:40 am, Martin Brändle wrote:

Dear all,

 

in our repository, we have found a few PDFs that are 0 bytes long (actually, it’s a 0.05 per mille problem).

We are not sure how this has happened – we don’t think that there are problems with the drive (it’s mirrored), rather we think that the problem originates from the user’s side, e.g. that something happened at upload or the file was already faulty on the user’s drive.

 

Indeed, as we have tested, it’s possible to upload a file with a length of 0 bytes to EPrints without any problem.

 

However, I think this should be checked by the file uploader and a warning should be issued to the user. This seems not to be implemented yet.

 

Kind regards,

 

Martin

 

--

Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Pfingstweidstrasse 60B
CH-8005 Zürich

 

 

*** EPrints community wiki: https://wiki.eprints.org/