EPrints Technical Mailing List Archive

Message: #00655



[EP-tech] Re: Base64 decoding in 3.3


Hi,

EPrints assumes a line-length of 77 (76 chars + LF).

Using %4 will break if the returned data happens to fall over a line break
+ 2 chars, because the LF itself is counted in length($_).
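To illustrate why the LF matters, here is a minimal sketch in Python rather than Perl. The function names and test data are my own, not EPrints code. It mimics the length($_) % 4 trim on a chunk that runs a few characters past a line break, then counts how many characters a lenient decoder would actually use. Perl's MIME::Base64::decode_base64, for example, silently ignores characters outside the Base64 alphabet.

```python
import base64
import string

# Characters a lenient decoder (like Perl's MIME::Base64::decode_base64)
# actually uses; anything else, including LF, is silently skipped.
B64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def naive_mod4_prefix(chunk: str) -> str:
    """Trim the raw chunk to a multiple of 4 characters, line feeds included,
    mimicking substr($_, 0, length($_) - length($_) % 4)."""
    return chunk[: len(chunk) - len(chunk) % 4]


def decodable_count(text: str) -> int:
    """Number of characters that actually contribute Base64 data."""
    return sum(1 for c in text if c in B64_CHARS)


# Hypothetical test data: 76-char lines, each terminated by a bare LF.
encoded = base64.encodebytes(bytes(range(256))).decode("ascii")

# A chunk that runs a few characters past the first line break.
chunk = encoded[:81]
prefix = naive_mod4_prefix(chunk)

print(len(prefix), decodable_count(prefix))  # 80 raw chars, 79 Base64 chars
```

The raw prefix is a multiple of 4, but only 79 of its characters are Base64. The last quartet is therefore incomplete, and a lenient decoder drops those bits rather than failing, which would match a file coming out a couple of bytes short.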

Here is hopefully a comprehensive fix:
http://trac.eprints.org/eprints/changeset/7764

It ignores all whitespace and consumes a multiple of 4 Base64 characters
from each chunk, carrying any leftover characters over to the next chunk.
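The same idea as a minimal Python sketch. This is not the code in changeset 7764; the class and method names are hypothetical. It strips whitespace, decodes only complete 4-character groups from each piece of input, and keeps the 0-3 leftover characters for the next piece. That is how a SAX characters() handler would have to behave, since the parser may split text anywhere.

```python
import base64
import io


class ChunkedBase64Decoder:
    """Decode Base64 that arrives in arbitrary pieces (for example successive
    SAX characters() callbacks) and write the decoded bytes to `out`.

    Sketch of the approach described above, not the EPrints implementation.
    """

    def __init__(self, out):
        self.out = out
        self.pending = ""  # Base64 characters not yet decoded (always < 4)

    def feed(self, chunk: str) -> None:
        # Drop all whitespace (LF, CR, spaces, tabs) before counting.
        data = self.pending + "".join(chunk.split())
        # Decode only whole 4-character groups; keep the remainder.
        cut = len(data) - len(data) % 4
        if cut:
            self.out.write(base64.b64decode(data[:cut], validate=True))
        self.pending = data[cut:]

    def finish(self) -> None:
        # Anything left over means the input was truncated or malformed.
        if self.pending:
            raise ValueError(f"truncated Base64 input: {self.pending!r} left over")


# Usage: CRLF-terminated Base64 fed in awkward 37-character pieces.
payload = bytes(range(256)) * 20
encoded = base64.encodebytes(payload).decode("ascii").replace("\n", "\r\n")
out = io.BytesIO()
decoder = ChunkedBase64Decoder(out)
for i in range(0, len(encoded), 37):
    decoder.feed(encoded[i : i + 37])
decoder.finish()
print(out.getvalue() == payload)  # True
```

Because the chunk boundaries never have to line up with line breaks, the decoder works whether the client sends 76+LF, 76+CR+LF, or no line breaks at all.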


If you are talking to an EPrints instance that doesn't have this fix, you
will need to format your Base64 into 76+LF lines. I would like to say
you should be doing this anyway, but I missed the CR that the spec
defines:
http://en.wikipedia.org/wiki/Base64#Implementations_and_history
(So it ought to be 76+CR+LF, i.e. modulo 78.)
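On the client side, a minimal Python sketch of producing either layout. The helper name wrap_base64 is hypothetical. Use the LF default for an unpatched EPrints, or eol="\r\n" for the CRLF line endings that MIME (RFC 2045) specifies.

```python
import base64


def wrap_base64(data: bytes, eol: str = "\n", width: int = 76) -> str:
    """Base64-encode `data` and break it into lines of `width` characters,
    each terminated by `eol` (LF by default)."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i : i + width] for i in range(0, len(encoded), width)]
    return "".join(line + eol for line in lines)


payload = bytes(range(256))

lf_text = wrap_base64(payload)                # 76 + LF lines (unpatched EPrints)
crlf_text = wrap_base64(payload, eol="\r\n")  # 76 + CR + LF lines (RFC 2045)

# The first line break falls after 76 Base64 characters in both layouts.
print(lf_text.index("\n"), crlf_text.index("\r\n"))  # 76 76
```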


/Tim.

On Wed, 2012-05-30 at 14:06 +0100, James Colhoun wrote:
> Hi Tim,
> 
> 
> I have sent you the files. I have also been able to fix it: within
> File.pm I changed "sub characters" FROM:
> 
> 
> print $tmpfile MIME::Base64::decode_base64( substr($_,0,length($_) - length($_)%77) );
> 
> 
> TO
> 
> 
> print $tmpfile MIME::Base64::decode_base64( substr($_,0,length($_) - length($_)%4) );
> 
> 
> This seems to stop the chunking from splitting individual bytes and
> causing the problem. I am still testing this, but it would be great to
> know what you think.
> 
> 
> Jim
> 
> 
> 
> 
> -----eprints-tech-bounces@ecs.soton.ac.uk wrote: -----
> To: eprints-tech@ecs.soton.ac.uk
> From: Tim Brody 
> Sent by: eprints-tech-bounces@ecs.soton.ac.uk
> Date: 05/29/2012 03:50PM
> Subject: [EP-tech] Re: Base64 decoding in 3.3
> 
> On Tue, 2012-05-29 at 12:18 +0100, James Colhoun wrote:
> > Hi,
> > 
> > 
> > I am uploading publications via SWORD. Full-text files are added to
> > the upload XML, encoded in Base64. This worked fine until we
> > upgraded to 3.3. Now we get errors in the log:
> > 
> > 
> >  failed: expected 3151 bytes but actually got 3149 bytes
> > 
> > 
> > So it seems the decoding of Base64 is no longer working correctly.
> > Inside EPrints/DataObj/File.pm, the functions end_element, characters
> > and start_element seem to create a tmp file that is corrupt. If I
> > add a write to file inside "sub characters" (see below), the PDF is
> > created correctly, so I know the data is passed in correctly; there
> > seems to be something fundamentally broken with the way the decoding
> > to tmpfile is working. Has anyone seen this or have a fix for it?
> > 
> Hi,
> 
> I can't replicate this. I did find a bug in XMLFiles for *producing*
> base64 encoded files, fixed by this:
> http://trac.eprints.org/eprints/ticket/4057
> 
> This could be an edge case - can you post your XML somewhere or email
> it to me directly (if not too big)?
> 
> -- 
> All the best,
> Tim
> 
> *** Options:
> http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> 
> 
> 
