EPrints Technical Mailing List Archive

Message: #09060



Re: [EP-tech] Query regarding EPrints API


Hi Divije,

I have tested this on my EPrints 3.4.4 test instance and I can successfully upload a 4.3MB PDF, download it and view it in a PDF viewer.  Here is the XML I used for the small file I test with to start:

<?xml version="1.0" encoding="utf-8"?>
<eprints xmlns="http://eprints.org/ep2/data/2.0">
  <eprint>
    <documents>
      <document>
        <files>
          <file>
            <filename>test.txt</filename>
            <data>VGhpcyBpcyBhIHRleHQgZmlsZS4K</data>
          </file>
        </files>
      </document>
    </documents>
    <type>article</type>
    <title>Test title</title>
    <abstract>Test abstract</abstract>
    <ispublished>pub</ispublished>
    <refereed>TRUE</refereed>
    <date>2018-06-07</date>
    <creators>
      <item>
        <name>
          <family>Newman</family>
          <given>David</given>
        </name>
        <id>drn@ecs.soton.ac.uk</id>
      </item>
    </creators>
    <userid>1</userid>
  </eprint>
</eprints>

All I did for the 4.3MB PDF was change the filename and the content inside the data tag (and the title and abstract values, so I could differentiate between the two eprints on my repository).  I used the following curl command to submit the EPrints XML to my repository:

curl -X POST -u USERNAME:PASSWORD --data-binary "@/home/eprints/test_base64_doc_large.xml" -H "Content-Type: application/vnd.eprints.data+xml" https://eprints.example.org/id/contents

Obviously I have hidden private information with USERNAME and PASSWORD and used an example hostname.  All I did was run the Unix command base64 to convert the PDF into base64 and write this to a file on disk.  I then just edited this file and inserted the EPrints XML around it. 
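
For anyone generating the XML programmatically rather than by hand, the step above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function names build_eprint_xml and extract_file_bytes are made up for this example, and the document is trimmed to the file, type and title fields (the other metadata fields from the XML above are omitted for brevity). The second function decodes the data element again, as a quick check that the base64 survived intact.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace taken from the example XML earlier in this message.
EP_NS = "http://eprints.org/ep2/data/2.0"


def _child(parent, tag, text=None):
    """Add a namespaced child element, optionally with text content."""
    element = ET.SubElement(parent, f"{{{EP_NS}}}{tag}")
    if text is not None:
        element.text = text
    return element


def build_eprint_xml(filename: str, content: bytes, title: str) -> bytes:
    """Wrap raw file bytes in a trimmed-down EPrints XML document (hypothetical helper)."""
    # Serialise with a default namespace, i.e. <eprints xmlns="...">.
    ET.register_namespace("", EP_NS)
    root = ET.Element(f"{{{EP_NS}}}eprints")
    eprint = _child(root, "eprint")
    document = _child(_child(eprint, "documents"), "document")
    file_el = _child(_child(document, "files"), "file")
    _child(file_el, "filename", filename)
    # b64encode produces a single line: no line breaks or carriage returns.
    _child(file_el, "data", base64.b64encode(content).decode("ascii"))
    _child(eprint, "type", "article")
    _child(eprint, "title", title)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def extract_file_bytes(xml_doc: bytes) -> bytes:
    """Decode the first <data> element; raises if it holds non-base64 characters."""
    data = ET.fromstring(xml_doc).find(f".//{{{EP_NS}}}data")
    return base64.b64decode(data.text, validate=True)
```

Comparing extract_file_bytes(build_eprint_xml(...)) against the original file bytes before submitting is a cheap way to rule out the encoding step as the source of a problem.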

Just as I was about to send this I thought 4.3MB might have been borderline for your files that exceed 4MB, so I tested with a 6.1MB file and this uploaded, downloaded and then loaded in a PDF viewer without issue.  Maybe the method you are using to generate the base64-encoded file, or the library used to emulate my curl request, is the issue.  I am not aware of anything that has changed in recent versions of EPrints that would mean this works in 3.4.4 but not in the version of EPrints you are running.  It would still be worth knowing which version of EPrints you are running.  One other thing I noted: in your example XML there is an XML entity for a carriage return (&#13;).  I am not sure why this would be included in base64 data.  Admittedly, this is in the small-file example XML that you said was working, so it is probably just a red herring.
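
If the request is being sent from a library rather than from curl, a minimal Python sketch of an equivalent request, using only the standard library, might look like the following. The function names build_eprints_request and submit are hypothetical, the hostname and credentials are the same placeholders as in the curl command, and the sketch assumes HTTP Basic authentication, which is what curl uses by default with -u. The XML is passed as raw bytes so nothing is re-encoded on the way out, which is the point of --data-binary.

```python
import base64
import urllib.request


def build_eprints_request(url: str, xml_doc: bytes, username: str, password: str) -> urllib.request.Request:
    """Build a POST request mirroring the curl command (hypothetical helper)."""
    # Equivalent of curl's -u USERNAME:PASSWORD, assuming HTTP Basic auth.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=xml_doc,  # raw bytes, sent unmodified like --data-binary
        method="POST",
        headers={
            "Content-Type": "application/vnd.eprints.data+xml",
            "Authorization": f"Basic {token}",
        },
    )


def submit(request: urllib.request.Request, timeout: float = 120.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Reading the XML file in binary mode (open(path, "rb")) and passing the bytes straight to build_eprints_request avoids any newline or character-set translation between disk and the wire.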

Regards

David Newman

On 09/09/2022 2:32 pm, Divije Narasimhachar via Eprints-tech wrote:

Hi,

 

I am a developer from Clarivate Technologies and I work on the product Converis. We have a feature where we export publications into EPrints. The export is done in the form of an XML document.

 

When we export publications into EPrints we also export the files attached to them. We do this by putting the encoded contents of the file in the ‘data’ XML tag, something like this:

 

<documents>
  <document>
    <files>
      <file>
        <filename>filename.txt</filename>
        <data>MjAyMiwOS0wNCAyMzoyNDo0M4MDEgRVJST1IgW2NvbS5jb252ZXJpy5kYXRhZXhjaGFuZ2U&#13;</data>
      </file>
    </files>
    <format>text/plain</format>
    <main>filename.txt</main>
  </document>
</documents>

 

We have an issue where the export fails if the size of the attached file exceeds 4MB.

 

The export works fine if the file size is in the kilobyte range or if there is no file.

 

Is there a workaround to this?

 

Can we export the same file in parts (e.g. 1MB at a time) to the same publication, instead of sending the whole file (e.g. 10MB) in one go?

 

Thanks and Regards

Divije Narasimhachar

Senior Software Engineer

 

Clarivate™
Accelerating innovation



*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/