
[EP-tech] Re: Injecting gigabyte-scale files into EPrints archive - impossible?



There's no official documentation for toolbox; it should be documented
better.

Can't you just use import with these options:

    --enable-import-ids
             By default import will generate a new eprintid, or userid for
             each record. This option tells it to use the id specified in the
             imported data. This is generally used for importing into a new
             repository from an old one.


    --enable-file-imports
             Allow the imported data to import files from the local
             filesystem. This can obviously be seen as a security hole
             if you don't trust the data you are importing. This sets the
             "enable_file_imports" configuration option for this session
             only.

That is: export the eprints, modify the document section, and reimport
them.

Another option is to use a Perl library for efficient file handling and
to change the code where it does

  join("", <STDIN>)
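
To illustrate the idea, here is a minimal sketch of copying a stream in
fixed-size chunks instead of slurping it into one scalar. It is not EPrints
code: copy_stream, its parameters and the command-line usage at the bottom
are made-up names for illustration, and how such a routine would be wired
into EPrints::Toolbox or the storage layer is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of EPrints.
# Copies everything from $in to $out in chunks of at most $chunk_size
# bytes, so memory use stays bounded no matter how large the input is.
# Returns the number of bytes copied.
sub copy_stream
{
    my( $in, $out, $chunk_size ) = @_;
    $chunk_size ||= 1024 * 1024;    # default: 1 MiB per read

    binmode( $in );
    binmode( $out );

    my $total = 0;
    my $buffer;
    while( 1 )
    {
        my $n = read( $in, $buffer, $chunk_size );
        die "read failed: $!" unless defined $n;
        last if $n == 0;            # end of input
        print {$out} $buffer or die "write failed: $!";
        $total += $n;
    }
    return $total;
}

# Assumed usage when run as a script: stream STDIN into the file named
# by the first argument.
if( !caller && @ARGV )
{
    my $target = shift @ARGV;
    open( my $out, '>', $target ) or die "cannot open $target: $!";
    my $bytes = copy_stream( \*STDIN, $out );
    close( $out ) or die "cannot close $target: $!";
    print STDERR "copied $bytes bytes to $target\n";
}
```

The core module File::Copy does much the same thing: its copy() function
accepts filehandles as well as filenames, so it may be enough on its own.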




On 01/08/2014 11:25, Florian He? wrote:
> Hello developers and users,
>
> again I'm sorry I have to consult you concerning a problem we've run
> into and couldn't solve ourselves.
>
> We need to attach a big file to a document, namely one of 3 GB in size.
> We limited web uploads to 100 MB in the webserver configuration so that
> we keep control of large file uploads. To get bigger files into the
> archive we successfully use the following command:
>
> /usr/bin/perl ~eprints/bin/toolbox $repo addFile \
>      --document $docid --filename $filename < /path/to/existing/file
>
> (By the way, is there a convenient way of getting the document id? It is
> rather tedious to upload a placeholder file so we can manually look up
> and grab a doc id with the Firebug extension; after running the command,
> we open the EPrint file dialog in the document metadata to switch the
> main file and delete the placeholder.)
>
> I narrowed this method down to a line of code in
> EPrints::Toolbox::get_data() that I doubt scales to files of this
> size (given our hardware memory space):
>
>       join("", <STDIN>)
>
> builds, in EPrints 3.3.10, a monstrous Perl scalar that is certainly
> being expanded and moved around in memory over and over to make it fit.
> I wonder if there is a way I can move the file to the expected place
> myself and adjust the file record in the EPrints database. I tried this
> already, but in the end I still got the tiny placeholder file when
> downloading. I deleted the file on the console (rm), but then the
> EPrints system threw "couldn't read file contents". So somewhere things
> were still set up for the old file. The browser, though, displays the
> right filename in the modal dialog offering to save the file or to open
> it with some program.
>
> The toolbox command had been running for more than two hours, gorging
> swap space like there was no tomorrow, when we killed it. It consumed
> 2% of CPU on average, and its status flag was "D" most of the time (man
> ps: "uninterruptible sleep (usually IO)"). It appeared to me to be
> constantly swapping.
>
> Today I tried the toolbox addDocument command, which doesn't seem to
> save me any work after all, as it just requires XML data. But with
> <url>file:///path/of/file/to/import</url>, it runs out of disk space
> again while "downloading" that URL in /tmp.
> I wish I could pass the path of a file to be copied directly. Isn't that
> possible somehow?
>
>
> Kind regards
> Florian
>
>