
[EP-tech] Scripted XML download?



Hi,
I do some checking, analysis and visualisation of our repository in a third-party package, which I have set up to ingest EPrints XML.  I'd like to update this once a week or so, but if I download it all in one go it takes about 3 hours, comes to 1.5GB, and tends to fail halfway through.  I have been doing it manually one year at a time, but that means 17 separate manual search-and-download operations, each taking ten minutes or so.  I don't have shell access to the server, so I can't script it from the command line.

I have looked at the search page, but after a search the download form references a cached search ID, so I can't just copy the URL from the download form.

Can anyone give me a template for a URL that would work in a single pass with wget or libwww, and that I could then run from cron to fetch the EPXML?  Obviously I have to be able to authenticate as well.
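
For illustration, a rough sketch of the kind of cron-able fetch described above, making one request per year so that a failure only costs one year's download. The URL template, its `{year}` placeholder, the output file names and the use of HTTP Basic authentication are all assumptions; none of them is a confirmed EPrints URL or login mechanism, and the real export URL is exactly the missing piece.

```python
import base64
import time
import urllib.request

# Hypothetical placeholder, NOT a real EPrints URL: substitute whatever
# per-year export URL turns out to work for the repository.
URL_TEMPLATE = "https://repository.example.org/export?year={year}&format=XML"


def year_urls(template, first_year, last_year):
    """Return (year, url) pairs, one per year, both ends inclusive."""
    return [(y, template.format(year=y)) for y in range(first_year, last_year + 1)]


def fetch(url, user, password, dest, timeout=600):
    """Stream url to the file dest. HTTP Basic auth is an assumption here."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(dest, "wb") as out:
        # Copy in 64 KiB chunks so a large export is never held in memory.
        while chunk := resp.read(1 << 16):
            out.write(chunk)


def fetch_all(template, first_year, last_year, user, password, retries=3):
    """Fetch each year in turn, retrying a failed year; return the years that never succeeded."""
    failed = []
    for year, url in year_urls(template, first_year, last_year):
        for attempt in range(1, retries + 1):
            try:
                fetch(url, user, password, f"eprints_{year}.xml")
                break
            except OSError as err:  # URLError, HTTPError and timeouts are OSError subclasses
                print(f"{year}: attempt {attempt} failed: {err}")
                time.sleep(5 * attempt)
        else:
            # The retry loop finished without a successful download.
            failed.append(year)
    return failed


# Example call (years and credentials are placeholders):
# fetch_all(URL_TEMPLATE, 2001, 2017, "username", "password")
```

Splitting by year keeps each response small, and the per-year retry means a dropped connection does not restart the whole 1.5GB transfer.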

Andy Reid
Research Information Manager
Executive Office, Room G40a
London School of Hygiene and Tropical Medicine
Keppel St, LONDON, WC1E 7HT
0207-927-2618 (Internal/Teleworker x2618)
