[EP-tech] Scripted XML download?
- Subject: [EP-tech] Scripted XML download?
- From: Andy.REID at lshtm.ac.uk (Andy Reid)
- Date: Mon, 27 Mar 2017 13:51:32 +0000
I do some checking, analysis and visualisation of our repository in a third-party package, which I have set up to ingest EPrints XML. I'd like to update this once a week or so, but if I download everything in one go it takes about 3 hours, comes to 1.5GB, and tends to fail halfway through. I have been doing it manually one year at a time, but that means 17 separate manual search-and-download operations, each taking ten minutes or so. I don't have shell access to the server, so I can't script it from the command line.
I have looked at the search page, but after a search the download form references a cached search ID, so I can't just copy the URL from the download form.
Can anyone give me a template for a URL that would work in a single pass in wget or libwww, which I could then run from cron to fetch the EPXML? Obviously I have to be able to authenticate as well.
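To make the request concrete, here is a minimal sketch of the kind of script meant, fetching one year at a time so that a failure only costs one small file. Everything specific in it is an assumption and not confirmed by this post: the host name, the year range, the per-year export URL shape in URL_TEMPLATE, and the use of HTTP Basic authentication. The real URL pattern and login mechanism would need to be confirmed for the repository in question.

```python
import time
import urllib.request
from pathlib import Path

# Hypothetical values: replace with the real repository host and year range.
BASE_URL = "https://repository.example.org"
FIRST_YEAR = 2001
LAST_YEAR = 2017

# ASSUMPTION: the repository offers a per-year XML export at a URL of this
# shape. This is a guess to be checked, not a confirmed template.
URL_TEMPLATE = "{base}/cgi/exportview/year/{year}/XML/{year}.xml"


def year_urls(base, first, last, template=URL_TEMPLATE):
    """Return (year, url) pairs for every year from first to last inclusive."""
    base = base.rstrip("/")
    return [(y, template.format(base=base, year=y)) for y in range(first, last + 1)]


def make_opener(base, user, password):
    """Build an opener that answers HTTP Basic challenges.

    ASSUMPTION: the server accepts HTTP Basic auth; a cookie-based login
    would need a different handler.
    """
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, base, user, password)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))


def fetch_all(opener, urls, out_dir, retries=3, pause=5.0):
    """Save each URL to <out_dir>/<year>.xml; return the years that failed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for year, url in urls:
        for _ in range(retries):
            try:
                with opener.open(url, timeout=600) as resp:
                    (out / f"{year}.xml").write_bytes(resp.read())
                break  # this year succeeded, move on to the next
            except OSError:
                # Covers URLError, HTTPError and socket timeouts.
                time.sleep(pause)
        else:
            # All retries exhausted for this year.
            failed.append(year)
    return failed
```

A cron job would then call fetch_all(make_opener(BASE_URL, user, password), year_urls(BASE_URL, FIRST_YEAR, LAST_YEAR), "epxml") and re-run only the years it reports as failed.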
Research Information Manager
Executive Office, Room G40a
London School of Hygiene and Tropical Medicine
Keppel St, LONDON, WC1E 7HT
0207-927-2618 (Internal/Teleworker x2618)