
[EP-tech] Making a static copy of an EPrints repo



I've done this a few times in the past. If it's to stay online but 
static, I suggest removing links to the OAI, search and latest pages, 
etc. There's an example here: http://eprints.agentlink.org/
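
Something along these lines can strip those links out of the mirrored 
HTML, leaving just the link text behind. It's only a rough sketch: the 
/cgi/search, /cgi/oai2 and /cgi/latest paths are assumptions, so check 
what your installation actually uses.

    # rough sketch: replace anchors pointing at the dynamic CGI pages
    # (search, OAI, latest) with just their link text; the /cgi/... paths
    # are a guess, adjust them to suit your installation
    find my.repo -type f -name '*.html' -print0 |
      xargs -0 sed -i -E \
        's#<a[^>]*href="[^"]*/cgi/(search|oai2|latest)[^"]*"[^>]*>([^<]*)</a>#\2#g'

That only copes with simple anchors with no nested markup inside the 
link; anything fancier really wants a proper HTML tool rather than sed.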

By the way, I really like calling this "fossilisation" as it's a good 
fit for the process of replacing a dynamic site with a static one.

On 18/07/2017 11:43, Matthew Kerwin wrote:
> On 18 July 2017 at 19:04, Ian Stuart <Ian.Stuart at ed.ac.uk> wrote:
>> I need to make a read-only, static, copy of an old repo (the hardware is
>> dying, the installation was heavily tailored for the environment, and I
>> don't have the time to re-create it in a new environment).
>>
>> I can grab all the active pages:
>>
>>     wget --local-encoding=UTF-8 --remote-encoding=UTF-8 --no-cache
>> --mirror -k http://my.repo/
>>
>> This is good; however, it doesn't edit all the absolute URLs in the view
>> pages, so we need to modify them:
>>
>>     find my.repo -type f -exec sed -i 's_http://my.repo/_/_g' {} +
>>
>> However, this leaves me with the problem that the http://my.repo/nnn/
>> pages haven't been pulled down!
>>
>> Any suggestions on how to do this?
>>
>> Cheers
>>
> Depends how many records there are, and how sparse.  Do you have a
> sitemap?  It might be worth parsing that, and fetching them one by
> one.
>
> If you're desperate, there's always:
>
>      for id in {1..12345} ; do wget --etc http://my.repo/$id ; done
>
> Cheers
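
Matthew's sitemap suggestion above scripts up fairly easily. This is 
only a sketch, and it assumes the repo exposes a sitemap at 
http://my.repo/sitemap.xml with each <loc> element on its own line, so 
check both of those before relying on it:

     # pull the record URLs out of the sitemap (assumed location)
     wget -qO- http://my.repo/sitemap.xml \
       | grep -o '<loc>[^<]*</loc>' \
       | sed -E 's#</?loc>##g' > urls.txt

     # fetch each page into the same my.repo/ directory tree as the mirror
     wget --local-encoding=UTF-8 --remote-encoding=UTF-8 --no-cache \
       -x -i urls.txt

Then run the same sed over the tree afterwards to fix up the absolute 
URLs in the newly fetched pages.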

-- 
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read our Web & Data Innovation blog: http://blogs.ecs.soton.ac.uk/webteam/