EPrints Technical Mailing List Archive

Message: #06719

Re: [EP-tech] Making a static copy of an EPrints repo

To: eprints-tech@ecs.soton.ac.uk, Matthew Kerwin <matthew@kerwin.net.au>
Subject: Re: [EP-tech] Making a static copy of an EPrints repo
From: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
Date: Tue, 18 Jul 2017 16:56:08 +0100

I've done this a few times in the past. If it's to leave online but static, I suggest removing links to the OAI, search and latest pages etc. There's an example here: http://eprints.agentlink.org/

By the way, I really like calling this "fossilisation" as it's a good fit for the process of replacing a dynamic site with a static one.


On 18/07/2017 11:43, Matthew Kerwin wrote:

On 18 July 2017 at 19:04, Ian Stuart <Ian.Stuart@ed.ac.uk> wrote:

I need to make a read-only, static, copy of an old repo (the hardware is
dying, the installation was heavily tailored for the environment, and I
don't have the time to re-create in a new environment.)

I can grab all the active pages:

    wget --local-encoding=UTF-8 --remote-encoding=UTF-8 --no-cache \
         --mirror -nc -k http://my.repo/

This is good, however it doesn't edit all the absolute URLs in the view
pages, so we need to modify them:

    find my.repo -type f -exec sed -i 's_http://my.repo/_/_g' {} +
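
To see what that substitution does before running it over the whole mirror, it can be tried on a single line. This is only a sketch: the sample link is made up, `rewrite_links` is a hypothetical helper name, and the dot is escaped here so that it matches a literal `.` rather than any character.

```shell
# Same substitution as the find/sed command above, wrapped in a function
# so it can be tried on sample input first.
# rewrite_links is a hypothetical name; the dot is escaped to match only '.'.
rewrite_links() {
    sed 's_http://my\.repo/_/_g'
}

# A made-up sample line from a view page.
printf '%s\n' '<a href="http://my.repo/view/year/2017.html">2017</a>' | rewrite_links
# prints: <a href="/view/year/2017.html">2017</a>
```

Note that root-relative links like these only resolve when the copy is served from the top of a web server, not when the files are opened straight from disk.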

However this leaves me with the problem that the http://my.repo/nnn/
pages haven't been pulled down!

Any suggestions on how to do this?

Cheers

Depends how many records there are, and how sparse.  Do you have a
sitemap?  It might be worth parsing that, and fetching them one by
one.
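
A rough sketch of the sitemap approach, assuming the repo serves a standard sitemap with one `<loc>` element per URL. The `/sitemap.xml` path and the `sitemap_urls` helper name are assumptions for illustration, not something confirmed for this installation.

```shell
# Print every <loc> URL found in a sitemap read from stdin, one per line.
# sitemap_urls is a hypothetical helper; it assumes URLs contain no '<'.
sitemap_urls() {
    grep -o '<loc>[^<]*</loc>' | sed -e 's_^<loc>__' -e 's_</loc>$__'
}

# Possible usage (the sitemap location is an assumption):
#   wget -q -O - http://my.repo/sitemap.xml | sitemap_urls | wget -x -i -
```

Here `-i -` makes wget read the URL list from standard input, and `-x` keeps the directory structure so the pages land next to the rest of the mirror.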

If you're desperate, there's always:

     for id in {1..12345} ; do wget --etc http://my.repo/$id ; done
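
A variation on that loop, as a sketch only: generate the candidate URLs first and hand the whole list to a single wget run, so one process handles everything and IDs that return an error are simply skipped. The `eprint_urls` name and the trailing-slash URL form are assumptions; 12345 is the same placeholder upper bound as above.

```shell
# Print candidate abstract-page URLs for every ID from first to last.
# eprint_urls is a hypothetical helper; adjust the URL form to match the repo.
eprint_urls() {
    base=$1; first=$2; last=$3
    id=$first
    while [ "$id" -le "$last" ]; do
        printf '%s/%s/\n' "${base%/}" "$id"
        id=$((id + 1))
    done
}

# Possible usage (needs network access to the repo):
#   eprint_urls http://my.repo 1 12345 | wget -x -k -p -i -
```

With `-p` wget also fetches the images and stylesheets each page needs, and `-k` converts the links once the downloads have finished.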

Cheers


--
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read our Web & Data Innovation blog: http://blogs.ecs.soton.ac.uk/webteam/
