I like it :) It's a very useful tool. No one likes dead links.
On Fri, Apr 7, 2017 at 1:03 PM, <martin.braendle at id.uzh.ch> wrote:
> I just wrote a linkcheck crawler that checks the remote URLs stored in an
> EPrints repo and updates the issues list for URLs that have an invalid
> format or report HTTP status codes other than 200.
> Please let me know if there is interest in having it available; if so, I will
> put it on GitHub. There's some more work to do, e.g. moving some of the
> methods into a plugin so that they can be called from elsewhere.
> Please also be aware that by applying a linkcheck crawler your editorial
> team may come under strain to fix all the dead links. Our initial run
> revealed that after 10 years of running our repository, about 25% of the
> URLs (about 7500 in our case) are no longer working.
> The script also produces a report by HTTP status code, sorted either by
> eprint id or by URL. The latter allows identifying patterns so that URLs
> can be replaced or removed in batch.
> Best regards,
> Dr. Martin Brändle
> Zentrale Informatik
> Universität Zürich
> Stampfenbachstr. 73
> CH-8006 Zürich
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/
- Fri, 7 Apr 2017 18:03:47 +0200 - [EP-tech] Linkcheck - martin.braendle at id.uzh.ch (martin.braendle at id.uzh.ch)