
eprints_tech messages

Please note: this page shows emails that have been sent to the eprints_tech mailing list. Some of these may be spam emails we have failed to filter.

[EP-tech] Capture from URL issues

From: Guy Knights <g.knights AT qut.edu.au>
Date: Fri, 26 Jun 2009 14:45:43 +1000



http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** EPrints community wiki - http://wiki.eprints.org/
Hi,

I'm trying the Capture from URL feature in our EPrints 3.1.1 installation and
have run into a few issues with the pages I've been using as samples. I've tried
3 or 4 different sites, and the captured data is inconsistent: sometimes the
feature grabs all components fine, sometimes it misses the CSS, and other times
it doesn't download the images.

If any of you are familiar with the function, do you have any advice you could 
impart?

Thanks,
Guy

Guy Knights
Computer Systems Officer
Library eServices
Room D320
Victoria Park Rd, Kelvin Grove
Ph: (07) 3138 3910
E: g.knights AT qut.edu.au


[EP-tech] Re: Capture from URL issues

From: Guy Knights <g.knights AT qut.edu.au>
Date: Fri, 3 Jul 2009 13:52:11 +1000


Just bumping this, as we're still in the dark about how well Capture from URL
is expected to work and what known limitations people are aware of.

Thanks,
Guy



[EP-tech] Re: Capture from URL issues

From: Guy Knights <g.knights AT qut.edu.au>
Date: Thu, 9 Jul 2009 14:13:52 +1000


Anyone?




[EP-tech] Re: Capture from URL issues

From: Tim Brody <tdb2 AT ecs.soton.ac.uk>
Date: Thu, 09 Jul 2009 10:11:18 +0100


"wget" is the tool used to capture from URLs. The invocation (in
SystemSettings) is something like this:
'wget' => '$(wget) -r -L -q -m -nH -np --execute="robots=off" --cut-dirs=$(CUTDIRS) $(URL)'

You can try playing around with the wget settings, but it's likely to be
hit-and-miss, especially with pages that aren't robot-friendly.
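
For anyone experimenting with those settings, here is a minimal Python sketch
(not EPrints code) that assembles variants of the invocation above so they can
be tried by hand before SystemSettings is changed. The function name
build_wget_command and its parameters are hypothetical; the flags themselves are
standard wget options. It assumes, without confirmation from this thread, that
missing CSS or images may be related to -L (follow relative links only) or to
assets hosted elsewhere, so it makes -L optional and can add -p
(--page-requisites) and -H (--span-hosts).

```python
import shlex


def build_wget_command(url, cut_dirs=0, relative_only=False,
                       page_requisites=True, span_hosts=False):
    """Return a wget argument list modelled on the EPrints invocation.

    Hypothetical helper for experimentation; it only builds the command
    and never runs wget itself.
    """
    args = [
        "wget",
        "-r",                     # recursive retrieval
        "-q",                     # quiet
        "-m",                     # mirror mode
        "-nH",                    # no host-name directory
        "-np",                    # never ascend to the parent directory
        "--execute=robots=off",   # ignore robots.txt, as in the original
        f"--cut-dirs={cut_dirs}",
    ]
    if relative_only:
        args.append("-L")         # original setting: relative links only
    if page_requisites:
        args.append("-p")         # also fetch CSS, images, etc. for each page
    if span_hosts:
        args.append("-H")         # allow assets served from other hosts
    args.append(url)
    return args


if __name__ == "__main__":
    cmd = build_wget_command("http://example.org/a/b/page.html", cut_dirs=2)
    # Print a shell-safe string that can be pasted into a terminal.
    print(shlex.join(cmd))
```

Running the printed command in a scratch directory and comparing the downloaded
files against the live page is one way to see which flag changes the result.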

/Tim.



[EP-tech] Re: Capture from URL issues

From: Ben Wheeler <b.wheeler AT ulcc.ac.uk>
Date: Fri, 10 Jul 2009 11:21:58 +0100


On Fri, Jul 03, 2009 at 01:52:11PM +1000, Guy Knights wrote:
> Just bumping this as we're still a little in the dark about how well 
> capture from url is expected to work, and any known limitations that 
> people are aware of.

Web archiving is fraught with difficulties. It's surprisingly hard
(and actually bordering on impossible in some cases) to automatically
determine all the required media to reproduce a page or a set of 
pages correctly. I personally wouldn't use any mechanism that
didn't offer fine-grained controls to include or exclude certain
additional files, as I have almost always had to make use of these
when harvesting pages or sites for archiving. An automatic method
generally fetches either too little, or too much, or too much but
still misses some bits. 
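
As an illustration of what such fine-grained controls can look like with wget,
the tool mentioned earlier in this thread, the Python sketch below builds
include/exclude arguments. The helper name build_filter_args and its parameters
are hypothetical; the options it emits (--accept, --reject,
--include-directories, --exclude-directories, --domains) are standard wget
options. Whether EPrints passes such options through is not established here.

```python
def build_filter_args(accept=(), reject=(), include_dirs=(),
                      exclude_dirs=(), domains=()):
    """Return wget arguments that include or exclude files during a crawl.

    Hypothetical helper: each parameter is a sequence of strings that is
    joined into the comma-separated list wget expects.
    """
    options = [
        ("--accept", accept),                     # suffixes/patterns to keep
        ("--reject", reject),                     # suffixes/patterns to skip
        ("--include-directories", include_dirs),  # only follow these paths
        ("--exclude-directories", exclude_dirs),  # never follow these paths
        ("--domains", domains),                   # hosts allowed when spanning
    ]
    # Emit an option only when the caller supplied values for it.
    return [f"{flag}={','.join(values)}" for flag, values in options if values]


if __name__ == "__main__":
    # Keep stylesheets and images, but stay out of a forum area.
    print(build_filter_args(accept=["css", "png", "jpg", "html"],
                            exclude_dirs=["/forum"]))
```

These arguments would be appended to a wget command line before the URL; the
right lists differ for every site, which is why they usually need hand tuning.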

Ben

