[EP-tech] Seeing unusually high downloads in IRStats



IRStats is simply wrong to count raw HTTP accesses instead of using a
JavaScript tracking library (Piwik, Google Analytics). These libraries
already know how to fight spammers and bots, and they rely on real
interaction with a web browser rather than on a bare HTTP access.

The added value of IRStats is that it shows simple statistics (views
and downloads) for every item over a period of time. Replicating these
simple statistics in an existing system (such as Piwik) would be the
best solution.


On 26/07/2016 11:16, Enio Carboni wrote:
>
> Hi Betsy,
>
> I wrote an IP plugin for IRStats2 a few months ago (to exclude local
> admin IPs), where you list single IPs, IP ranges or CIDR blocks in a
> config file.
>
> To use it, add the new filter in cfg/cfg.d/z_irstats2.pl like this:
>
> $c->{irstats2}->{datasets} = { access => { filters => [ 'Robots',
> 'Repeat', 'IP' ] } };
>
> Note the last filter, IP.
>
> You can download it from GitHub and try it:
> https://github.com/eniocarboni/irstats2-filter-by-ip
>
> There is also a test script, irstats2-filter-by-ip.pl, in
> archive/<ID>/bin for testing the config file before you process all
> the stats.
>
> You could use it this way:
>
> ./irstats2-filter-by-ip.pl <ID> 103.25.156.5
>
> or
>
> ./irstats2-filter-by-ip.pl <ID> 103.25.156.1-103.25.156.19
>
> Of course, do not forget to add the IP ranges to be discarded to
> cfg/cfg.d/z_irstats2_filter_ipcidr_blocks.pl
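The three kinds of entry mentioned above (single IP, dash-separated range, CIDR block) can all be reduced to an inclusive pair of integer bounds. The following is a minimal Python sketch of that idea, not the plugin's actual Perl code; the function names `parse_block` and `is_blocked` and the example entries in `BLOCKS` are assumptions made for illustration, and the plugin's real config syntax may differ.

```python
import ipaddress


def parse_block(spec):
    """Turn 'a.b.c.d', 'a.b.c.d-e.f.g.h' or 'a.b.c.d/n' into (low, high) integers."""
    spec = spec.strip()
    if "-" in spec:
        # Explicit range: both ends are inclusive.
        first, last = (ipaddress.ip_address(p.strip()) for p in spec.split("-", 1))
    elif "/" in spec:
        # CIDR block: strict=False tolerates host bits set in the address part.
        net = ipaddress.ip_network(spec, strict=False)
        first, last = net.network_address, net.broadcast_address
    else:
        # Single address: a range of length one.
        first = last = ipaddress.ip_address(spec)
    if int(first) > int(last):
        raise ValueError(f"range is reversed: {spec}")
    return int(first), int(last)


def is_blocked(ip, blocks):
    """Return True if ip falls inside any (low, high) pair in blocks."""
    value = int(ipaddress.ip_address(ip))
    return any(low <= value <= high for low, high in blocks)


# Hypothetical entries, mixing the three accepted forms.
BLOCKS = [
    parse_block(s)
    for s in ("103.25.156.1-103.25.156.19", "103.36.96.0/24", "111.221.28.7")
]

print(is_blocked("103.25.156.5", BLOCKS))
print(is_blocked("103.25.156.20", BLOCKS))
```

Storing every entry as integer bounds keeps the per-access check to a pair of comparisons, whichever form the entry was written in.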
>
> Let me know if it is useful.
>
> Enio Carboni
>
> On Monday 25 July 2016 23:45:16 CEST, Coles, Elizabeth A. (Betsy)
> wrote:
>
> Forwarding from the JISC-REPOSITORIES list: we've been seeing this in
> California too, and our IRStats2 counts are through the roof for the
> last couple of weeks.
>
> Can anyone tell me how to filter out these robots in IRStats2?  And 
> how to clean the access file so that our irstats2 reports are not 
> distorted by this deluge?  I assume I'd want to delete all entries
> with a requester_id in the table below and rerun IRstats2 setup from 
> scratch.
>
> Thanks,
>
> Betsy Coles
>
> Caltech - Digital Library Development
>
> bcoles at caltech.edu <mailto:bcoles at caltech.edu>
>
> From: Repositories discussion list
> [mailto:JISC-REPOSITORIES at JISCMAIL.AC.UK] On Behalf Of Hilary Jones
> Sent: Friday, July 15, 2016 3:43 AM
> To: JISC-REPOSITORIES at JISCMAIL.AC.UK
> <mailto:JISC-REPOSITORIES at JISCMAIL.AC.UK>
> Subject: Seeing unusually high downloads in IRStats - IRUS-UK's
> explanation and why this isn't affecting IRUS-UK stats
>
> Hi everyone,
>
> There was a discussion on the UKCORR mailing list about why
> exceptionally high download counts are being seen this week in IRStats
> and what might be causing them.
>
> After some investigation we have found that the unusually high 
> downloads are down to four IP ranges:
>
> IP range       Organisation            Location   No. IP addresses
> 103.25.156.*   Microsoft Bingbot       China      128
> 103.36.96.*    Microsoft Corporation   China      216
> 111.221.28.*   Microsoft Bingbot       China      256
> 202.89.235.*   Microsoft Bingbot       China      80
>
> These IPs have been systematically trawling and downloading files from 
> many UK repositories. Looking at their User Agent strings they do not 
> declare themselves as bots but masquerade as normal users.
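Because these clients do not identify themselves in the User-Agent header, one way to spot them is to group requester addresses by network prefix and look for prefixes with unusually many downloads. The sketch below illustrates that grouping in Python; it is not the method IRUS-UK used, the function name `downloads_per_prefix` is hypothetical, and the sample addresses are made up for the example rather than taken from any real access log.

```python
from collections import Counter
import ipaddress


def downloads_per_prefix(ips, prefix_len=24):
    """Count accesses per network prefix (default /24, i.e. 'a.b.c.*')."""
    counts = Counter()
    for ip in ips:
        # strict=False masks the host bits, so 103.25.156.5/24 -> 103.25.156.0/24
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts


# Illustrative requester addresses only, one entry per download.
sample = [
    "103.25.156.5", "103.25.156.77", "103.25.156.120",
    "202.89.235.9", "202.89.235.10",
    "192.0.2.44",
]

# The busiest prefixes are the candidates for a closer look.
for prefix, hits in downloads_per_prefix(sample).most_common(2):
    print(prefix, hits)
```

A prefix that turns up far more often than the rest is only a candidate for blocking; whether it is really a robot still has to be judged from its request pattern.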
>
> Happily, the IRUS-UK ingest has been filtering out these robotic 
> downloads, so you won't see a massive spike in your IRUS-UK stats.
>
> We hope this is of help.
>
> Best wishes
>
> Hilary
>
> Hilary Jones
> Services and Projects Support
>
> 0161 413 7541
> Skype hilary.jones at jisc.ac.uk <mailto:hilary.jones at jisc.ac.uk>
> Twitter @JonesHilaryJ
> 6th Floor, Churchgate House, 56 Oxford Street, Manchester, M1 6EU
>
> jisc.ac.uk <http://www.jisc.ac.uk/>
>
> Jisc is a registered charity (number 1149740) and a company limited by
> guarantee which is registered in England under Company No. 5747339,
> VAT No. GB 882 5529 90. Jisc's registered office is: One Castlepark,
> Tower Hill, Bristol, BS2 0JA. T 0203 697 5800. jisc.ac.uk
> <http://www.jisc.ac.uk/>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/