
[EP-tech] Seeing unusually high downloads in IRStats



With Apache:

RewriteEngine On
RewriteCond %{HTTP:User-Agent} (?:Yandex|msnbot|Owlinbo|sistrix|genieo|proximic|MJ12bot|AhrefsBot|searchmetrics|SearchmetricsBot|Baidu) [NC]
RewriteRule .? - [F]

Just add any other offending user agents to the pattern.

Problem solved :-D
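
One caveat: the quoted IRUS-UK message below says these crawlers do not declare themselves as bots in their User-Agent strings, so a User-Agent rule alone may not catch them. As a minimal sketch (not an EPrints or IRStats2 feature), the Python below checks whether a client IP falls in one of the four ranges reported in that message. It assumes each "a.b.c.*" entry can be read as a /24 network; the names REPORTED_RANGES, is_reported_crawler and filter_requests are made up for illustration.

```python
import ipaddress

# The four ranges reported in the quoted message, read as /24 networks.
# Treating "a.b.c.*" as a /24 is an assumption made for this sketch.
REPORTED_RANGES = [
    ipaddress.ip_network("103.25.156.0/24"),
    ipaddress.ip_network("103.36.96.0/24"),
    ipaddress.ip_network("111.221.28.0/24"),
    ipaddress.ip_network("202.89.235.0/24"),
]


def is_reported_crawler(ip_string):
    """Return True if ip_string lies inside one of the reported ranges."""
    try:
        address = ipaddress.ip_address(ip_string)
    except ValueError:
        # Not a parseable IP address, so it cannot match any range.
        return False
    return any(address in network for network in REPORTED_RANGES)


def filter_requests(ip_strings):
    """Keep only the IPs that are outside the reported ranges."""
    return [ip for ip in ip_strings if not is_reported_crawler(ip)]
```

The same test could be run over the requester IPs exported from the access data to see how many rows are affected before deleting anything.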

On 26/07/2016 14:13, Graham, Clinton T wrote:
>
> The University of Pittsburgh opened ticket UCM000000270852 with Bing 
> Webmaster Support last week regarding this and received the following 
> response:
>
> Thank you for contacting Bing Webmaster Support.  The activity you are 
> seeing is most likely caused by one of our bots used for verifying 
> your site rather than indexing your site as Bingbot does.  These 
> crawlers do not have the same UA, and are in place to make sure the 
> verification aspects of your site are in place.
>
> Yesterday, we requested additional information on what "verification" 
> really means, and described the problem of conflating user-generated 
> activity with bot-generated activity, especially for the scholarly 
> publication process.
>
> I'll reply again here if this support request goes anywhere, but 
> perhaps others might be interested in similarly engaging Bing 
> Webmaster Support?
>
> Enjoy,
>
> - Clinton Graham
>
> Systems Developer
>
> University of Pittsburgh | University Library System
>
> 412-383-1057
>
> *From:*eprints-tech-bounces at ecs.soton.ac.uk 
> [mailto:eprints-tech-bounces at ecs.soton.ac.uk] *On Behalf Of *Coles, 
> Elizabeth A. (Betsy)
> *Sent:* Monday, July 25, 2016 7:45 PM
> *To:* eprints-tech at ecs.soton.ac.uk
> *Subject:* [EP-tech] Seeing unusually high downloads in IRStats
>
> Forwarding from the JISC-REPOSITORIES list. We've been seeing this in 
> California too, and our IRStats2 counts have been through the roof for 
> the last couple of weeks.
>
> Can anyone tell me how to filter out these robots in IRStats2?  And 
> how to clean the access file so that our IRStats2 reports are not 
> distorted by this deluge?  I assume I'd want to delete all entries 
> with a requester_id in the table below and rerun the IRStats2 setup 
> from scratch.
>
> Thanks,
>
> Betsy Coles
>
> Caltech - Digital Library Development
>
> bcoles at caltech.edu <mailto:bcoles at caltech.edu>
>
> *From:* Repositories discussion list 
> [mailto:JISC-REPOSITORIES at JISCMAIL.AC.UK] *On Behalf Of *Hilary Jones
> *Sent:* Friday, July 15, 2016 3:43 AM
> *To:* JISC-REPOSITORIES at JISCMAIL.AC.UK 
> <mailto:JISC-REPOSITORIES at JISCMAIL.AC.UK>
> *Subject:* Seeing unusually high downloads in IRStats - IRUS-UK's 
> explanation and why this isn't affecting IRUS-UK stats
>
> Hi everyone,
>
> There was a discussion on the UKCORR mailing list about why 
> exceptionally high downloads are being seen this week in IRStats and 
> what might be causing them.
>
> After some investigation we have found that the unusually high 
> downloads are down to four IP ranges:
>
> IP range        Organisation            Location   No. IP addresses
> 103.25.156.*    Microsoft Bingbot       China      128
> 103.36.96.*     Microsoft Corporation   China      216
> 111.221.28.*    Microsoft Bingbot       China      256
> 202.89.235.*    Microsoft Bingbot       China      80
>
> These IPs have been systematically trawling and downloading files from 
> many UK repositories. Looking at their User Agent strings they do not 
> declare themselves as bots but masquerade as normal users.
>
> Happily, the IRUS-UK ingest has been filtering out these robotic 
> downloads, so you won't see a massive spike in your IRUS-UK stats.
>
> We hope this is of help.
>
> Best wishes
>
> Hilary
>
> Jisc <http://www.jisc.ac.uk/>
>
> *Hilary Jones*
> Services and Projects Support
>
> 0161 413 7541
> Skype hilary.jones at jisc.ac.uk <mailto:hilary.jones at jisc.ac.uk>
> Twitter @JonesHilaryJ
> 6th Floor Churchgate House, 56 Oxford Street, Manchester, M1  6EU
>
> *jisc.ac.uk <http://www.jisc.ac.uk/>*
>
> Jisc is a registered charity (number 1149740) and a company limited by 
> guarantee which is registered in England under Company No. 5747339, 
> VAT No. GB 882 5529 90. Jisc's registered office is: One Castlepark, 
> Tower Hill, Bristol, BS2 0JA. T 0203 697 5800. jisc.ac.uk 
> <http://www.jisc.ac.uk/>
>
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/