
[EP-tech] Bot net Attacks



Hi Jens,
We had a similar issue a while ago, caused by on-the-fly (request-driven) view generation.
For author views, where some authors do not have a unique ID, generation is computationally intensive, so it is better to make sure you are in control of when the pages are generated.
If the pages have already been generated, normal bot activity should not be an issue.

We addressed the issue by:

1) Changing the max_age settings (max_menu_age and max_list_age) of the person view (and in the end, all views) so regeneration is never triggered by a request

2) Changing generate_views so that it accepts multiple view IDs

3) Changing the cronjob to generate all views *except* the person one daily, and the person one once a week

The details are here:
https://github.com/eprints/eprints/pull/417

Our 'subject' and 'year' views have
max_menu_age => 10*24*60*60, #10 days
max_list_age => 10*24*60*60, #10 days
But our 'people' view has:
max_menu_age => 20*24*60*60, #20 days
max_list_age => 20*24*60*60, #20 days

Our crontab looks a bit like this (with the above additions in place):
# # # Generate subject and year views Mon-Fri
10 3 * * 1-5 <eprints_root>/bin/generate_views <archiveid> --view year --view subject
# # # Generate people view on Sunday
10 5 * * 0 <eprints_root>/bin/generate_views <archiveid> --view people
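
As a rough sanity check on those numbers, here is a small Python sketch (not part of EPrints). The gap values are read off the crontab above, and the names DAY, max_age and longest_gap are purely illustrative. The idea is that each maximum age should be comfortably longer than the longest gap between scheduled runs, so a request should never find a page old enough to trigger regeneration:

```python
DAY = 24 * 60 * 60  # seconds in a day

# Maximum ages quoted above, in seconds.
max_age = {
    "subject": 10 * DAY,
    "year": 10 * DAY,
    "people": 20 * DAY,
}

# Longest gap between scheduled runs, read off the crontab above:
# the Mon-Fri job leaves a Friday-to-Monday gap of 3 days,
# the Sunday job runs every 7 days.
longest_gap = {
    "subject": 3 * DAY,
    "year": 3 * DAY,
    "people": 7 * DAY,
}

for view, age in max_age.items():
    margin_days = (age - longest_gap[view]) // DAY
    print(f"{view}: max age {age} s, margin {margin_days} days")
```

If a cron run fails, the margin is what gives you time to notice before requests start regenerating pages again.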

It might be worth checking whether your person views are being generated on-the-fly: look at the modification times of the browse pages and compare them with the bot activity. If they are being regenerated by a request, the above will help. If not, there is more thinking to do!
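
One possible way to do that check is sketched below in Python. It lists files under a view directory that were modified within a recent window, so the timestamps can be compared with the bot requests in the web server log. The directory path and the function name recently_modified are assumptions for illustration only; substitute the location where your repository writes its static browse pages.

```python
import os
import time


def recently_modified(view_dir, window_seconds, now=None):
    """Return (path, mtime) pairs for files under view_dir changed within the window."""
    now = time.time() if now is None else now
    hits = []
    for root, _dirs, files in os.walk(view_dir):
        for name in files:
            path = os.path.join(root, name)
            mtime = os.path.getmtime(path)
            if now - mtime <= window_seconds:
                hits.append((path, mtime))
    # Oldest first, so the output reads like a timeline.
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical path: replace with your own eprints_root, archiveid and view id.
    view_dir = "/opt/eprints3/archives/myarchive/html/en/view/people"
    for path, mtime in recently_modified(view_dir, window_seconds=60 * 60):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp, path)
```

If files keep appearing in this list between cron runs, and their timestamps line up with bot requests, the pages are probably being regenerated on request.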

Cheers,
John

From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of jens.vieler at id.uzh.ch
Sent: 07 March 2017 13:30
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Bot net Attacks


Dear List

During the last two weeks we have observed some really tricky botnet attacks. Thousands of requests seem to ask for author information and, as a result, we ran into disk space problems during the massive creation of pages for /view/authorsnew in the cache.

We did some interesting experiments with Apache's mod_evasive. Unfortunately, the botnets adapt their behaviour to the maximum number of requests per time frame that we configured. It looks like they know what to do to drive EPrints into trouble...

What do you recommend?

Jens

--
Jens Vieler
Zentrale Informatik
Universität Zürich
Stampfenbachstrasse 73
CH-8006 Zürich

mail:  jens.vieler at id.uzh.ch
phone: +41 44 63 56777
http://www.id.uzh.ch


From: Andrew Collington <a.p.collington at sussex.ac.uk>
To: eprints-tech at ecs.soton.ac.uk
Date: 07.03.2017 12:44
Subject: Re: [EP-tech] Easier way to do this in a citation?
Sent by: eprints-tech-bounces at ecs.soton.ac.uk

________________________________



Hi John,

That helps a huge amount, thank you so much!  This gives me a great base on which to maybe add some functionality of our own, and the citation reworking using the choose/when does make it a bit clearer until I do (hopefully) get something written.  So many thanks for a very helpful response!

Andy


From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of John Salter
Sent: 03 March 2017 15:58
To: eprints-tech at ecs.soton.ac.uk
Subject: Re: [EP-tech] Easier way to do this in a citation?

Hi Andrew,
I don't think there is an easy way to do what you require in the existing epscript functions (but there is a way to add it - more on that below!).

You could alter your existing code to make the tests clearer:
<choose>
  <when test="event_title and event_location and event_dates">
    <print expr="event_title"/>, <print expr="event_location"/>, <print expr="event_dates"/>.
  </when>
  <when test="event_title and event_location">
    <print expr="event_title"/>, <print expr="event_location"/>.
  </when>
  <when test="event_title">
    <print expr="event_title"/>.
  </when>
</choose>

But this does feel a bit noisy too.

To add (inject) a custom method to EPrints::Script::Compiled, see the example here: https://wiki.eprints.org/w/ORCID#Rendering_the_ORCID_in_a_citation
This keeps the added code in the repository config - and should work over upgrades (unless there's a major rewrite of EPrints::Script).
In my opinion, any files in <eprints_root>/archives/<archiveid>/cfg/cfg.d/ should be checked as part of an upgrade.

For a similar example, in White Rose Research Online, we wanted to render event dates in a more 'friendly' human way.
They are stored in the database as 'yyyy-mm-dd - yyyy-mm-dd' (or just yyyy-mm-dd if it was a one-day event), and we wanted e.g. '1-3 Mar 2017', '28 Feb - 1 Mar 2017' or '31 Dec 2016 - 2 Jan 2017'.
This gist adds a method to EPrints::Script::Compiled that does this conversion: https://gist.github.com/jesusbagpuss/491086533294f864de63115c66719def
The citation uses:
<if test="event_dates"><print expr="wrro_human_event_dates(event_dates)"/></if>
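
To make the formatting rule concrete, here is a minimal Python sketch of the conversion described above. It is not the Perl code from the gist and not an EPrints API; the function name human_event_dates is made up for illustration, and the one-day output ('3 Mar 2017') is an assumption, since only the range formats are spelled out above.

```python
from datetime import date

# Fixed English abbreviations, so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def human_event_dates(value):
    """Convert 'yyyy-mm-dd - yyyy-mm-dd' (or a single 'yyyy-mm-dd') to a friendlier form."""
    parts = [p.strip() for p in value.split(" - ")]
    start = date.fromisoformat(parts[0])
    end = date.fromisoformat(parts[-1])
    start_month = MONTHS[start.month - 1]
    end_month = MONTHS[end.month - 1]

    if start == end:
        # One-day event (assumed format).
        return f"{end.day} {end_month} {end.year}"
    if start.year != end.year:
        return f"{start.day} {start_month} {start.year} - {end.day} {end_month} {end.year}"
    if start.month != end.month:
        return f"{start.day} {start_month} - {end.day} {end_month} {end.year}"
    return f"{start.day}-{end.day} {end_month} {end.year}"


print(human_event_dates("2017-03-01 - 2017-03-03"))  # 1-3 Mar 2017
print(human_event_dates("2017-02-28 - 2017-03-01"))  # 28 Feb - 1 Mar 2017
print(human_event_dates("2016-12-31 - 2017-01-02"))  # 31 Dec 2016 - 2 Jan 2017
```

The cases are checked from the most different (years differ) to the least (same month), so the shortest form that is still unambiguous is used.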

Hope that helps!
Cheers,
John



From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of Andrew Collington
Sent: 03 March 2017 13:49
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Easier way to do this in a citation?

Hi all,

I've recently had to add a few new rules in a citation for a conference proceeding.  At the moment I have a number of checks that look something like this:

<if test="is_set(event_title)">
<print expr="event_title"/>
</if>
<if test="event_location">
<if test="is_set(event_title)">, </if>
<print expr="event_location"/>
</if>
<if test="is_set(event_dates)">, <print expr="event_dates"/></if>
<if test="is_set(event_title) or is_set(event_location) or is_set(event_dates)">.</if>

So it'll only add a comma before the location if the title is supplied, etc., and the full stop at the end if any event details are shown.  But, well, as you can see it's a pretty messy way to do things, and I wondered if there were something a little more streamlined available that would allow you to supply a list of fields and would then automatically put commas between the values that are there and a full stop at the end if need be?  I'm only trying to do this with cite tags in citations/eprints/default.xml.
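
Outside of EPrints, the behaviour being asked for can be sketched in a few lines of Python. This is only an illustration of the desired logic, not an existing epscript function; join_event_details is a made-up name.

```python
def join_event_details(*values):
    """Join the values that are set with ', ' and end with a full stop."""
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        # Nothing set: print nothing, not even the full stop.
        return ""
    return ", ".join(present) + "."


print(join_event_details("Open Repositories", "Brisbane", "2017-06-26"))
print(join_event_details(None, "Brisbane", ""))
```

The first call prints all three values separated by commas, and the second prints only the location followed by a full stop.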

I did see a 'pretty_list' function in Compiled.pm that looks like it may do something like what I want, but despite trying I didn't see how I could pull this into a citation, nor could I find any documentation on the subject.  Is that possible?

If this kind of functionality doesn't already exist, then what's the best course of action for adding new types of actions to cite tags?  Is it possible to create my own class to add extra actions, or should I update existing modules (which seems like a bad idea if ever wanting to upgrade)?  Is there any documentation about doing this kind of thing?

Many thanks for any advice,

Andy


--
Andrew Collington
Web Programmer, ITS Client Services
ITS-CS Shawcross, University of Sussex, Falmer, Brighton, BN1 9QT

T: (01273) 872591 (ext. 2591)
E: a.p.collington at sussex.ac.uk
 *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/