EPrints Technical Mailing List Archive

Message: #06329


Re: [EP-tech] Bot net Attacks

Hi Jens,

We had a similar issue a while ago - which was being caused by on-the-fly (or request-driven) view generation.

For author views, where some authors do not have a unique ID, this is computationally intensive, and it is better addressed by making sure you are in control of the page generation.

If the pages have already been generated, normal bot activity should not be an issue.


We addressed the issue by:

1)      Changing the max_age of the person view (and in the end, all views) so regeneration is never triggered by a request

2)      Changing the generate_views to be able to accept multiple viewids

3)      Changing the cronjob to generate all views *except* the person one daily, and the person one once a week


The details are here:



Our 'subject' and 'year' views have

max_menu_age => 10*24*60*60, #10 days

max_list_age => 10*24*60*60, #10 days

But our 'people' view has:

max_menu_age => 20*24*60*60, #20 days

max_list_age => 20*24*60*60, #20 days


Our crontab looks a bit like this (with the above additions in place):

# # # Generate subject and year views Mon-Fri

10 3 * * 1-5 <eprints_root>/bin/generate_views <archiveid> --view year --view subject

# # # Generate people view on Sunday

10 5 * * 0 <eprints_root>/bin/generate_views <archiveid>  --view people


It might be worth checking to see if your person views are being generated on-the-fly - look at the modification time of the browse pages - compare them to the bot activity. If they are being re-generated by a request the above will help. If not… more thinking to do!
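One way to make the modification-time check above concrete is a small script like the following. It is only a sketch: the function name `recently_regenerated` and the idea of passing the browse-page directory as an argument are assumptions for illustration, and the location of the generated view pages depends on your installation. It lists files changed within a given window so that their timestamps can be compared with the bot requests in the web server log.

```python
import time
from pathlib import Path


def recently_regenerated(view_dir, window_seconds, now=None):
    """Return (path, mtime) pairs for files under view_dir that were
    modified within the last window_seconds, newest first."""
    now = time.time() if now is None else now
    hits = []
    for path in Path(view_dir).rglob("*"):
        if path.is_file():
            mtime = path.stat().st_mtime
            if now - mtime <= window_seconds:
                hits.append((str(path), mtime))
    # Newest files first, so a burst of regeneration is easy to spot.
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


def format_hits(hits):
    """Format the pairs as 'YYYY-MM-DD HH:MM:SS path' lines (local time)."""
    return [
        time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)) + " " + path
        for path, mtime in hits
    ]

# Example use (the directory is whatever holds your generated people view):
#   for line in format_hits(recently_regenerated(view_dir, 60 * 60)):
#       print(line)
```

If many pages show modification times that cluster around the bot requests, rather than around the cron schedule, the pages are most likely being regenerated on request.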





From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of jens.vieler@id.uzh.ch
Sent: 07 March 2017 13:30
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Bot net Attacks


Dear List

During the last 2 weeks we have observed some really tricky botnet attacks. Thousands of requests seem to ask for author information and, as a result, we ran into disk space problems during the massive page creation for /view/authorsnew into the cache.

We did some interesting experiments with Apache's mod_evasive. Unfortunately, the botnets adapt their behaviour to the maximum allowed accesses per timeframe that we configured. It looks like they know what to do to drive EPrints into trouble...

What do you recommend?


Jens Vieler
Zentrale Informatik
Universität Zürich
Stampfenbachstrasse 73
CH-8006 Zürich

mail:  jens.vieler@id.uzh.ch
phone: +41 44 63 56777


From: Andrew Collington <a.p.collington@sussex.ac.uk>
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: 07.03.2017 12:44
Subject: Re: [EP-tech] Easier way to do this in a citation?
Sent by: eprints-tech-bounces@ecs.soton.ac.uk

Hi John,
That helps a huge amount, thank you so much!  This gives me a great base on which to maybe add some functionality of our own, and the citation reworking using the choose/when does make it a bit clearer until I do (hopefully) get something written.  So many thanks for a very helpful response!
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of John Salter
Sent: 03 March 2017 15:58
Subject: Re: [EP-tech] Easier way to do this in a citation?

Hi Andrew,
I don't think there is an easy way to do what you require in the existing epscript functions (but there is a way to add it - more on that below!).
You could alter your existing code to make the tests clearer:
  <choose>
    <when test="event_title and event_location and event_dates">
      <print expr="event_title"/>, <print expr="event_location"/>, <print expr="event_dates"/>.
    </when>
    <when test="event_title and event_location">
      <print expr="event_title"/>, <print expr="event_location"/>.
    </when>
    <when test="event_title">
      <print expr="event_title"/>.
    </when>
  </choose>
But this does feel a bit noisy too.
To add (inject) a custom method to EPrints::Script::Compiled, see the example here: https://wiki.eprints.org/w/ORCID#Rendering_the_ORCID_in_a_citation
This keeps the added code in the repository config - and should work over upgrades (unless there's a major rewrite of EPrints::Script).
In my opinion, any files in <eprints_root>/archives/<archiveid>/cfg/cfg.d/ should be checked as part of an upgrade.
For a similar example, in White Rose Research Online, we wanted to render event dates in a more 'friendly' human way.
They are stored in the database as 'yyyy-mm-dd - yyyy-mm-dd' (or just yyyy-mm-dd if it was a one-day event), and we wanted e.g. '1-3 Mar 2017' '28 Feb - 1 Mar 2017' or '31 Dec 2016 - 2 Jan 2017'.
This https://gist.github.com/jesusbagpuss/491086533294f864de63115c66719def adds a method to EPrints::Script::Compiled that does this conversion. The citation uses:
<if test="event_dates"><print expr="wrro_human_event_dates(event_dates)"/></if>
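The conversion rule described above can be illustrated with a short sketch. This is not the code from the gist (which adds a Perl method to EPrints::Script::Compiled); it is a Python illustration of the same formatting logic, and the function name `human_event_dates` and the `MONTHS` table are assumptions made for the example.

```python
from datetime import date

# Fixed English abbreviations, so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def human_event_dates(value):
    """Render 'yyyy-mm-dd' or 'yyyy-mm-dd - yyyy-mm-dd' in a shorter form."""
    parts = [p.strip() for p in value.split(" - ")]
    start = date.fromisoformat(parts[0])
    end = date.fromisoformat(parts[-1])
    end_text = f"{end.day} {MONTHS[end.month - 1]} {end.year}"
    if start == end:
        # One-day event: just the single date.
        return end_text
    if start.year != end.year:
        # Range crosses a year boundary: show both dates in full.
        return f"{start.day} {MONTHS[start.month - 1]} {start.year} - {end_text}"
    if start.month != end.month:
        # Same year, different months: the year appears once, at the end.
        return f"{start.day} {MONTHS[start.month - 1]} - {end_text}"
    # Same month and year: only the day numbers differ.
    return f"{start.day}-{end_text}"
```

The three branches correspond to the three example outputs given above; a single date falls through to the plain 'd Mon yyyy' form.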
Hope that helps!
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Andrew Collington
Sent: 03 March 2017 13:49
Subject: [EP-tech] Easier way to do this in a citation?
Hi all,
I’ve recently had to add a few new rules in a citation for a conference proceeding.  At the moment I have a number of checks that look something like this:
<if test="is_set(event_title)"><print expr="event_title"/></if>
<if test="event_location">
  <if test="is_set(event_title)">, </if>
  <print expr="event_location"/>
</if>
<if test="is_set(event_dates)">, <print expr="event_dates"/></if>
<if test="is_set(event_title) or is_set(event_location) or is_set(event_dates)">.</if>
So it’ll only add a comma before the location if the title is supplied, etc. and the full-stop at the end if any event details are shown.  But, well, as you can see it’s a pretty messy way to do things and wondered if there were something a little more streamlined available that would allow you to supply a list of fields and it’ll then automatically put commas between values if the values are there and a full-stop at the end if needs be?  I’m only trying to do this with cite tags in citations/eprints/default.xml.
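As a plain illustration of the behaviour being asked for (not an existing EPrints function), the rule "commas between the values that are set, full stop at the end if anything was shown" can be sketched as follows. The function name `join_event_details` is hypothetical.

```python
def join_event_details(*values):
    """Join the non-empty values with ', ' and end with a full stop.

    Returns an empty string when no value is set, so nothing is rendered.
    """
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return ""
    return ", ".join(present) + "."
```

Because unset values are dropped before joining, no separate test is needed for each combination of title, location and dates.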
I did see a ‘pretty_list’ function in Compiled.pm that looks like it may do something like what I want, but despite trying I didn’t see how I could pull this into a citation nor could I find any documentation on the subject.  Is that possible?
If this kind of functionality doesn’t already exist, then what’s the best course of action for adding new types of actions to cite tags?  Is it possible to create my own class to add extra actions, or should I update existing modules (which seems like a bad idea if ever wanting to upgrade)?  Is there any documentation about doing this kind of thing?
Many thanks for any advice,
Andrew Collington
Web Programmer, ITS Client Services
ITS-CS Shawcross, University of Sussex, Falmer, Brighton, BN1 9QT

T: (01273) 872591 (ext. 2591)
 *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/