EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #10170


RE: [EP-tech] DDoS on simple and advanced search


CAUTION: This e-mail originated outside the University of Southampton.

Hi All,

An aspect of the way EPrints deals with search requests may be causing us some of these issues.

What is happening:

  • Something searches your site
  • It extracts all the links from the results page – including the paginated links
  • These links are saved, and at some point (weeks/months in the future) will be requested from a network of devices
  • Each link contains both the cacheid of the original search, and all the parameters needed to re-run the search
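
The last point can be illustrated with a minimal Python sketch. It assumes a hypothetical link format: the host, the path and the parameter names `cache`, `search_offset`, `q` and `order` are illustrative only and may not match what a given EPrints version emits. The sketch splits a saved results-page link into the cache ID and the remaining parameters, which are enough on their own to run the search again.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical saved link from a results page; all names are illustrative.
LINK = ("https://repo.example.org/cgi/search/simple"
        "?q=climate&order=-date&cache=4711&search_offset=20")

def split_link(url: str):
    """Return (cache ID, result offset, parameters that can re-run the search)."""
    params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    cache_id = params.pop("cache", None)            # ID of the cached result table
    offset = int(params.pop("search_offset", "0"))  # which page of results
    return cache_id, offset, params

cache_id, offset, rerun_params = split_link(LINK)
print(cache_id, offset, rerun_params)  # 4711 20 {'q': 'climate', 'order': '-date'}
```

Once the cache table behind `4711` has gone, the leftover parameters still describe the whole query, which is what makes a replayed link expensive.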

 

By the time the paginated links are farmed out to the network of devices (hence the spread of IP addresses), the original EPrints search cache has expired.

Each paginated link then causes the original search to be run again, and each request creates a new cache table.
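
The following Python sketch is a toy model of that behaviour, not EPrints code: the class, its method names and the fixed 1000-result search are assumptions. Because every replayed link carries the same expired cache ID, none of them finds an existing table, so each one re-runs the search and stores a fresh copy.

```python
import itertools

class SearchCache:
    """Toy model (not EPrints code): a missing cache ID re-runs the search
    and stores the results in a brand-new cache table."""

    def __init__(self, hits: int = 1000):
        self.hits = hits
        self.tables = {}                     # cache ID -> stored results
        self._next_id = itertools.count(1)   # source of new cache IDs
        self.searches_run = 0

    def run_search(self, query: str) -> list:
        self.searches_run += 1
        return [f"{query}-{i}" for i in range(self.hits)]

    def page(self, cache_id, query: str, offset: int, size: int = 20):
        if cache_id not in self.tables:      # expired or never existed
            cache_id = next(self._next_id)
            self.tables[cache_id] = self.run_search(query)
        return cache_id, self.tables[cache_id][offset:offset + size]

cache = SearchCache()
# Weeks later, a botnet replays every saved page link; all of them still
# carry the original cache ID (99 here), which has expired.
for offset in range(0, 1000, 20):
    cache.page(99, "climate", offset)
print(cache.searches_run, len(cache.tables))  # 50 50
```

A real browser would follow the new cache ID returned with the first page, so it would trigger only one search; the crawler never does, because its links were captured before the cache expired.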

 

If the original search expression returned 1000 results, shown 20 per page, the follow-up crawl of those paginated links will create 50 new cache tables.
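
The arithmetic generalises. Assuming the crawler requests every page link once and every request misses the cache, the number of re-runs (and new cache tables) equals the number of result pages: the result count divided by the page size, rounded up. A short Python check, with a hypothetical helper name:

```python
def cache_tables_created(total_results: int, page_size: int) -> int:
    """Pages needed to show all results; one re-run per crawled page link."""
    return -(-total_results // page_size)  # integer ceiling division

print(cache_tables_created(1000, 20))  # 50
print(cache_tables_created(1001, 20))  # 51
```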

 

I’ve documented it here: https://github.com/eprints/eprints3.4/issues/479 .

 

A quick, short-term fix would be to stop EPrints automatically re-running a search when the request contains an expired cache ID.

This would change the current user experience, but I think it would be better than systems becoming unresponsive.
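
To make the proposal concrete, here is a hedged Python sketch of the check, written as a hypothetical request handler rather than a patch to EPrints (a real change would be made in EPrints' own Perl search code). The names `SearchHandler` and `handle`, the parameter names and the choice of HTTP 410 Gone for the refusal are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    status: int
    body: str

@dataclass
class SearchHandler:
    """Hypothetical handler: an unknown cache ID is refused, not re-run."""
    tables: dict = field(default_factory=dict)  # cache ID -> list of results
    page_size: int = 20

    def run_new_search(self, params: dict) -> Response:
        # Placeholder for the normal path (a fresh search-form submission).
        return Response(200, f"new search for {params.get('q', '')!r}")

    def handle(self, params: dict) -> Response:
        cache_id = params.get("cache")
        if cache_id is None:
            return self.run_new_search(params)
        if cache_id not in self.tables:
            # Expired or unknown cache ID: do not silently re-run the search.
            return Response(410, "These search results have expired; please search again.")
        offset = int(params.get("search_offset", "0"))
        hits = self.tables[cache_id][offset:offset + self.page_size]
        return Response(200, ", ".join(map(str, hits)))

handler = SearchHandler(tables={"7": list(range(100))})
print(handler.handle({"cache": "99", "search_offset": "40"}).status)  # 410
print(handler.handle({"cache": "7", "search_offset": "40"}).body)
```

A replayed link then costs a cache lookup and a short error page instead of a full search plus a new cache table; a real user whose results have expired sees a message rather than a silent re-run.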

 

NB: the above has been observed with the ‘internal’ search methods (rather than Xapian/ElasticSearch).

Cheers,

John

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of Yuri Carrer
Sent: 18 July 2025 07:24
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] DDoS on simple and advanced search

 

CAUTION: External Message. Use caution opening links and attachments.

Botnets don't use the same IPs.

The solution is simpler: rename the search script and update the internal links (or let EPrints take the script name from a config setting). If you do it weekly, nobody will notice, but bots won't be able to keep up.
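
This suggestion could be automated instead of done by hand. A minimal Python sketch, assuming a hypothetical site secret and naming scheme: the script name is derived from the ISO year and week with an HMAC, so it changes every Monday and cannot be guessed without the secret. How EPrints would then serve and link to that name (a config setting, a web-server rewrite rule) is not shown and would still need to be worked out.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret; keep it out of any public repository or config dump.
SITE_SECRET = b"replace-with-a-long-random-value"

def search_script_name(day: date, secret: bytes = SITE_SECRET) -> str:
    """Name of the search script for the ISO week containing `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    digest = hmac.new(secret, f"{iso_year}-W{iso_week:02d}".encode(), hashlib.sha256)
    return "search-" + digest.hexdigest()[:10]

# 14-20 July 2025 is one ISO week (Monday to Sunday); 21 July starts the next.
this_week = search_script_name(date(2025, 7, 14))
assert this_week == search_script_name(date(2025, 7, 20))
assert this_week != search_script_name(date(2025, 7, 21))
print(this_week)
```

One design choice to consider: accepting the previous week's name for a short grace period would avoid breaking a results page that a real user opened just before the switch.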

 

On 17/07/25 19:32, Tomasz Neugebauer wrote:

Any comments on that solution? It seems elegant, if it works.

Tomasz

 

-- 
Yuri Carrer
 
 CAB - Centro di Ateneo per le Biblioteche, Università di Padova
 Tel: 049/827 9712 - Via Beato Pellegrino, 28 - Padova