EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #10172


Re: [EP-tech] DDoS on simple and advanced search


CAUTION: This e-mail originated outside the University of Southampton.
In addition to EPrints, one of the research projects I worked on uses a Blacklight search engine, which has been receiving hundreds of millions of queries that are also obviously from bots scraping through the search. Blacklight offers a faceted search interface, so the bots have effectively infinite pathways to follow. It sounds like a similar situation to what we've been seeing with EPrints.
Here is a description of how Duke University Libraries, which experienced this on their Blacklight services, addressed the issue with a WAF:
"Executive Summary: In May & June 2025, Duke University Libraries (DUL) staff successfully implemented Anubis, a configurable open source web application firewall (WAF), in order to stave off persistent onslaughts of AI-related bot scraping activity. During this pilot period (May 1 - June 10, 2025), aggressive bot scraping led to extended outages for three critical library platforms (Duke Digital ..."


Tomasz


________________________________________________

Tomasz Neugebauer
Senior Librarian | Bibliothécaire titulaire
Digital Projects & Systems Development Librarian / Bibliothécaire des Projets Numériques & Développement de Systèmes
Concordia University / Université Concordia

Tel. / Tél. 514-848-2424 ext. / poste 7738
Email / courriel:
tomasz.neugebauer@concordia.ca

Mailing address / adresse postale: 1455 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8
Street address / adresse municipale: 1400 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8

library.concordia.ca


From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Sent: July 21, 2025 5:21 AM
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: RE: [EP-tech] DDoS on simple and advanced search
 

Attention This email originates from outside the concordia.ca domain. // Ce courriel provient de l'extérieur du domaine de concordia.ca





Hi All,

An aspect of the way EPrints deals with search requests may be causing us some of these issues.

What is happening:

  • Something searches your site
  • It extracts all the links from the results page – including the paginated links
  • These links are saved, and at some point (weeks/months in the future) will be requested from a network of devices
  • Each link contains both the cacheid of the original search, and all the parameters needed to re-run the search

 

By the time the paginated links are farmed out to the network of devices (hence the spread of IP addresses), the original EPrints search cache has expired.

Each paginated link then triggers the same original search to be run – with each request making a new cache table.

 

If the original search expression returned 1000 results, presented 20 per page, the follow-up crawl of those paginated links will create 50 new individual cache tables (1000 / 20 = 50 pages, one table per page request).
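The arithmetic above can be sketched as a toy model. This is not EPrints code: the `SearchCache` class, its methods, and the `q=example` parameter string are hypothetical stand-ins that only imitate the described behaviour (an expired cacheid causes the search to be re-run into a fresh cache table).

```python
import math


def pages_for(total_results: int, page_size: int) -> int:
    """Number of paginated links a result set produces."""
    return math.ceil(total_results / page_size)


class SearchCache:
    """Toy stand-in for a search cache (hypothetical, not the EPrints implementation)."""

    def __init__(self) -> None:
        self.tables = {}   # cache id -> search parameters
        self.next_id = 1

    def run_search(self, params: str) -> int:
        """Run the search and store the results in a new cache table."""
        cache_id = self.next_id
        self.next_id += 1
        self.tables[cache_id] = params
        return cache_id

    def expire_all(self) -> None:
        """Simulate weeks passing: every existing cache table is dropped."""
        self.tables.clear()

    def handle_request(self, cache_id: int, params: str) -> int:
        """Serve from the cache if it is still alive, otherwise re-run the search."""
        if cache_id in self.tables:
            return cache_id
        return self.run_search(params)


def simulate_delayed_crawl(total_results: int = 1000, page_size: int = 20) -> int:
    """Return how many cache tables exist after a delayed crawl of every page link."""
    cache = SearchCache()
    params = "q=example"
    original_id = cache.run_search(params)   # the bot's first visit
    n_pages = pages_for(total_results, page_size)
    cache.expire_all()                        # the original cache expires
    for _page in range(n_pages):
        # Every saved link carries the same stale cache id plus the full parameters.
        cache.handle_request(original_id, params)
    return len(cache.tables)


print(simulate_delayed_crawl())
```

Because every saved link carries the same stale cacheid, none of the re-run searches is ever reused by the next request, which is why the table count grows with the number of pages.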

 

I’ve documented it here: https://github.com/eprints/eprints3.4/issues/479 .

 

A quick short-term fix would be to stop EPrints auto-re-running a search if the search contained an old cache id.

This changes the current user experience, but I think would be better than systems becoming unresponsive.
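A minimal sketch of that short-term fix, written in Python purely for illustration (EPrints itself is Perl, and none of these names come from its codebase). The `SearchHandler` class, the `Response` type, and the choice of a 410 status are all assumptions; the only idea taken from the proposal is that a request carrying an old cache id is refused rather than silently re-run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int
    body: str


class SearchHandler:
    """Hypothetical request handler; not the EPrints search screen."""

    def __init__(self) -> None:
        self.live_caches = {}   # cache id -> search parameters
        self.next_id = 1

    def _run_search(self, params: str) -> int:
        cache_id = self.next_id
        self.next_id += 1
        self.live_caches[cache_id] = params
        return cache_id

    def handle(self, params: str, cache_id: Optional[int] = None) -> Response:
        if cache_id is None:
            # A fresh search submitted without a cache id: run it as usual.
            new_id = self._run_search(params)
            return Response(200, f"results from cache {new_id}")
        if cache_id in self.live_caches:
            # The cache is still alive: serve the requested page from it.
            return Response(200, f"results from cache {cache_id}")
        # The cache id is stale: do NOT re-run the search automatically.
        # The 410 status and the message text are assumptions for this sketch.
        return Response(410, "This search has expired. Please run it again.")
```

With this behaviour a person following an old bookmark sees a prompt to search again, while a crawl of stale paginated links creates no new cache tables.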

 

NB the above has been observed using the ‘internal’ search methods (rather than Xapian/ElasticSearch).

 

Cheers,

John

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of Yuri Carrer
Sent: 18 July 2025 07:24
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] DDoS on simple and advanced search

 

CAUTION: External Message. Use caution opening links and attachments.


Botnets don't use the same IPs.

 

The solution is simpler: rename the search script and update the internal links (or let EPrints use some config for it). You can do it weekly; nobody will notice, but bots won't be able to keep up with it.
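One way the weekly rename could be automated is to derive the script name from a secret and the current ISO week, so the site can regenerate its internal links without storing any state. This is only a sketch under assumptions: the `weekly_search_path` function, the `/cgi/search-` prefix, and the use of an HMAC are illustrative choices, not an EPrints feature or configuration option.

```python
import hashlib
import hmac
from datetime import date


def weekly_search_path(secret: bytes, day: date, prefix: str = "/cgi/search-") -> str:
    """Return a search path that stays stable within an ISO week and changes each week.

    The prefix is hypothetical; a real site would map this path to its search script.
    """
    iso = day.isocalendar()                       # (ISO year, ISO week, weekday)
    label = f"{iso[0]}-W{iso[1]:02d}".encode()    # e.g. b"2025-W29"
    token = hmac.new(secret, label, hashlib.sha256).hexdigest()[:12]
    return f"{prefix}{token}"


if __name__ == "__main__":
    # Example secret only; a real deployment would keep its own private value.
    print(weekly_search_path(b"example-secret", date.today()))
```

Links harvested in one week stop resolving the next, while pages rendered by the site always use the current week's path.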

 

On 17/07/25 19:32, Tomasz Neugebauer wrote:

Any comments on that solution?  It seems elegant, if it works?

 

Tomasz

 


 
-- 
Yuri Carrer
 
 CAB - Centro di Ateneo per le Biblioteche, Università di Padova
 Tel: 049/827 9712 - Via Beato Pellegrino, 28 - Padova