EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #10172
Re: [EP-tech] DDoS on simple and advanced search
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] DDoS on simple and advanced search
- From: Tomasz Neugebauer <Tomasz.Neugebauer@concordia.ca>
- Date: Tue, 22 Jul 2025 15:54:55 +0000
CAUTION: This e-mail originated outside the University of Southampton.
In addition to EPrints, one of the research projects I worked on uses a Blacklight search engine that has been receiving hundreds of millions of queries, clearly from bots scraping their way through the search. Blacklight offers a faceted search interface, so the bots have effectively infinite pathways to follow. It sounds like a similar situation to what we've been seeing with EPrints.
Here is a description of how Duke University Libraries, which experienced this on their Blacklight services, addressed the issue with a WAF:
"Executive Summary In May & June 2025, Duke University Libraries (DUL) staff successfully implemented Anubis, a configurable open source web application firewall (WAF), in order to stave off persistent onslaughts of AI-related bot scraping activity. During this pilot period (May 1 - June 10, 2025), aggressive bot scraping led to extended outages for three critical library platforms (Duke Digital ..."
________________________________________________
Tomasz Neugebauer
Tel. / Tél. 514-848-2424 ext. / poste 7738
Mailing address / adresse postale: 1455 De Maisonneuve Blvd. W., LB-540-03, Montreal, Quebec H3G 1M8
library.concordia.ca

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Sent: July 21, 2025 5:21 AM
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: RE: [EP-tech] DDoS on simple and advanced search
Attention: This email originates from outside the concordia.ca domain. // Ce courriel provient de l'extérieur du domaine de concordia.ca
Hi All,

An aspect of the way EPrints deals with search requests may be causing us some of these issues. What is happening:
By the time the paginated links have been farmed out to the network of devices (hence the spread of IP addresses), the original EPrints search cache has expired. Each paginated link then triggers the original search to be re-run, and each request creates a new cache table.
If the original search expression returned 1000 results, presented 20 per page, the follow-up crawl of those paginated links will create 50 new individual cache tables.
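To make the arithmetic above concrete, here is a minimal Python sketch. It is only a model of the worst case described (every paginated link is fetched after the original cache has expired); the function name is made up for illustration and nothing here is EPrints code.

```python
import math


def cache_tables_from_crawl(total_results: int, results_per_page: int) -> int:
    """Count the cache tables created when every paginated link is crawled.

    Assumption (worst case from the description above): the original cache
    has expired before each paginated link is requested, so each request
    re-runs the search and builds one new cache table.
    """
    if total_results < 0 or results_per_page <= 0:
        raise ValueError("total_results must be >= 0 and results_per_page > 0")
    # One paginated link per page of results, rounding a partial page up.
    return math.ceil(total_results / results_per_page)


print(cache_tables_from_crawl(1000, 20))  # 50
```

The count grows linearly with the size of the result set, so broad searches are the expensive ones under this model.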
I’ve documented it here: https://github.com/eprints/eprints3.4/issues/479 .
A quick short-term fix would be to stop EPrints automatically re-running a search if the request contains an old cache id. This changes the current user experience, but I think that would be better than systems becoming unresponsive.
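The idea can be sketched with a toy model in Python. This is not the EPrints implementation (EPrints is written in Perl); the class, the function, the TTL handling and the choice of a 410 status are all assumptions made for illustration. The point is only that a request carrying an expired cache id is refused instead of silently triggering a fresh search.

```python
class SearchCache:
    """Toy model of a search cache whose tables expire after a fixed TTL."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.tables = {}        # cache_id -> (created_at, results)
        self.searches_run = 0   # how many times the expensive search ran

    def run_search(self, query, now):
        """Run the (expensive) search and store its results in a new table."""
        self.searches_run += 1
        cache_id = self.searches_run
        results = [f"{query} result {i}" for i in range(1, 101)]
        self.tables[cache_id] = (now, results)
        return cache_id

    def lookup(self, cache_id, now):
        """Return cached results, or None if the table is missing or expired."""
        entry = self.tables.get(cache_id)
        if entry is None:
            return None
        created_at, results = entry
        if now - created_at > self.ttl_seconds:
            del self.tables[cache_id]
            return None
        return results


def handle_search(cache, query, now, cache_id=None, offset=0, page_size=20):
    """Serve one search request without re-running searches for stale ids."""
    if cache_id is None:
        # A fresh search with no cache id is still allowed to run.
        cache_id = cache.run_search(query, now)
    results = cache.lookup(cache_id, now)
    if results is None:
        # Stale or unknown cache id: refuse rather than rebuild the table.
        return {"status": 410, "cache_id": None, "items": []}
    page = results[offset:offset + page_size]
    return {"status": 200, "cache_id": cache_id, "items": page}
```

In this model a crawler replaying old paginated links gets a cheap refusal, while a person who started the search a moment ago can still page through the results.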
NB the above has been observed using the ‘internal’ search methods (rather than Xapian/ElasticSearch).
Cheers, John
From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of Yuri Carrer
Botnets don't use the same IPs.
The solution is simpler: rename the search script and update the internal links (or let EPrints take the script name from some config). You can do it weekly; nobody will notice, but bots won't be able to keep up with it.
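One way the weekly rename could be automated is to derive the script name from a secret and the current ISO week, so that the internal links and the web server configuration can both compute the same name. The Python sketch below is only an illustration: the function name, the `/cgi/search-` prefix and the secret are hypothetical, and EPrints has no such setting as far as this thread says.

```python
import hashlib
import hmac
from datetime import date


def weekly_search_path(secret: bytes, day: date, prefix: str = "/cgi/search-") -> str:
    """Return a search path that changes once per ISO week.

    The prefix and the overall scheme are assumptions for illustration only.
    """
    iso_year, iso_week, _weekday = day.isocalendar()
    message = f"{iso_year}-W{iso_week:02d}".encode("ascii")
    # A keyed hash, so the next week's path cannot be guessed without the secret.
    token = hmac.new(secret, message, hashlib.sha256).hexdigest()[:12]
    return f"{prefix}{token}"


secret = b"replace-with-a-real-secret"
print(weekly_search_path(secret, date(2025, 7, 22)))
```

Every day in the same ISO week yields the same path, so links rendered on Monday still work on Sunday and only change at the week boundary.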
On 17/07/25 19:32, Tomasz Neugebauer wrote:
--
Yuri Carrer
CAB - Centro di Ateneo per le Biblioteche, Università di Padova
Tel: 049/827 9712 - Via Beato Pellegrino, 28 - Padova