EPrints Technical Mailing List Archive

Message: #10183

RE: [EP-tech] Preventing old cache searches automatically re-searching

To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: RE: [EP-tech] Preventing old cache searches automatically re-searching
From: John Salter <J.Salter@leeds.ac.uk>
Date: Fri, 25 Jul 2025 08:33:06 +0000

Hi Martin,

I think that there will be many layers of ‘solution’ to this problem, and I agree that WAFs (and/or things like Cloudflare’s Turnstile or Anubis) should be part of a ‘well deployed instance’ of EPrints (and most software).

You may be lucky enough to be (or have access to) experts in some of these things.

Others don’t, and implementing something like mod_security or deploying additional services around the repository can be expensive and time-consuming, and may not be an option at all.

Setting the current DDoS traffic aside, my question is:

How should EPrints handle a request for an obsolete search result set?

My suggested changes to the code allow this to be a decision taken by the repository manager.

Cheers,

John

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of Martin Brändle
Sent: 25 July 2025 08:37
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] Preventing old cache searches automatically re-searching

Hi John,

Frankly (and given that we had millions of requests of this type some weeks ago), we think it’s the wrong approach.

The application shouldn’t handle such requests at all (by then it’s already too late); a firewall in front of it should. The approach described by Tomasz is an interesting option.

Kind regards,

Martin

Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Pfingstweidstrasse 60B
CH-8005 Zürich

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Date: Thursday, 24 July 2025 at 18:57
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: [EP-tech] Preventing old cache searches automatically re-searching

Hi,
One aspect of the current DDoS traffic is requests for search results pages long after their cached results have been removed from the system.

I have documented this here:

https://github.com/eprints/eprints3.4/issues/479

and a possible (not ready for production yet!) fix here:

https://github.com/jesusbagpuss/eprints3.4/tree/iss-479

If a search URL has a cache parameter, and that cache no longer exists, and the repo config has this config option (see ~/lib/cfg.d/misc.pl):
$c->{cache_not_found_no_search} = 1;

Then rather than the search being automatically re-run, a ‘search cache not found’ page is presented with a URL that a user could paste into their browser.
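
As a rough illustration of the rule described above (this is not the code in the iss-479 branch), the decision can be sketched in plain Perl. Only the option name cache_not_found_no_search comes from this message; search_action, the action labels and the $cache_exists callback are hypothetical names made up for the sketch, and no EPrints API is used.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the repository config hash. Only the option
# name cache_not_found_no_search is taken from the message.
my $c = { cache_not_found_no_search => 1 };

# Decide how to answer a search request (illustrative only).
#   $config       - hashref of config options
#   $params       - hashref of URL parameters
#   $cache_exists - coderef, true if the given cache id is still stored
sub search_action {
    my ($config, $params, $cache_exists) = @_;

    my $cache_id = $params->{cache};

    # No cache parameter: an ordinary new search.
    return 'run_search' unless defined $cache_id;

    # Cache parameter present and the cached results still exist.
    return 'use_cache' if $cache_exists->($cache_id);

    # Cache parameter present but the cache is gone: the repository
    # manager's setting decides between a notice page and a re-run.
    return $config->{cache_not_found_no_search}
        ? 'show_cache_not_found_page'
        : 'run_search';
}

# Pretend only cache 12345 is still stored.
my %live = (12345 => 1);
my $exists = sub { exists $live{ $_[0] } };

print search_action($c, { cache => 12345 },     $exists), "\n";
print search_action($c, { cache => 999999999 }, $exists), "\n";
print search_action({}, { cache => 999999999 }, $exists), "\n";
```

With the option unset (the last call), the sketch falls back to re-running the search, which matches the existing behaviour described in this thread.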

Including a clickable link might just lead the DDoS bots to follow it in the future, so the page uses some JavaScript to construct the URL instead.

A normal search results page can include many links for both paginated results and different export formats (view page source to see these). My observations show that these are all followed by the DDoS bot swarms.

If you are in a position to try the changes in https://github.com/jesusbagpuss/eprints3.4/tree/iss-479 on a test repository, I’d be most grateful.

To test:

1. Merge the code.
2. Make sure $c->{cache_not_found_no_search} = 1; is set.
3. Test/restart Apache.
4. Run a search that returns multiple pages.
5. Navigate to the second page.
6. In the URL, change the ‘cache’ parameter to a different number, e.g. cache=12345 to cache=999999999.
7. You should get a page with a couple of URLs that can be copied and pasted to re-run the search.
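
If you would rather generate the test URL than edit it by hand, a small helper along these lines may do. It is only a sketch: with_cache_id is a made-up name, and the example URL (host, path and the search_offset parameter) is hypothetical, so substitute one copied from your own repository.

```perl
use strict;
use warnings;

# Replace the numeric value of the 'cache' query parameter and leave
# the rest of the URL untouched. Hypothetical helper, not EPrints code.
sub with_cache_id {
    my ($url, $new_id) = @_;
    $url =~ s/([?&;]cache=)\d+/${1}${new_id}/;
    return $url;
}

# Hypothetical search URL; use a real one from your test repository.
my $url = 'https://repository.example.org/cgi/search/simple?cache=12345&search_offset=20';

print with_cache_id($url, 999999999), "\n";
```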

Please let me know if you find any issues with this approach.

Cheers,

John

John Salter

https://orcid.org/0000-0002-8611-8266

White Rose Libraries Technical Officer
Library and Research Management team, IT
University of Leeds
