EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #10183



RE: [EP-tech] Preventing old cache searches automatically re-searching


CAUTION: This e-mail originated outside the University of Southampton.

Hi Martin,

I think that there will be many layers of ‘solution’ to this problem, and I agree that WAFs (and/or things like Cloudflare’s Turnstile or Anubis) should be part of a ‘well deployed instance’ of EPrints (and most software).

 

You may be lucky enough to be (or have access to) experts in some of these things.

Others don’t – and implementing e.g. mod_security or deploying additional services around the repository can be expensive, time-consuming and possibly not an option.

 

Taking the current DDoS aspect away, my question is:

How should EPrints handle a request for an obsolete search result set?

 

My suggested changes to the code allow this to be a decision taken by the repository manager.

 

Cheers,

John

 

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of Martin Brändle
Sent: 25 July 2025 08:37
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] Preventing old cache searches automatically re-searching

 

CAUTION: External Message. Use caution opening links and attachments.


Hi John,

 

Frankly (and given that we had millions of requests of this type some weeks ago), we think it’s the wrong approach.

The application shouldn’t handle such requests at all (by then it’s already too late); a firewall in front of it should. The approach described by Tomasz is an interesting option.

 

Kind regards,

 

Martin

 

--

Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Pfingstweidstrasse 60B
CH-8005 Zürich

 

 

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of John Salter <J.Salter@leeds.ac.uk>
Date: Thursday, 24 July 2025 at 18:57
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: [EP-tech] Preventing old cache searches automatically re-searching


Hi,
One aspect of the current DDoS traffic is requests for search results pages long after their cached results have been removed from the system.

 

I have documented this here:

https://github.com/eprints/eprints3.4/issues/479

and a possible (not ready for production yet!) fix here:

https://github.com/jesusbagpuss/eprints3.4/tree/iss-479

 

If a search URL has a cache parameter, and that cache no longer exists, and the repo config has this option set (see ~/lib/cfg.d/misc.pl):
$c->{cache_not_found_no_search} = 1;

then, rather than the search being automatically re-run, a ‘search cache not found’ page is presented with a URL that a user can paste into their browser.
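
As a rough illustration of the decision described above, here is a minimal sketch. It is written in Python purely for readability; the actual change lives in the EPrints Perl code on the iss-479 branch and may be structured quite differently. The names SearchRequest, decide and live_caches are hypothetical; only the cache_not_found_no_search option name comes from the proposal.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SearchRequest:
    """Hypothetical stand-in for an incoming search URL."""
    cache_id: Optional[int]  # value of the 'cache' URL parameter, if present


def decide(request: SearchRequest, live_caches: Set[int], config: dict) -> str:
    """Return which page to serve: 'cached', 'rerun' or 'not_found'."""
    if request.cache_id is None:
        # No cache parameter: an ordinary new search.
        return "rerun"
    if request.cache_id in live_caches:
        # The cached result set still exists, so serve it.
        return "cached"
    if config.get("cache_not_found_no_search"):
        # Cache is gone and the repository manager has opted out of
        # automatic re-searching: show the 'search cache not found' page.
        return "not_found"
    # Option unset: re-run the search automatically.
    return "rerun"
```

With the option unset, a request for a missing cache falls through to "rerun", so the choice stays with the repository manager.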

 

Including a clickable link might just make the DDoS bots follow the link in the future. The page uses some JavaScript to construct the URL.

 

A normal search results page can include many links for both paginated results and different export formats (view page source to see these). My observations show that these are all followed by the DDoS bot swarms.

 

If you are in a position to try the changes in https://github.com/jesusbagpuss/eprints3.4/tree/iss-479 on a test repository, I’d be most grateful.

To test:

  1. Merge the code.
  2. Make sure $c->{cache_not_found_no_search} = 1; is set.
  3. Test/restart Apache.
  4. Run a search that returns multiple pages.
  5. Navigate to the second page.
  6. In the URL, change the ‘cache’ parameter to a different number, e.g. cache=12345 to cache=999999999.
  7. You should get a page with a couple of URLs that can be copied and pasted to re-run the search.
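
The step of changing the ‘cache’ parameter can also be scripted if that is easier than editing the URL by hand. This is a small Python sketch using only the standard library; the helper name with_cache_id, the host and the other query parameters in the example are made up for illustration, and only the ‘cache’ parameter name comes from the steps above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_id(url: str, new_cache_id: int) -> str:
    """Return url with its 'cache' query parameter set to new_cache_id.

    All other query parameters are kept in their original order.
    """
    parts = urlsplit(url)
    query = [
        (key, str(new_cache_id) if key == "cache" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical second-page search URL copied from the browser.
original = "https://repository.example.org/cgi/search?cache=12345&page=2"
print(with_cache_id(original, 999999999))
```

Requesting the rewritten URL (in a browser or with any HTTP client) should then produce the ‘search cache not found’ page rather than a re-run search, provided the option is set.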

 

Please let me know if you find any issues with this approach.

 

Cheers,

John

 

John Salter

https://orcid.org/0000-0002-8611-8266

 

White Rose Libraries Technical Officer
Library and Research Management team, IT
University of Leeds