EPrints Technical Mailing List Archive

Message: #08998



Re: [EP-tech] Limit Export-search-results (max_items for export)


A robots.txt would stop well-behaved crawlers.
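
As a rough illustration of the robots.txt suggestion, here is a minimal Python sketch that checks a set of rules with the standard library's `urllib.robotparser`. The rules themselves are an assumption: they disallow only `/cgi/exportview`, the uncached export path mentioned further down the thread, and other export URLs on a given repository would need their own `Disallow` lines. The example paths and the helper name `is_allowed` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. Only /cgi/exportview is taken from this
# thread; other export paths would need their own Disallow lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi/exportview
"""


def is_allowed(path, user_agent="*"):
    """Return True if a crawler honouring ROBOTS_TXT may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Example paths are made up for illustration.
    for path in ("/view/year/2020.html",
                 "/cgi/exportview/year/2020/BibTeX/2020.bib"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

This only helps against crawlers that choose to obey robots.txt; it does nothing to stop a crawler that ignores it.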


On 06/07/2022 08:44, Yuri via Eprints-tech wrote:
CAUTION: This e-mail originated outside the University of Southampton.

A solution could be to add some JS to the export button's click handler to submit the form. Crawlers will not run it, while browsers will.

On 06/07/22 09:22, Stenger, Avischai via Eprints-tech wrote:
Hi David,

You got it. This is exactly our problem. Some crawlers ask for an export that contains a very large number of records, which brings our HW to its knees.

Regards,


Avischai

On 05.07.2022 at 16:58, David R Newman via Eprints-tech <eprints-tech@ecs.soton.ac.uk> wrote:

Hi Avischai,

Unfortunately, I don't think there is a way of limiting the number of records that can be exported. I think the consideration at the time was that browse view web pages with loads of items can take a long time to load (even when cached) and are not particularly useful to a user in a web browser, as the page will be really long (i.e. take forever to scroll through). So rather than putting load on the server to generate such a web page, it is easier just to say, "this page has too many items to display". The opposite is true of exports, which are typically machine-readable and therefore either used for some automated analysis or post-processed (e.g. truncated to only the first n items) before being displayed to a real user. If an export were itself truncated or restricted because it had what was determined to be "too many items", this would prevent the analysis/post-processing or render it useless. I am not sure what other people's thoughts are about this?

I think I may appreciate what might be your more general point, which is the high processing cost of generating these large exports. If you have some crawler going through your browse views and asking for every export format for some of these really long listings of items, it can put quite some load on the server (/cgi/exportview is not cached). Sometimes there can be multiple connections (maybe even 20+) from the same IP address trying to request view listing exports. I have observed crawlers doing this on a number of EPrints repositories and have had to resort to blocking the IP addresses, at least temporarily. We have been considering, for a future version of EPrints, whether there is a way of restricting the number of requests that can be made for processor-intensive pages over a set period of time:

https://github.com/eprints/eprints3.4/issues/102
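
To make the idea of restricting requests over a set period more concrete, here is a minimal Python sketch of a per-IP sliding-window limiter. It is not how EPrints does or will implement this (EPrints is written in Perl, and the linked issue is only a proposal). The class name, parameters, and the limits used in the example are all hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`.

    Hypothetical sketch: names and limits are illustrative, not EPrints APIs.
    """

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.hits = defaultdict(deque)  # client IP -> timestamps of allowed requests

    def allow(self, ip):
        """Record and permit the request, or return False if over the limit."""
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


if __name__ == "__main__":
    # Example: at most 5 export requests per IP per minute (made-up numbers).
    limiter = SlidingWindowLimiter(max_requests=5, window_seconds=60)
    for i in range(7):
        print(i, limiter.allow("192.0.2.1"))
```

A request handler for a processor-intensive page would call `allow()` with the client address first and return an error response (for example HTTP 429) when it yields False. Rejected requests are not recorded, so a client is only held back until its earlier requests age out of the window.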

Regards

David Newman

On 05/07/2022 3:28 pm, Stenger, Avischai via Eprints-tech wrote:

Hi,

I can limit the maximum number of records found with "max_items" in views.pl, but it looks like there is no limit for exporting the records found.

So when I search for "roman", I get the message "The number of items (7) for this view has exceeded system limits (6). The system administrator either needs to increase "max_items" or apply additional filters to this view."

On this message page I can still click on "export" and get all the records. Is there a way to limit the permitted number of records for the export?


Regards & Thanks

*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
--

Christopher Gutteridge <totl@soton.ac.uk>
You should read our team blog at http://blog.soton.ac.uk/webteam/

Industrial Action

Sadly my trade union is currently in dispute over pay, pensions and casualisation. You can read more at https://www.ucu.org.uk/article/11896/Why-were-taking-action

The Southampton branch is currently working on "Action Short Of a Strike" (ASOS). This means only doing work we are contracted to do, so no working on any additional voluntary tasks. It's frustrating, but so are below-inflation pay rises.

As a result, so far I've had to turn down or stop working on:

  • Coordinating the iSolutions Communities of Practice program
  • Coordinating the System Documentation Community of Practice
  • Helping with a workshop on data visualisation
  • Providing a Minecraft activity for the Archaeology family day
  • Helping another team recruit someone for a post
  • Helping a colleague debug something in a service I'm an expert on but which is no longer my responsibility
  • Offering to "keep an eye" on changes impacting our systems while I'm on holiday

I look forward to getting back into these kinds of activity as soon as the industrial action permits.

Please do not cover for people taking ASOS. If it causes problems, it is helpful to make management aware. The most unhelpful thing is for people to mitigate the impacts of industrial action or hide it from management. The best thing to help is to join the union and the action and/or donate to the strike fund.