EPrints Technical Mailing List Archive

Message: #07497



Re: [EP-tech] Does a proliferation of cache tables effect simple search performance?


Hi John,

 

Thank you very much for sharing this information – this is super-useful to know!

 

Have a nice day,

Michele

 

From: John Salter [mailto:J.Salter@leeds.ac.uk]
Sent: 26 September 2018 11:41
To: eprints-tech@ecs.soton.ac.uk; Michele Morelli <Michele.Morelli@cosector.com>
Subject: RE: [EP-tech] Does a proliferation of cache tables effect simple search performance?

 

Hi Michele,

There are a couple of config variables that might be worth looking for in the archive config:

cache_maxlife (maximum age of a cache table)

cache_max (number of cache tables to keep)

 

If these aren't set, then it might mean cache tables aren't considered for removal.
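
As a rough illustration of how an age limit and a count limit like these could combine, here is a minimal Python sketch. It is not EPrints' actual code: the function name, the tuple layout, and the assumption that cache_maxlife is measured in hours are all hypothetical, so check the archive config and the Cachemap source for the real behaviour.

```python
import time


def select_for_removal(cachemaps, cache_maxlife_hours, cache_max, now=None):
    """Return the ids of cache entries that a cleanup pass might drop.

    cachemaps: list of (cachemap_id, created_epoch_seconds) tuples.
    cache_maxlife_hours: assumed to be an age limit in hours (assumption).
    cache_max: maximum number of cache entries to keep.
    """
    # Mirrors the caveat above: with no limits set, nothing is considered.
    if cache_maxlife_hours is None or cache_max is None:
        return []

    now = time.time() if now is None else now
    cutoff = now - cache_maxlife_hours * 3600

    # 1) Anything older than the age limit is selected.
    selected = [cid for cid, created in cachemaps if created < cutoff]

    # 2) Of the remainder, drop the oldest until at most cache_max are left.
    survivors = sorted((created, cid) for cid, created in cachemaps if created >= cutoff)
    excess = max(0, len(survivors) - cache_max)
    selected += [cid for _, cid in survivors[:excess]]
    return selected
```

With both limits unset the sketch selects nothing, which is one way cache tables could end up never being considered for removal.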

 

They *should* get cleaned up by EPrints::DataObj::Cachemap::cleanup, which is called as part of the Apache cleanup process (registered in EPrints::DataObj::Cachemap::create_from_data).

 

Each cache table should have a reference within the cachemap table. If one doesn't, it is a 'proper' orphan and may need to be removed manually.

If you have a block of orphaned tables that are all from a similar date/time, it may point to a one-off problem, e.g. an Apache crash that resulted in the cleanup not happening properly.
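
One way to spot such orphans is to compare the list of cache tables against the ids recorded in the cachemap table. The Python sketch below assumes that the numeric suffix of each cache[0-9]+ table corresponds to an id in the cachemap table. That mapping, the function name, and the inputs are assumptions for illustration, so verify them against your own database before dropping anything.

```python
import re

# Matches table names such as "cache123" but not "cachemap".
CACHE_TABLE = re.compile(r"^cache(\d+)$")


def find_orphan_cache_tables(table_names, cachemap_ids):
    """Return cache tables whose numeric suffix has no cachemap entry.

    table_names: all table names in the database (e.g. from SHOW TABLES).
    cachemap_ids: ids present in the cachemap table (assumed to match
                  the numeric suffix of the cache table names).
    """
    known = set(cachemap_ids)
    orphans = []
    for name in table_names:
        match = CACHE_TABLE.match(name)
        if match and int(match.group(1)) not in known:
            orphans.append(name)
    return orphans
```

For example, given tables cache12, cache13 and cache99 and cachemap ids 12 and 99, only cache13 would be reported as an orphan candidate.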

 

It could also be that under Apache 2.4 (or if you're running an 'interesting' config with other elements, such as fcgi or similar) the cleanup phase isn't acted upon.

 

In general terms, EPrints isn't affected by the existence of lots of cache tables.

The database server might be impacted if there are thousands of them and it's trying to keep things in memory, etc.

 

Hope that helps a bit - let me know if you have more questions!

 

Cheers,

John

 

 

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Michele Morelli via Eprints-tech
Sent: 26 September 2018 11:07
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Does a proliferation of cache tables effect simple search performance?

 

Good morning All,

 

I was wondering whether anyone else has run into this aspect of EPrints’ Simple Search, and whether this is the way it is intended to work.

 

Every time a simple search is performed, EPrints creates a new cache table in the database; these tables are named in a ‘cache[0-9]+’ format.

EPrints does not appear to consider these cache tables orphaned, so they remain in the database, and a large number of them can accumulate.

 

I would be curious to know more about these tables:

1 – Are these cache tables eventually supposed to be dropped by some process? ‘epadmin cleanup_cachemaps’ leaves these tables untouched, as it does not consider them orphaned;

2 – Does a proliferation of cache tables affect simple search performance? It seems to me that having an overabundance of cache tables to look through could add unwanted overhead to DB-related processes, but I might be missing something obvious.

 

Thank you – have a nice day!

Michele