EPrints Technical Mailing List Archive

Message: #01681


< Previous (by date) | Next (by date) > | < Previous (in thread) | Next (in thread) > | Messages - Most Recent First | Threads - Most Recent First

[EP-tech] Re: Eprints critical block on cache insert


The reason EPrints uses table joins is to make use of the database
optimiser. Back in v2 it used to do what you said, but that is
unscalable for many trivial cases e.g.

q: date > 1900 and author=(Smith, John)

A naive execution of this would end up creating a temp table with an
entry for every record in the database (~1 second per 1000 records),
then join that against a much smaller set of {Smith, John}. That led v2
instances to spend an age doing what looked like a simple query, but
because of some term matching the entire set it was very inefficient (as
well as hammering IO on the temp tables).

What the database optimiser is supposed to do is optimise this out by
doing the smaller set first (Smith, John) then matching that against the
larger set (>1900).

So EPrints now has a reasonable solution for the general case but can
get out of hand when given a complex query. The answer to that lies in
using Xapian, Solr or similar tool that has been designed to deal with
general textual searching.

All the best,
Tim.

On Fri, 2013-03-08 at 16:52 +0100, Jan Ploski wrote:
> Better yet, instead of producing monster (sub)queries EPrints should use 
> multiple SELECT INTOs with temporary tables to store intermediate 
> results to obtain acceptable performance in general case.
> 
> Tim Brody wrote:
> > Hi,
> >
> > These are normally coming out of a very big search query, where you get
> > a load of table joins for each term.
> >
> > Perhaps the internal search should have a sanity check to throw an error
> > (or truncate) if too many terms are sent at it.
> >
> > /Tim.
> >
> > On Fri, 2013-03-08 at 13:56 +0100, Paolo Tealdi wrote:
> >> On 03/08/2013 01:05 PM, Lee Paton wrote:
> >>
> >> Hi Lee.
> >>
> >> No. I changed again the global parameter setting it to 0 and mysql is working like a charme since the changing. Considering the freezing frequency, i think that
> >> it should already have happened at least one time. I controlled the search a few times and the system answers more or less in 8-9 seconds.
> >>
> >>
> >> Best regards,
> >> Paolo Tealdi
> >>
> >>
> >>
> >>> Hi Paolo
> >>>
> >>> I've been experiencing the same problem and your solution has solved it on our dev system
> >>>
> >>> Have you run into any issues on your server since you made the change?
> >>>
> >>> Thanks
> >>>
> >>> Lee
> >>>
> >>> Lee Paton
> >>> Information Services
> >>> Cardiff University
> >>> 40-42 Park Place
> >>> Cardiff
> >>> CF10 3BB
> >>>
> >>>
> >>>
> >>> From: Paolo Tealdi<paolo.tealdi@polito.it>
> >>> To: "<eprints-tech@ecs.soton.ac.uk>"<eprints-tech@ecs.soton.ac.uk>
> >>> Date: 12/02/2013 08:38
> >>> Subject: [EP-tech]  Eprints critical block on cache insert
> >>> Sent by: eprints-tech-bounces@ecs.soton.ac.uk
> >>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>> Dear all,
> >>>
> >>> i'm debugging some misterious mysql hanging that sometimes happen on our server (2-3 times a week). They happen during the night and when i can check the server i find the eprint server almost swaped out with a
> >>> VERY big number of httpd process, all the mysql connection used  and tipically a huge cache insert (the attachment is the last insert found) apparently blocking the mysql server.
> >>> Doing a SHOW FULL PROCESSLIST\G i found the older active thread
> >>>
> >>>        Id: 961782
> >>>      User: eprints
> >>>      Host: giasone.polito.it:35530
> >>>        db: eprints3310
> >>> Command: Query
> >>>      Time: 50077
> >>>     State: statistics
> >>>      Info:  ... the select in attachment ...
> >>>
> >>> You can notice that the process state is in "statistics" status.
> >>>
> >>> Googling i found that it seems that, if in a select there are involved a big number of tables, mysql query optimizer can block itself indefinitely analyzing that transaction, in the statistic state.
> >>> It seems that setting  optimizer_search_depth variable  to a low value this problem disappear.
> >>>
> >>> The default is
> >>>
> >>> optimizer_search_depth = 62
> >>>
> >>> i put it to
> >>>
> >>> set global optimizer_search_depth = 5;
> >>>
> >>> and seems to resolve the blocking issue.
> >>> A side effect seems to be a slight speedup in general when you're doing advanced searches (those query can be "important") .
> >>>
> >>> Anybody has found this problem ?
> >>>
> >>> Best regards,
> >>> Paolo Tealdi
> >>>
> >>> --
> >>> Ing. Paolo Tealdi         Area IT - Politecnico Torino
> >>> Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906799
> >>> Indirizzo/Address : C.so Duca degli Abruzzi,  24 - 10129 Torino - ITALY
> >>> Skype : tealdi.paolo
> >>> Please consider your environmental responsibility before printing this e-mail
> >>> [attachment "select_bloccata_07022013.sql" deleted by Lee Paton/scolgp/CardiffUniversity] *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> >>> *** Archive: http://www.eprints.org/tech.php/
> >>> *** EPrints community wiki: http://wiki.eprints.org/
> >>>
> >>>
> >>>
> >>> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> >>> *** Archive: http://www.eprints.org/tech.php/
> >>> *** EPrints community wiki: http://wiki.eprints.org/
> >>>
> >>
> >>
> >
> >
> >
> > *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> > *** Archive: http://www.eprints.org/tech.php/
> > *** EPrints community wiki: http://wiki.eprints.org
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/

Attachment: signature.asc
Description: This is a digitally signed message part