EPrints Technical Mailing List Archive

Message: #02710


< Previous (by date) | Next (by date) > | < Previous (in thread) | Next (in thread) > | Messages - Most Recent First | Threads - Most Recent First

[EP-tech] Re: Simple search - length of query


I ran it on 5.1.72 with optimizer_search_depth=0 (although I also tried
with default - no problems). The query takes 1s on a cold db here. My
cardinalities look different, of course, but not extraordinarily so
(35,416801,5418424,80872,677303).

John Salter wrote:
> Hi Jan/Seb,
> It looks like my indexes are the same as yours (see below).
> Which version of MySQL are you running? What is your ' optimizer_search_depth' set to?
> 
> Seb,
> When I took a snapshot of the process', the query was in the 'statistics' stage - for a long time - so I'll see if tweaking the optimizer_search_depth helps!
> Longer term we'll be switching to Xapian :o)
> 
> Cheers,
> John
> 
> 
> mysql> SHOW INDEXES FROM eprint__rindex\G
> *************************** 1. row ***************************
>        Table: eprint__rindex
>   Non_unique: 0
>     Key_name: PRIMARY
> Seq_in_index: 1
>  Column_name: field
>    Collation: A
>  Cardinality: 49
>     Sub_part: NULL
>       Packed: NULL
>         Null:
>   Index_type: BTREE
>      Comment:
> *************************** 2. row ***************************
>        Table: eprint__rindex
>   Non_unique: 0
>     Key_name: PRIMARY
> Seq_in_index: 2
>  Column_name: word
>    Collation: A
>  Cardinality: 1238267
>     Sub_part: NULL
>       Packed: NULL
>         Null:
>   Index_type: BTREE
>      Comment:
> *************************** 3. row ***************************
>        Table: eprint__rindex
>   Non_unique: 0
>     Key_name: PRIMARY
> Seq_in_index: 3
>  Column_name: eprintid
>    Collation: A
>  Cardinality: 14859215
>     Sub_part: NULL
>       Packed: NULL
>         Null:
>   Index_type: BTREE
>      Comment:
> *************************** 4. row ***************************
>        Table: eprint__rindex
>   Non_unique: 1
>     Key_name: eprint__rindex_eprintid_2
> Seq_in_index: 1
>  Column_name: eprintid
>    Collation: A
>  Cardinality: 15994
>     Sub_part: NULL
>       Packed: NULL
>         Null:
>   Index_type: BTREE
>      Comment:
> *************************** 5. row ***************************
>        Table: eprint__rindex
>   Non_unique: 1
>     Key_name: eprint__rindex_eprintid_2
> Seq_in_index: 2
>  Column_name: field
>    Collation: A
>  Cardinality: 154783
>     Sub_part: NULL
>       Packed: NULL
>         Null:
>   Index_type: BTREE
>      Comment:
> 5 rows in set (0.00 sec)
> 
> -----Original Message-----
> From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Jan Ploski
> Sent: 05 March 2014 12:20
> To: eprints-tech@ecs.soton.ac.uk
> Cc: John Salter
> Subject: [EP-tech] Re: Simple search - length of query
> 
> Could it be that you're missing some essential indexes on eprint__rindex?
> 
> Your example query is very hard on my 3.0.5 database while it returns
> quickly on 3.3.12. The difference is that the eprint__rindex primary key
> in 3.3.12 consists of (field, word, eprintid) and there is also a
> non-unique index on (eprintid, field). In the 3.0.5 database I only have
> a non-unique index on eprintid.
> 
> John Salter wrote:
>> Hi,
>> Yesterday we had a bit of an issue with our repository when someone pasted a full citation into the simple search box.
>> This produced an impressive SQL query that locked things up and made users unhappy...
>>
>> Is there a way to sanitise what a 'simple' search might try to handle? e.g. would restricting it to a certain number of words be acceptable?
>> Would the Xapian search handle a request like the one below any better?
>>
>> Details below/attached if you're interested!
>> Cheers,
>> John
>>
>> GET /cgi/search/simple?full=%E2%80%98Families%2C+Domesticity+and+Intimacy%3A+Changing+Relationships+in+Changi
>> ng+Times%E2%80%99%2C+in+Richardson%2C+D%2C+and+Robinson%2C+V.+%28eds%29+Introducing+Women%27s+Studies%2C+third+edition.+Basingstoke%3A+Palgrave%2C+2008+pp.+1
>> 25-143.+&_action_search=Search&_order=bytitle&basic_srchtype=ALL&_satisfyall=ALL
>>
>> searchexp created in cache table:
>> 0|1|-date/creators_name/title|archive|-|full:abstract/creators_name/date/documents/title:ALL:IN:▒Families, Domesticity and Intimacy%3A Changing Relationships in Changing Times▒, in Richardson, D, and Robinson, V. (eds) Introducing Women's Studies, third edition. Basingstoke%3A Palgrave, 2008 pp. 125-143. |-|eprint_status:eprint_status:ANY:EQ:archive|metadata_visibility:metadata_visibility:ANY:EQ:show
>>
>> The SQL generated by search is attached (get ready for this - it's a thing of beauty ;o) - you can see why it took a while to run!
>>
>>
>>
>>
>> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>> *** Archive: http://www.eprints.org/tech.php/
>> *** EPrints community wiki: http://wiki.eprints.org/
>> *** EPrints developers Forum: http://forum.eprints.org/
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/
> 
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/