EPrints Technical Mailing List Archive

Message: #00883

[EP-tech] Re: We are at our wits end,


Hi All,

Just to put this to bed :-)

This repository has ALL-CAPS titles. There was/is a bug in EPrints where
all-caps words ending in 's' will never be matched, because the indexed
term doesn't lose its 's' while the search term always does.
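
A minimal sketch of that mismatch follows. It is written in Python purely
for illustration (EPrints itself is Perl) and is not the actual EPrints
code. It assumes a hypothetical normaliser, normalise(), that strips only
a lower-case trailing 's' and leaves all-caps words untouched, and it
assumes the SQL-based search compares terms case-insensitively.

```python
import re


def normalise(word: str) -> str:
    """Hypothetical word normaliser (an assumption, not EPrints code)."""
    # Strip a trailing lower-case 's' only; an upper-case 'S' is kept.
    word = re.sub(r"s$", "", word)
    # Lower-case the word only if it already contains a lower-case
    # letter, so all-caps words are left alone.
    if re.search(r"[a-z]", word):
        word = word.lower()
    return word


def matches(title_word: str, search_word: str) -> bool:
    """Compare an indexed title word with a search word.

    The final comparison ignores case, standing in for a
    case-insensitive database collation (an assumption).
    """
    return normalise(title_word).lower() == normalise(search_word).lower()


if __name__ == "__main__":
    for title_word, search_word in [("GREEN", "green"),
                                    ("Bans", "bans"),
                                    ("BANS", "bans")]:
        print(title_word, search_word, matches(title_word, search_word))
```

Under these assumptions the indexed term for "BANS" keeps its 'S' while
the search term "bans" is reduced to "ban", so the two can never match,
whereas "GREEN" and "Bans" are found as expected.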

Also, when using Xapian, terms weren't being lower-cased, so a search for
"word" would never match the indexed term "WORD".

--
Tim.

On Thu, 2012-07-26 at 10:05 +0000, Jose Martin wrote:
> I think Seb’s suggestions should put you on the right track, but in
> case it helps: since the search misbehaves when a word ending in ‘s’
> is involved, it looks like stemming might be playing a part here.
>
> As far as I know, the same stemming configuration is used for both
> searching and indexing, so re-indexing the record should have
> worked...
>
> In any case, you can open “cfg.d/indexing.pl” and temporarily
> place the following lines near the end of the script, just before
> “return( \@g , \@b );”
>
> # Temporary debugging: print the "good" (@g) and "bad" (@b) word lists
> use Data::Dumper;
> print STDERR " *** Indexing: $text\n";
> print STDERR " ***    GOODs:\n";
> print STDERR Dumper(\@g);
> print STDERR " ***    BADs:\n";
> print STDERR Dumper(\@b);
>
> That will (insistently) display in the log file the words that are
> being discarded and the ones that are finally used.
>
> Best,
>
>                 Jose.
>
> From:eprints-tech-bounces@ecs.soton.ac.uk
> [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Phil
> Sent: 25 July 2012 20:15
> To: eprints-tech@ecs.soton.ac.uk
> Subject: [EP-tech] Re: We are at our wits end,
>
> Yes, I have re-indexed multiple times. My confusion comes from the fact
> that the index tables are correct. We are on 3.3.X.
>
> Does the SQL debugger write its output to a log file?
>
> From:eprints-tech-bounces@ecs.soton.ac.uk
> [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of sf2
> Sent: 25 July 2012 22:41
> To: eprints-tech@ecs.soton.ac.uk
> Subject: [EP-tech] Re: We are at our wits end,
>
> Hi Phil,
> 
> For info, which EPrints version are you using?
> 
> Worth a try:
> 
> - have you tried to reindex the eprint dataset? (bin/epadmin reindex
> <archive_id> eprint)
> 
> - have you tried debugging the search to see which SQL statement gets
> executed (**) ?
> 
> Seb
> 
> (**) to enable SQL debugging, edit
> perl_lib/EPrints/Search/Condition.pm, look at the end of "sub sql"
> for:
> 
> #print STDERR "\nsql=$sql\n\n";
> 
> Un-comment the line above, restart apache and re-run the problematic
> search - feel free to copy/paste the SQL query here so we can have a
> look.
>
> On Wed, 25 Jul 2012 22:12:26 +1000, "Phil" <philpearson@iinet.net.au>
> wrote:
> 
>         If anyone can help we would very much appreciate it. We have
>         posted this issue numerous times and had someone control our
>         EPrints server, but the problem will not go away.
>
>         I will explain it step by step.
>
>         If I look at the database directly and query the index I get
>         the following result:
>
>         Image removed by sender.
>
>         You can see the words “Green” and “Bans” are in the index.
>
>         So we should be able to find this book in the advanced or
>         simple search using the term “green bans”:
>
>         Image removed by sender.
>
>         Unfortunately, when we do, we get only the one result below:
>
>         Image removed by sender.
>
>         Whilst this is correct, it is incomplete: the search should
>         also have returned record 3594.
>
>         If you look back at the SQL above you will see the other words
>         in the title.
>
>         If I put a few of them in the search as per below, I get the
>         correct result:
>
>         Image removed by sender.
>
>         Image removed by sender.
>
>         I have erased the index record and reindexed the item, to no
>         avail.
>
>         This is not the only record that is not returned by a search,
>         and it makes our catalogue unusable.
>
>         I have installed Xapian, but that also gives me problems, as
>         per a post of mine two days ago.
>
>         We have put a lot of effort into migrating from the old
>         system. If we cannot fix this we will have to go through the
>         whole process again with something else.
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
