EPrints Technical Mailing List Archive

Message: #00882


[EP-tech] Re: We are at our wits end,

I think Seb’s suggestions should put you on the right track. In case it helps: since the search misbehaves when a word ending in ‘s’ is involved, stemming may be playing a part here.


As far as I know, the same stemming configuration is used for both searching and indexing, so re-indexing the record should have worked...
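
To illustrate why a stemming mismatch would produce exactly this symptom, here is a minimal sketch in Python (not EPrints code). The stemmer `toy_stem`, the record IDs and the titles are all made up for illustration; the real rules live in “cfg.d/indexing.pl” and may differ. It only shows that a word stored in the index in stemmed form cannot be found if the query word is not normalised the same way.

```python
def toy_stem(word):
    """Hypothetical, very naive stemmer: lowercase and strip one trailing 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def build_index(titles, normalise):
    """Map each normalised word to the set of record IDs whose title contains it."""
    index = {}
    for record_id, title in titles.items():
        for word in title.split():
            index.setdefault(normalise(word), set()).add(record_id)
    return index


def search(index, query, normalise):
    """Return the IDs matching ALL query words after normalisation."""
    hits = None
    for word in query.split():
        ids = index.get(normalise(word), set())
        hits = ids if hits is None else hits & ids
    return hits or set()


# Made-up records, for illustration only.
titles = {1: "Green Bans and Beyond", 2: "The Green Bans Movement"}

# The index is built with stemming, so "Bans" is stored as "ban".
index = build_index(titles, toy_stem)

# Same normalisation on both sides: both records are found.
same_rules = search(index, "green bans", toy_stem)

# Query only lowercased, not stemmed: "bans" is not in the index.
mismatch = search(index, "green bans", str.lower)
```

If the two sides disagree in this way, re-indexing alone would not help, because the index itself is already consistent.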


You can still have a look at “cfg.d/indexing.pl” and temporarily add the following lines near the end of the script, just before “return( \@g , \@b );”:


use Data::Dumper;
print STDERR " *** Indexing: $text\n";
print STDERR " ***    GOODs:\n";
print STDERR Dumper(\@g);
print STDERR " ***    BADs:\n";
print STDERR Dumper(\@b);


That will (insistently) display in the log file the words that are being discarded and the ones that are finally used.
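
Since that output gets noisy quickly, a small script can help to pull the interesting part back out of the log. The following is only a sketch in Python, assuming the log contains the three markers printed above followed by Data::Dumper's default output; the function name `parse_indexing_debug` and the sample log text are made up for illustration, and any prefix your web server adds to log lines is simply ignored.

```python
import re


def parse_indexing_debug(log_text):
    """Group the debug output into one entry per ' *** Indexing:' line.

    Only single-quoted values from the Dumper output are captured;
    values that Data::Dumper prints unquoted are skipped.
    """
    entries = []
    current = None
    section = None
    for line in log_text.splitlines():
        m = re.search(r"\*\*\* Indexing: (.*)$", line)
        if m:
            current = {"text": m.group(1), "good": [], "bad": []}
            entries.append(current)
            section = None
            continue
        if current is None:
            continue  # nothing to attach this line to yet
        if re.search(r"\*\*\*\s+GOODs:", line):
            section = "good"
        elif re.search(r"\*\*\*\s+BADs:", line):
            section = "bad"
        elif section:
            current[section].extend(re.findall(r"'([^']*)'", line))
    return entries


# Made-up sample of what the log might contain, for illustration only.
sample = """ *** Indexing: Green Bans
 ***    GOODs:
$VAR1 = [
          'green',
          'ban'
        ];
 ***    BADs:
$VAR1 = [];
"""

entries = parse_indexing_debug(sample)
for entry in entries:
    print(entry["text"], "| kept:", entry["good"], "| discarded:", entry["bad"])
```

Comparing the "kept" words for the problem record with the words typed into the search form should show whether a word is being stemmed or discarded unexpectedly.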






From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Phil
Sent: 25 July 2012 20:15
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: We are at our wits end,


Yes, I have re-indexed multiple times. My confusion comes from the fact that the index tables are correct. We are on 3.3.X.


Does the SQL debugging output go to a log file?


From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of sf2
Sent: 25 July 2012 22:41
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: We are at our wits end,


Hi Phil,

For info, which EPrints version are you using?

Worth a try:

- have you tried to reindex the eprint dataset? (bin/epadmin reindex <archive_id> eprint)

- have you tried debugging the search to see which SQL statement gets executed (**) ?


(**) to enable SQL debugging, edit perl_lib/EPrints/Search/Condition.pm, look at the end of "sub sql" for:

#print STDERR "\nsql=$sql\n\n";

Un-comment the line above, restart apache and re-run the problematic search - feel free to copy/paste the SQL query here so we can have a look.


On Wed, 25 Jul 2012 22:12:26 +1000, "Phil" <philpearson@iinet.net.au> wrote:

If anyone can help we would very much appreciate it. We have posted this issue numerous times and have had someone take control of our EPrints server, but the problem will not go away.





I will explain it step by step





If I look at the database directly and query the index I get the following result





Image removed by sender.





You can see that the words “Green” and “Bans” are in the index.





So we should be able to find this book in the advanced or simple search using the term “green bans”





Image removed by sender.








Unfortunately, when we do, we get only the one result below:





Image removed by sender.





Whilst this is correct, it is incomplete: the search should also have returned record 3594.





If you look back at the SQL above you will see the other words in the title





If I put a few of them in the search as per below, I get the correct result:


Image removed by sender.








Image removed by sender.





I have erased the index record and reindexed the item to no avail.





This is not the only record that does not return from a search and it makes our catalogue unusable.





I have installed Xapian, but that also gives me problems, as per a post of mine two days ago.





We have put a lot of effort into migrating from the old system. If we cannot fix this, we will have to go through the whole process again with something else.