[EP-tech] Re: We are at our wits end,

I think Seb's suggestions should put you on the right track. In case it helps: since the search misbehaves when a word ending in "s" is involved, stemming may well be playing a part in this.

As far as I know, the same stemming configuration is used for both searching and indexing, so re-indexing the record should have worked...

You can nevertheless have a look at "cfg.d/indexing.pl" and temporarily place the following lines near the end of the script, just before "return( \@g , \@b );":

use Data::Dumper;
print STDERR " *** Indexing: $text\n";
print STDERR " ***    GOODs:\n";
print STDERR Dumper(\@g);
print STDERR " ***    BADs:\n";
print STDERR Dumper(\@b);

That will (insistently) display in the log file the words that are being discarded and the ones that are finally used.
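
To illustrate why a trailing "s" can matter, here is a minimal, self-contained sketch of a word extractor that strips plurals and sorts words into "good" and "bad" lists. It is not the EPrints code: the function name toy_extract_words, the stop-word list and the minimum length of 3 are all assumptions made up for this example. The real rules live in your own cfg.d/indexing.pl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a word-extraction routine. NOT the real
# EPrints implementation: stop words and minimum length are assumptions.
my %STOP_WORDS = map { $_ => 1 } qw(the and of);
my $MIN_LENGTH = 3;

sub toy_extract_words {
    my ($text) = @_;
    my (@good, @bad);
    for my $word (split /[^A-Za-z0-9]+/, lc $text) {
        next if $word eq '';
        # Naive plural stripping: drop a single trailing "s".
        (my $stem = $word) =~ s/s$//;
        if (length($stem) < $MIN_LENGTH || $STOP_WORDS{$stem}) {
            push @bad, $word;     # discarded, never reaches the index
        }
        else {
            push @good, $stem;    # the stem is what gets indexed
        }
    }
    return (\@good, \@bad);
}

my ($good, $bad) = toy_extract_words("Green Bans");
print "good: @$good\n";
print "bad:  @$bad\n";
```

With rules like these, "Bans" is stored as "ban". A search only matches if the search side reduces the query term to exactly the same stem, which is why comparing the GOODs and BADs in the log against the index tables is worthwhile.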

Best,

                Jose.

From: eprints-tech-bounces at ecs.soton.ac.uk On Behalf Of Phil
Sent: 25 July 2012 20:15
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Re: We are at our wits end,

Yes, I have re-indexed multiple times. My confusion comes from the fact that the index tables are correct. We are on 3.3.X.

Does the SQL debugging output go to a log file?

From: eprints-tech-bounces at ecs.soton.ac.uk On Behalf Of sf2
Sent: 25 July 2012 22:41
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Re: We are at our wits end,


Hi Phil,

For info, which EPrints version are you using?

Worth a try:

- have you tried to reindex the eprint dataset? (bin/epadmin reindex <archive_id> eprint)

- have you tried debugging the search to see which SQL statement gets executed (**) ?

Seb

(**) to enable SQL debugging, edit perl_lib/EPrints/Search/Condition.pm, look at the end of "sub sql" for:

#print STDERR "\nsql=$sql\n\n";

Un-comment the line above, restart apache and re-run the problematic search - feel free to copy/paste the SQL query here so we can have a look.
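
Since that line prints to STDERR, the output would normally end up in the Apache error log. As a convenience, a small helper along these lines could pull the statements out of the log so they are easy to paste. This is only a sketch: the function name extract_sql_lines is made up, it assumes each statement stays on the same line as its "sql=" marker, and the log file location depends on your Apache setup.

```perl
use strict;
use warnings;

# Return the text following every "sql=" marker found in the log text.
# Assumes one statement per line, as printed by the debug line above.
sub extract_sql_lines {
    my ($log_text) = @_;
    my @statements;
    for my $line (split /\n/, $log_text) {
        push @statements, $1 if $line =~ /sql=(.+)$/;
    }
    return @statements;
}

# When given a log file as argument, print each statement found in it.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $content = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    print "$_\n" for extract_sql_lines($content);
}
```

Run it with the path to your error log as the only argument, straight after re-running the problematic search.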



On Wed, 25 Jul 2012 22:12:26 +1000, "Phil" <philpearson at iinet.net.au> wrote:
If anyone can help we would very much appreciate it. We have posted this issue numerous times and have had someone take control of our EPrints server, but the problem will not go away.

I will explain it step by step.

If I look at the database directly and query the index, I get the following result:

[Image removed by sender.]

You can see that the words "Green" and "Bans" are in the index.

So we should be able to find this book in the advanced or simple search using the term "green bans":

[Image removed by sender.]

Unfortunately, when we do, we get only the one result shown below:

[Image removed by sender.]

Whilst this result is correct, it is incomplete: the search should also have returned record 3594.

If you look back at the SQL above, you will see the other words in the title.

If I put a few of them in the search, as shown below, I get the correct result:

[Image removed by sender.]

[Image removed by sender.]

I have erased the index record and reindexed the item, to no avail.

This is not the only record that does not return from a search, and it makes our catalogue unusable.

I have installed Xapian, but that also gives me problems, as described in a post of mine two days ago.

We have put a lot of effort into migrating from the old system. If we cannot fix this, we will have to go through the whole process again with something else.