EPrints Technical Mailing List Archive

Message: #05399



[EP-tech] Re: Question about full text search (Documents in Advanced Search page)


For anyone who runs into a similar problem in the future: it turned out that I needed to force the indexer to do a full reindex. It's unfortunate that this was necessary, since the indexer is running all the time, but that's what fixed it for me.

Thanks for the help,
Mike.
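
The message above does not say which command was used for the full reindex. One possible way to queue it, assuming the stock bin/epadmin script and its "reindex" subcommand, is sketched below. The install path /opt/eprints3 and the repository ID "myrepo" are placeholders, not values from this thread, and the running indexer still has to work through the queued tasks afterwards.

```python
import shlex


def reindex_command(repo_id, dataset="eprint", eprint_ids=(),
                    eprints_root="/opt/eprints3"):
    """Build the argv for 'epadmin reindex'.

    Assumptions (not from the thread): EPrints lives under eprints_root,
    and passing no eprint IDs schedules the whole dataset for reindexing.
    """
    argv = [f"{eprints_root}/bin/epadmin", "reindex", repo_id, dataset]
    argv += [str(eprint_id) for eprint_id in eprint_ids]
    return argv


if __name__ == "__main__":
    # "myrepo" is a hypothetical repository ID; substitute your own.
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(reindex_command("myrepo")))
```

Passing eprint_ids=[...] restricts the command to individual records, which is a cheaper first test than reindexing everything.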


On 1/27/2016 1:23 PM, Michael Street wrote:
Hi Lizz,

Thanks, yes, I have found the tables on my own and can manually insert
terms, and it works that way.  I just can't figure out where the
disconnect is between what the Indexer is seeing and what it is, or
isn't, inserting into the db.

I will check the video for more hints though, thanks.

--Mike


On 1/27/2016 11:24 AM, Lizz Jennings wrote:
Hi Michael,

Have you looked at the database entries for the indexes?  Adam showed which tables to look at (at about 6 minutes in) in the troubleshooting search video:

http://wiki.eprints.org/w/Training_Video:Search_Troubleshooting

That might offer a hint?
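
As a rough illustration of that kind of database check, the sketch below builds a parameterised query listing the records indexed under a given word. The table name "eprint__rindex", its columns "field" and "word", and the field name "documents" are assumptions about a typical EPrints 3 schema and are not taken from this thread; verify them against your own database before relying on the result.

```python
def rindex_lookup(word, field="documents", dataset="eprint"):
    """Return (sql, params) listing record IDs indexed under `word`.

    Assumed schema (verify locally): a table '<dataset>__rindex' with
    columns 'field', 'word' and '<dataset>id'.
    """
    table = f"{dataset}__rindex"
    key = f"{dataset}id"
    sql = (f"SELECT {key} FROM {table} "
           f"WHERE field = %s AND word = %s ORDER BY {key}")
    # Index terms are assumed to be stored lowercased.
    return sql, (field, word.lower())


if __name__ == "__main__":
    sql, params = rindex_lookup("bohm")
    print(sql)
    print(params)
```

The query uses %s placeholders, as accepted by common Python MySQL drivers. Comparing the returned IDs with the eprints known to contain the term shows which records never reached the index.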

Lizz

--

Lizz Jennings BA MSc ACLIP MCLIP (Revalidated 2015)

Technical Data Officer

The Library 4.10, University of Bath, Bath, BA2 7AY UK

Ext. 3570 (External 01225 383570)

E.Jennings@bath.ac.uk

________________________________________
From: eprints-tech-bounces@ecs.soton.ac.uk <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Michael Street <mstreet@yorku.ca>
Sent: 27 January 2016 15:46
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: Question about full text search (Documents in Advanced Search page)

Hi folks,

Is there any other way to find out more details about what the
Indexer is doing?  I've turned on verbose logging and loglevel 6, but
I'd really like to know exactly what terms it has found, and what it's
inserting, if anything, into the database.

Thanks,
Mike.

On 1/27/2016 9:46 AM, Michael Street wrote:
Hi Alan,

Thanks, but I have tried that.  I've increased the logging verbosity and
tried reindexing one of the offending deposits.  Nothing in the logs.
To be honest, I see nothing in the logs except that there are no tasks.

Occasionally I see something about documents being locked but the
numbers don't match up.  I'm not sure how the numbering system works
(for ex. 'document.5917 is locked').  I would assume though, that I
would see an error message when reindexing one of the offending
deposits.  I don't see anything when reindexing those, so I assume the
'locked' message has nothing to do with it.

I will try the Xapian plugin later to see if that makes any difference.

--Mike


On 1/25/2016 4:20 AM, Alan.Stiles wrote:
Have you tried to reindex one of the missing items to see if it made a difference?  Check the error_log whilst it reindexes in case eprints is having some other issue with opening the pdf (we sometimes have issues with e.g. apostrophes in the filenames).


-----Original Message-----
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Michael Street
Sent: 22 January 2016 21:01
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: Question about full text search (Documents in Advanced Search page)

Hi again,

Does anyone have any idea why these documents are not showing up in the search results?

Any suggestions would really be appreciated.  I'm at a loss as to why it's not returning results that clearly have the search term in the pdf (and the converted text document).

--Mike Street

On 1/15/2016 11:05 AM, Michael Street wrote:
Hi John,

Thanks very much for your response.  Please find my answers below:

1)  Indexer is running and confirmed to be working.  The documents
that don't show up are some of the oldest and are available through
other links.  Newly deposited items also show up in the Views.

2)  I have tried pdftotext on the system and had no issues with
converting it.  I also was able to find the search term within the
document easily.

3)  I run a cronjob that updates the DB and switches everything to be
visible, every 15 minutes.  My client does not want anything to be
hidden, especially previous versions of eprints, so this was the
easiest way to achieve that, for me.  Also, the eprints in question do
show up in the Views, which shows they're set to visible.
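
The check described in point 2 can be sketched roughly as follows: extract the text with the pdftotext command-line tool and count whole-word matches of the search term. The function names are illustrative only, and this shows merely that the text is extractable; it does not reproduce how the EPrints indexer tokenises documents.

```python
import re
import subprocess
import sys


def count_term(text, term):
    """Count case-insensitive whole-word occurrences of term in text."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


def pdf_text(path):
    """Extract text from a PDF by running 'pdftotext <path> -' (stdout)."""
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_pdf.py <file.pdf> <term>
    pdf_path, term = sys.argv[1], sys.argv[2]
    print(f"{pdf_path}: {count_term(pdf_text(pdf_path), term)} occurrence(s)")
```

A non-zero count for a PDF that search does not return suggests the problem lies in indexing rather than in text extraction.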

So if you have any other ideas, I'd really appreciate it.  I'm at a
loss here.

Thanks,
Mike.


On 1/14/2016 4:35 PM, John Salter wrote:
Hi,
I'd check that your indexer is running, and that the task queue is being processed.

I'd also check that the PDFs aren't restricted in some way (maybe see what something like pdftotext returns when run against one of the not-returned PDFs).

Also, as was mentioned in a different thread recently, check what the 'metadata visibility' flag for the EPrint is.

If none of that gets you anywhere, let us know and we'll put our collective thinking caps on!

Cheers,
John

________________________________________
From: eprints-tech-bounces@ecs.soton.ac.uk
<eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Michael Street
<mstreet@yorku.ca>
Sent: 14 January 2016 16:04
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Question about full text search (Documents in Advanced Search page)

Hi,

I've got some PDFs in the repository that include the term 'bohm'
many times, but the Advanced Search page is only returning 4 out of
probably 25+ eprints as hits on the term.  I'm using the Documents
search box, which I believe is the full-text search box.  Is there
something I'm missing?

Any help would be appreciated thanks, Mike.

*** Options:
http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/

-- The Open University is incorporated by Royal Charter (RC 000391), an exempt charity in England & Wales and a charity registered in Scotland (SC 038302). The Open University is authorised and regulated by the Financial Conduct Authority.
