
[EP-tech] Re: Question about full text search (Documents in Advanced Search page)



Hi Alan,

Thanks, but I have tried that.  I've increased the logging verbosity and 
tried reindexing one of the offending deposits.  Nothing in the logs.  
To be honest, the only thing I see in the logs is that there are no tasks.

Occasionally I see something about documents being locked but the 
numbers don't match up.  I'm not sure how the numbering system works 
(for ex. 'document.5917 is locked').  I would assume though, that I 
would see an error message when reindexing one of the offending 
deposits.  I don't see anything when reindexing those, so I assume the 
'locked' message has nothing to do with it.

I will try the Xapian plugin later to see if that makes any difference.

--Mike
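
The pdftotext check discussed further down the thread can be scripted so that every affected PDF is tested the same way. This is only a minimal sketch: it assumes the pdftotext command-line tool is on the PATH, and the case-insensitive whole-word match is an assumption made for illustration, not a description of how the EPrints indexer tokenizes text. The function names are hypothetical.

```python
import re
import subprocess


def count_term(text: str, term: str) -> int:
    """Count case-insensitive, whole-word occurrences of term in text.

    Whole-word matching is an assumption for this sketch; it is not
    taken from the EPrints indexer.
    """
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on one PDF; "-" sends the extracted text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def report(pdf_paths, term):
    """Print a hit count per PDF, or a note when extraction fails."""
    for path in pdf_paths:
        try:
            hits = count_term(extract_text(path), term)
            print(f"{path}: {hits} hit(s)")
        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            # Non-zero exit from pdftotext, or pdftotext not installed.
            print(f"{path}: extraction failed ({exc})")
```

Calling report() with the paths of the PDFs that are missing from the results and the term 'bohm' would show whether the text extracts cleanly and contains the term. If it does, the problem is more likely in the index than in the PDFs themselves.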


On 1/25/2016 4:20 AM, Alan.Stiles wrote:
> Have you tried to reindex one of the missing items to see if it made a difference?  Check the error_log whilst it reindexes in case eprints is having some other issue with opening the pdf (we sometimes have issues with e.g. apostrophes in the filenames).
>
>
> -----Original Message-----
> From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of Michael Street
> Sent: 22 January 2016 21:01
> To: eprints-tech at ecs.soton.ac.uk
> Subject: [EP-tech] Re: Question about full text search (Documents in Advanced Search page)
>
> Hi again,
>
> Does anyone have any idea why these documents are not showing up in the search results?
>
> Any suggestions would really be appreciated.  I'm at a loss as to why it's not returning results that clearly have the search term in the pdf (and the converted text document).
>
> --Mike Street
>
> On 1/15/2016 11:05 AM, Michael Street wrote:
>> Hi John,
>>
>> Thanks very much for your response.  Please find my answers below:
>>
>> 1)  Indexer is running and confirmed to be working.  The documents
>> that don't show up are some of the oldest and are available through
>> other links.  Newly deposited items also show up in the Views.
>>
>> 2)  I have tried pdftotext on the system and had no issues with
>> converting it.  I also was able to find the search term within the
>> document easily.
>>
>> 3)  I run a cronjob that updates the DB and switches everything to be
>> visible, every 15 minutes.  My client does not want anything to be
>> hidden, especially previous versions of eprints, so this was the
>> easiest way to achieve that, for me.  Also, the eprints in question do
>> show up in the Views, which shows they're set to visible.
>>
>> So if you have any other ideas, I'd really appreciate it.  I'm at a
>> loss here.
>>
>> Thanks,
>> Mike.
>>
>>
>> On 1/14/2016 4:35 PM, John Salter wrote:
>>> Hi,
>>> I'd check that your indexer is running, and that the task queue is being processed.
>>>
>>> I'd also check that the PDFs aren't restricted in some way (maybe see what something like pdftotext returns when run against one of the not-returned PDFs).
>>>
>>> Also, as was mentioned in a different thread recently, check what the 'metadata visibility' flag for the EPrint is.
>>>
>>> If none of that gets you anywhere, let us know and we'll put our collective thinking caps on!
>>>
>>> Cheers,
>>> John
>>>
>>> ________________________________________
>>> From: eprints-tech-bounces at ecs.soton.ac.uk
>>> <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of Michael Street
>>> <mstreet at yorku.ca>
>>> Sent: 14 January 2016 16:04
>>> To: eprints-tech at ecs.soton.ac.uk
>>> Subject: [EP-tech] Question about full text search (Documents in Advanced Search page)
>>>
>>> Hi,
>>>
>>> I've got some pdfs in the repository that include the phrase 'bohm'
>>> many times, but the Advanced Search page is only returning 4 out of
>>> probably 25+ eprints as hits on the phrase.  I'm using the Documents
>>> search box, which I believe is the full-text search box.  Is there
>>> something I'm missing?
>>>
>>> Any help would be appreciated thanks, Mike.
>>>
>>> *** Options:
>>> http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>>> *** Archive: http://www.eprints.org/tech.php/
>>> *** EPrints community wiki: http://wiki.eprints.org/
>>> *** EPrints developers Forum: http://forum.eprints.org/
>>>