EPrints Technical Mailing List Archive

Message: #08479



Re: [EP-tech] Help indexing phrases


CAUTION: This e-mail originated outside the University of Southampton.
David, thanks for your considered comments; this is a prototype for guidance documents.

I was thinking along the same lines: making a field for each phrase, then trying to get Xapian to look at the whole field as a single term. (I will review the MetaField in 3.4.2.)


On 25 Jan 2021, at 09:21, David R Newman <drn@ecs.soton.ac.uk> wrote:



Hi Phil,

Unfortunately, I don't think this is possible.  I think you would need to create a new field that is a multiple id field and use this.  You could probably write a script to map from the uncontrolled keywords field into this new multiple id field.  However, even with this new field I am not sure how well Xapian would index these as individual multi-word terms.  Advanced search for this field should work as you require.  In 3.4.2 I introduced the Idci MetaField, which is basically the same as the Id MetaField but matches case-insensitively.  This is useful for matching things like email addresses and usernames, where case does not usually make a functional difference.
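
The splitting step of such a mapping script could look roughly like the sketch below. It is written in Python purely for illustration and is not EPrints code: the function name split_keywords is hypothetical, and reading the keywords value from a record and writing the result into the new multiple id field are left out because they depend on the local repository setup.

```python
def split_keywords(raw: str) -> list[str]:
    """Split a comma-separated keywords string into a list of phrases.

    Hypothetical helper for illustration only; not part of EPrints.
    """
    phrases = []
    seen = set()
    for part in raw.split(","):
        # Collapse runs of whitespace (including wrapped lines) to single spaces.
        phrase = " ".join(part.split())
        # Compare case-insensitively so "Fire Safety" and "fire safety" count once.
        key = phrase.casefold()
        if phrase and key not in seen:
            seen.add(key)
            phrases.append(phrase)
    return phrases


if __name__ == "__main__":
    sample = "evacuation lift, part b - fire safety, fire risk assessment"
    for phrase in split_keywords(sample):
        print(phrase)
```

Each returned phrase would then become one value of the multiple field, so the phrase is stored whole and is not broken into separate words.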

I have been thinking about how best to implement a keywords field that is more effective across simple, advanced and faceted search, particularly for multi-word terms.  I have yet to settle on a solution, as I need to better understand how Xapian indexing works to see if it can be set up to allow EPrints to effectively index multiple-word terms.

Regards

David Newman

On 25/01/2021 07:06, Phil Stacey via Eprints-tech wrote:
I am using the uncontrolled keywords field, which holds phrases separated by commas, and would like to index each whole phrase.

For example :-
evacuation lift, part b - fire safety, b5 access and facilities for the fire
service, fire risk assessment, residual risk, building safety, b4 external
fire spread, means of escape, principal works, health & safety strategy
https://eprints.buildvoc.co.uk/id/eprint/865/

Question: how do I configure Xapian or indexing.pl to index the whole phrase instead of the individual terms, for example fire, safety, or building?
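
To make the distinction concrete, the following Python sketch contrasts the two kinds of index term. It is only an illustration of the idea and does not use EPrints, indexing.pl or the Xapian API; the function names word_terms and phrase_terms are hypothetical.

```python
import re


def word_terms(keywords: str) -> set[str]:
    """Word-level terms: every alphanumeric run becomes its own term."""
    return set(re.findall(r"[a-z0-9]+", keywords.lower()))


def phrase_terms(keywords: str) -> set[str]:
    """Phrase-level terms: each comma-separated phrase is kept as one term."""
    return {
        " ".join(part.split()).lower()
        for part in keywords.split(",")
        if part.strip()
    }


if __name__ == "__main__":
    sample = "fire risk assessment, building safety"
    print(sorted(word_terms(sample)))
    print(sorted(phrase_terms(sample)))
```

With word-level terms, a search for "safety" matches any record containing that word; with phrase-level terms, only the exact phrase "building safety" would match.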


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
