
[EP-tech] Help indexing phrases



Hi Phil,

Unfortunately, I don't think this is possible. I think you would need
to create a new field that is a multiple Id field and use that instead. You
could probably write a script to map from the uncontrolled keywords
field into this new multiple Id field. However, even with this new field,
I am not sure how well Xapian would index these as individual multi-word
terms. Advanced search for this field should work as you require. In
3.4.2 I introduced the Idci MetaField, which is basically the same as the
Id MetaField but matches case-insensitively. This is useful for
matching things like email addresses and usernames, where case does not
usually make a functional difference.
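
As a rough illustration of the mapping step only, the sketch below splits a
comma-separated keywords string into whole phrases, one per value of the new
multiple field. It is written in Python purely for illustration; a real
script would need to use the EPrints Perl API to read and write the fields,
which is not shown here. The function name split_keywords is hypothetical,
and the case-insensitive de-duplication is an assumption chosen to mirror
how an Idci field would match.

```python
def split_keywords(raw):
    """Split a comma-separated keywords string into whole phrases.

    Hypothetical helper: one returned item per value of the new
    multiple Id/Idci field.
    """
    phrases = []
    seen = set()
    for part in raw.split(","):
        # Collapse runs of whitespace (including line breaks) to one space.
        phrase = " ".join(part.split())
        # Assumption: treat phrases differing only by case as duplicates,
        # since an Idci field would match them case-insensitively anyway.
        key = phrase.lower()
        if phrase and key not in seen:
            seen.add(key)
            phrases.append(phrase)
    return phrases


if __name__ == "__main__":
    raw = "evacuation lift, part b - fire safety, fire risk assessment"
    for phrase in split_keywords(raw):
        print(phrase)
```

Each phrase is kept whole rather than broken into words, so "fire risk
assessment" stays a single value instead of becoming "fire", "risk" and
"assessment".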

I have been thinking about how best to implement a keywords field that is
more effective across simple, advanced and faceted search, particularly
for multi-word terms. I have not yet settled on a solution, as I need
to better understand how Xapian indexing works to see whether it can be set
up to allow EPrints to index multi-word terms effectively.

Regards

David Newman

On 25/01/2021 07:06, Phil Stacey via Eprints-tech wrote:
> I am using the uncontrolled keywords field, which holds phrases separated
> by commas, and would like to index each whole phrase.
>
> For example:
> evacuation lift, part b - fire safety, b5 access and facilities for the fire
> service, fire risk assessment, residual risk, building safety, b4 external
> fire spread, means of escape, principal works, health & safety strategy
> https://eprints.buildvoc.co.uk/id/eprint/865/
>
> Question: how do I configure Xapian or indexing.pl to index the whole
> phrase instead of the individual terms, for example fire, safety, or
> building?
>
> Best Regards,
>
> Phil Stacey
>
> building regulations guidance for fire safety
> <https://eprints.buildvoc.co.uk/>
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/

