[EP-tech] Help indexing phrases
- Subject: [EP-tech] Help indexing phrases
- From: phil at buildvoc.co.uk (Phil Stacey)
- Date: Mon, 25 Jan 2021 19:46:22 +0000
- In-reply-to: <259bf031-d9fb-bed3-1785-6631cc58ffd8@ecs.soton.ac.uk>
- References: <259bf031-d9fb-bed3-1785-6631cc58ffd8@ecs.soton.ac.uk> <0CF38D26-AFE1-47A3-BD77-D3E148D43B48@buildvoc.co.uk>
CAUTION: This e-mail originated outside the University of Southampton.
David, thanks for your considered comments. This is a prototype for guidance documents.
I was thinking along the same lines: making a field for each phrase, then trying to get Xapian to treat the whole field as a single term. (I will review the Idci MetaField in 3.4.2.)
Best Regards,
Phil Stacey 07792661738
building regulations guidance for fire safety <https://eprints.buildvoc.co.uk/>
On 25 Jan 2021, at 09:21, David R Newman <drn at ecs.soton.ac.uk> wrote:
Hi Phil,
Unfortunately, I don't think this is possible. I think you would need to create a new field that is a multiple id field and use that instead. You could probably write a script to map the uncontrolled keywords field into this new multiple id field. However, even with this new field, I am not sure how well Xapian would index the values as individual multi-word terms. Advanced search on this field should work as you require. In 3.4.2 I introduced the Idci MetaField, which is basically the same as the Id MetaField but matches case-insensitively. This is useful for matching things like email addresses and usernames, where case does not usually make a functional difference.
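As a rough illustration of the splitting step such a mapping script would need, here is a minimal sketch. It is written in Python purely for illustration; a real EPrints script would be written in Perl against the EPrints API, which is not shown here. The function name split_keywords is hypothetical, and the sample text is the keywords string quoted further down this thread.

```python
import re


def split_keywords(raw: str) -> list[str]:
    """Split a comma-separated keywords string into whole phrases.

    Hypothetical helper: not part of EPrints or Xapian.
    """
    phrases = []
    seen = set()
    for part in raw.split(","):
        # Collapse line breaks and repeated spaces, then trim the ends.
        phrase = re.sub(r"\s+", " ", part).strip()
        if not phrase:
            continue  # skip empty entries such as a trailing comma
        key = phrase.lower()
        if key in seen:
            continue  # drop duplicates, ignoring case
        seen.add(key)
        phrases.append(phrase)
    return phrases


raw_keywords = """evacuation lift, part b - fire safety, b5 access and facilities for the fire
service, fire risk assessment, residual risk, building safety, b4 external
fire spread, means of escape, principal works, health & safety strategy"""

phrases = split_keywords(raw_keywords)
for phrase in phrases:
    print(phrase)
```

Each resulting phrase would then become one value of the new multiple id field, assuming the field is populated one value per phrase.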
I have been thinking about how best to implement a keywords field that is more effective across simple, advanced and faceted search, particularly for multi-word terms. I have yet to settle on a solution, as I need to better understand how Xapian indexing works to see if it can be set up to allow EPrints to effectively index multiple-word terms.
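To make the difference between word-level and phrase-level terms concrete, here is a toy inverted index in Python. It is only a sketch of the general idea and does not reflect how Xapian or EPrints index internally; the record ids, sample keywords and function names are all hypothetical.

```python
from collections import defaultdict


def build_index(records, tokenize):
    """Map each term produced by tokenize() to the set of record ids using it."""
    index = defaultdict(set)
    for rec_id, keywords in records.items():
        for term in tokenize(keywords):
            index[term].add(rec_id)
    return index


def word_terms(keywords):
    """One term per word: phrase boundaries are lost."""
    terms = []
    for phrase in keywords:
        terms.extend(phrase.lower().split())
    return terms


def phrase_terms(keywords):
    """One term per whole phrase, lowercased with whitespace normalised."""
    return [" ".join(phrase.lower().split()) for phrase in keywords]


# Hypothetical sample records: id -> list of keyword phrases.
records = {
    1: ["fire risk assessment", "building safety"],
    2: ["fire safety", "means of escape"],
}

by_word = build_index(records, word_terms)
by_phrase = build_index(records, phrase_terms)

# Word-level terms: "fire" and "safety" each match both records.
print(sorted(by_word["fire"]), sorted(by_word["safety"]))
# Phrase-level terms: only the exact phrase matches, and "fire" alone matches nothing.
print(sorted(by_phrase.get("fire safety", set())))
print(sorted(by_phrase.get("fire", set())))
```

With word-level terms, a search for "fire safety" cannot tell record 1 (which contains both words in different phrases) from record 2 (which contains the actual phrase). With phrase-level terms it can, at the cost of no longer matching on single words.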
Regards
David Newman
On 25/01/2021 07:06, Phil Stacey via Eprints-tech wrote:
I am using the uncontrolled keywords field, which holds phrases separated by commas, and would like to index each whole phrase.
For example:
evacuation lift, part b - fire safety, b5 access and facilities for the fire
service, fire risk assessment, residual risk, building safety, b4 external
fire spread, means of escape, principal works, health & safety strategy
https://eprints.buildvoc.co.uk/id/eprint/865/
Question: how do I configure Xapian or indexing.pl to index the whole phrase instead of the individual terms, for example fire, safety, or building?
Best Regards,
Phil Stacey
building regulations guidance for fire safety <https://eprints.buildvoc.co.uk/>
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/