[EP-tech] Modification of language by script
Hi András,
Do some EPrints have multiple documents, and can those documents be in different languages?
It sounds like you want to process everything, rather than searching for specific documents/EPrints to update.
From the details in these pages:
- https://wiki.eprints.org/w/API:EPrints/DataSet
- https://wiki.eprints.org/w/API:EPrints/List
I would start with something like this:
https://gist.github.com/jesusbagpuss/a8cc8c5328aa6e33e068609bc6f3d6ca
The bits you need to work out are in the 'process_eprint' function:
https://gist.github.com/jesusbagpuss/a8cc8c5328aa6e33e068609bc6f3d6ca#file-fix_language-L78 - how to calculate the language based on the EPrint details
https://gist.github.com/jesusbagpuss/a8cc8c5328aa6e33e068609bc6f3d6ca#file-fix_language-L86 - how to calculate the language from the document
As-is, the script will not change anything. The 'commit' lines are commented out for safety.
It also references a field that might not exist (eprint.language).
You may want to check the existing setting for the language of the document before updating it.
If you have EPrints with multiple documents attached, you might want to do something like this:
$eprint->set_under_construction(1);
# ... update (and commit) each changed document here
$eprint->set_under_construction(0);
$eprint->commit;
This means that the EPrint will have one new revision, rather than a revision for each document updated.
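As a rough sketch of how the two suggestions above (check the existing language first, and wrap multi-document updates in set_under_construction) could fit together: the helper name update_doc_languages and its $do_commit flag are made up for illustration and are not part of EPrints or of the gist. It assumes the document field is called 'language', and it has not been tested against a live repository.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints or of the gist): set the language
# of every document attached to one EPrint, skipping documents that already
# hold the wanted value. Returns the number of documents that differ.
# Nothing is written unless $do_commit is true, so the default is a dry run.
sub update_doc_languages {
    my ( $eprint, $lang, $do_commit ) = @_;

    # Collect the documents whose stored language differs from the new one.
    my @to_fix = grep {
        my $current = $_->get_value( "language" );
        !defined( $current ) || $current ne $lang;
    } $eprint->get_all_documents;

    # Dry run, or nothing to change: leave everything untouched.
    return scalar( @to_fix ) if !$do_commit || !@to_fix;

    # Hold back per-document revisions while the documents are updated.
    $eprint->set_under_construction( 1 );
    for my $doc ( @to_fix ) {
        $doc->set_value( "language", $lang );
        $doc->commit;
    }
    $eprint->set_under_construction( 0 );

    # One commit on the EPrint, as in the snippet above.
    $eprint->commit;

    return scalar( @to_fix );
}
```

Something like this could be called for each EPrint from the callback passed to the list's map method (see the API:EPrints/List page), first without $do_commit to see how many documents would change.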
Let me know if that helps at all!
Cheers,
John
-----Original Message-----
From: András Holl [mailto:holl.andras at konyvtar.mta.hu]
Sent: 16 January 2023 12:05
To: John Salter <J.Salter at leeds.ac.uk>
Cc: eprints-tech <eprints-tech at ecs.soton.ac.uk>
Subject: Re: Modification of language by script
Dear John,
I am using EPrints 3.3.15. So far, the scripts for 3.2 have worked for me.
Since we installed EPrints (around 2008), the language field for
documents has been hidden. For each uploaded document, EPrints used the
language setting of the browser as a guess for the language of the document,
and we did not care.
Now we have embarked upon a text mining project, and suddenly it has become
important what the language is. I will process the content of the repository
(some 200k items) and find out what the language is, based first on the language
of the title, and then maybe on the language of the text layer of the PDFs.
Once I know the language (or have a reasonable guess), I might try to set the
language of the EPrint document.
With kind regards,
Andras Holl
--
Holl András
informatikai főigazgató-helyettes / deputy director (IT)
MTA Könyvtár és Információs Központ / MTA Library and Information Centre
----- Original Message -----
From: "John Salter" <J.Salter at leeds.ac.uk>
To: "eprints-tech" <eprints-tech at ecs.soton.ac.uk>, "holl andras" <holl.andras at konyvtar.mta.hu>
Sent: Monday, 16 January, 2023 12:37:30
Subject: RE: Modification of language by script
Hi András,
Which version of EPrints are you using?
The scripts you found were written against EPrints 3.2, so might not work if you are using EPrints 3.3 or 3.4.
How do you determine which documents need the language field to be updated?
Is there a field at the EPrint or Document level that you can search on to work out which ones, or do you have a list of IDs or something similar?
Cheers,
John
-----Original Message-----
From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of András Holl via Eprints-tech
Sent: 13 January 2023 12:51
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Modification of language by script
Dear All,
I would like to modify the language setting of a document in a given EPrint using a script.
How should I do this with the search_and_modify.pl script found at
https://www.eprints.org/services/training/resources/scripts/eprints3_2/bin/ ?
With kind regards,
András Holl
--
Holl András
informatikai főigazgató-helyettes / deputy director (IT)
MTA Könyvtár és Információs Központ / MTA Library and Information Centre
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/