
[EP-tech] about the multi-lingual metadata field



CAUTION: This e-mail originated outside the University of Southampton.
Hello Sonu,

In general, I think that repository software, metadata standards, and search engines need to do a better job of making internationalized, multilingual content accessible. I've been thinking about this a lot as part of an Ideas Challenge team for the upcoming Open Repositories conference. I wasn't aware of the multilingual Bazaar package, so thanks for mentioning it.

There are different "levels" of internationalization/translation, and it isn't clear from your question which one(s) you need:

  *   Interface: Translations of the repository interface and metadata field labels
  *   Metadata: Translations of metadata field values, for example a translation of the title, abstract, or keywords (the three fields that have conventionally received this treatment) into a language other than that of the full text/content; this is provided for accessibility
  *   Content: Translations of the full text.
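For the Interface level, which is the level the question below is about, EPrints 3 (as far as I know) reads field labels from per-language phrase files rather than from the records themselves, so translating every field name on the summary page amounts to supplying one phrase per field. The Python sketch below generates such a file from a table of labels. It assumes the `eprint_fieldname_<field>` phrase-id convention and the `http://eprints.org/ep3/phrase` namespace; compare both with a phrase file already in your installation (e.g. under `archives/<id>/cfg/lang/hi/phrases/`) before relying on it. The Hindi labels are illustrative placeholders that a native speaker should review.

```python
import xml.etree.ElementTree as ET

# Phrase namespace assumed from EPrints 3 phrase files; check it against an
# existing file under cfg/lang/<lang>/phrases/ before deploying.
EPP_NS = "http://eprints.org/ep3/phrase"


def build_phrase_file(labels: dict[str, str]) -> str:
    """Build a phrase file that relabels eprint fields in one language.

    `labels` maps an EPrints field name (e.g. "title") to its translated
    label. Phrase ids use the assumed eprint_fieldname_<field> convention.
    """
    ET.register_namespace("epp", EPP_NS)
    root = ET.Element(f"{{{EPP_NS}}}phrases")
    # Sorted so regenerated files produce stable, diff-friendly output.
    for field, label in sorted(labels.items()):
        phrase = ET.SubElement(
            root, f"{{{EPP_NS}}}phrase", id=f"eprint_fieldname_{field}"
        )
        phrase.text = label
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body + "\n"


# Illustrative Hindi labels (placeholders, not reviewed translations).
HINDI_LABELS = {
    "title": "शीर्षक",
    "abstract": "सारांश",
    "creators": "लेखक",
    "contributors": "योगदानकर्ता",
    "keywords": "मुख्य शब्द",
}

if __name__ == "__main__":
    print(build_phrase_file(HINDI_LABELS))
```

The output would then be saved as an .xml file next to the existing Hindi phrase files; how and when EPrints picks it up depends on your version and setup. The same table-driven approach extends to the other vernacular languages by swapping in a different label dictionary.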

Here are some points to keep in mind regarding "best practices", as far as I have been able to learn:

     *   Language Codes (ISO)

Use either 2-letter (ISO 639-1) or 3-letter (ISO 639-2) language codes. See: http://www.loc.gov/standards/iso639-2/php/code_list.php

IANA recommends using the 2-letter codes whenever they are available, and 3-letter codes only when no 2-letter code exists.

     *   SciELO, PubMed Central, and many publishers use JATS, the Journal Article Tag Suite (https://en.wikipedia.org/wiki/Journal_Article_Tag_Suite), a NISO standard for encoding scholarly articles. You can find more detail here: https://jats4r.org/

     *   xml:lang attribute

        *   The DTDs of the JATS schema (https://github.com/JATS4R/jats-dtds/tree/a53dd76b4dd393028015de00e5760b39b36176e2/schema) show that the xml:lang attribute can be applied to almost any element; see: https://jats.nlm.nih.gov/articleauthoring/tag-library/1.2/attribute/xml-lang.html

     *   I didn't find any recommendations for multilingual metadata (translated titles, abstracts, or keywords) in the OpenAIRE Guidelines for Literature Repository Managers (https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/application_profile.html).

They only provide for a language at the "Content" level, with no explanation of how to add the granularity needed for translated titles/abstracts/keywords. So only content-level language information can be provided, not metadata-level.

     *   Search engines like Google Scholar favor translations of the full text (content level). They have difficulty with, or are biased against (as a matter of policy and/or technology), indexing translated metadata (metadata level), and especially multilingual content that isn't clearly partitioned into "pages" containing only one language each.
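The language-code and xml:lang points above can be tied together in a rough Python sketch: normalize each language code to its 2-letter form where one exists, then write the title, abstract, and keywords into a JATS-style `<article-meta>` fragment using the JATS elements `trans-title-group`, `trans-abstract`, and `kwd-group`, each tagged with `xml:lang`. The function names, the input shape, and the small code table are hypothetical (a real system would load the full Library of Congress list), and the output is a fragment, not a complete, validated JATS document.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespace with the reserved "xml" prefix,
# so the attribute appears as xml:lang="..." in the output.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-copied subset of the ISO 639-2 -> ISO 639-1 table (assumption).
ISO639_2_TO_1 = {
    "eng": "en", "hin": "hi", "kan": "kn", "tam": "ta",
    "ger": "de", "deu": "de",  # bibliographic (B) and terminology (T) codes
}


def preferred_code(code: str) -> str:
    """Prefer the 2-letter code; keep codes with no 2-letter form (e.g. 'kok')."""
    code = code.strip().lower()
    return ISO639_2_TO_1.get(code, code)


def jats_article_meta(primary: str,
                      titles: dict[str, str],
                      abstracts: dict[str, str],
                      keywords: dict[str, list[str]]) -> ET.Element:
    """Build a JATS-style <article-meta> fragment with per-language metadata.

    Each dict maps a language code to the value(s) in that language. The
    primary language fills <article-title> and <abstract>; the others go
    into <trans-title-group> and <trans-abstract>.
    """
    primary = preferred_code(primary)
    titles = {preferred_code(k): v for k, v in titles.items()}
    abstracts = {preferred_code(k): v for k, v in abstracts.items()}
    meta = ET.Element("article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = titles[primary]
    for lang, title in titles.items():
        if lang != primary:
            trans = ET.SubElement(title_group, "trans-title-group", {XML_LANG: lang})
            ET.SubElement(trans, "trans-title").text = title

    ET.SubElement(ET.SubElement(meta, "abstract"), "p").text = abstracts[primary]
    for lang, text in abstracts.items():
        if lang != primary:
            trans = ET.SubElement(meta, "trans-abstract", {XML_LANG: lang})
            ET.SubElement(trans, "p").text = text

    # One <kwd-group> per language, each tagged with its own xml:lang.
    for lang, words in keywords.items():
        group = ET.SubElement(meta, "kwd-group", {XML_LANG: preferred_code(lang)})
        for word in words:
            ET.SubElement(group, "kwd").text = word
    return meta


if __name__ == "__main__":
    example = jats_article_meta(
        "eng",
        titles={"eng": "English title", "hin": "हिन्दी शीर्षक"},
        abstracts={"eng": "English abstract.", "hin": "हिन्दी सारांश"},
        keywords={"eng": ["repositories"], "hin": ["हिन्दी कीवर्ड"]},
    )
    print(ET.tostring(example, encoding="unicode"))
```

Keeping the primary language in the main elements and every other language in the trans-* elements mirrors the metadata-level case described earlier: the full text stays in one language while its title, abstract, and keywords are also offered in others.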

I think there is a useful discussion to be had about how we can improve our systems and infrastructure to support multilingual access. It is important, and I believe we can and need to do better.

Tomasz



________________________________
From: eprints-tech-bounces at ecs.soton.ac.uk <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of Sonu Yadav via Eprints-tech <eprints-tech at ecs.soton.ac.uk>
Sent: Thursday, May 20, 2021 4:24 AM
To: eprints-tech at ecs.soton.ac.uk <eprints-tech at ecs.soton.ac.uk>
Subject: [EP-tech] about the multi-lingual metadata field

Dear all,

I have documents in Hindi, English, Kannada, Tamil, etc.
On the Summary_page, I need to show the metadata field names (title, abstract, contributors, etc.) in Hindi and other vernacular languages. I imported the multilingual Bazaar package, but it only translates the title and abstract field names into Hindi. I need to do this for all metadata field names.

What is the best practice for this, and how do I do it?

Thanks, and Regards,
Sonu