EPrints Technical Mailing List Archive

Message: #09162



Re: [EP-tech] sitemap.xml


Hi Yuri,

Sitemap generation was partially rewritten in EPrints 3.4 to make the sitemaps more compatible with, and more useful for, the Google Search Admin console.  I think before that the only sitemap available by default was /sitemap-sc.xml, which was designed for use with sitemaps.org.  I think the standalone script:

https://github.com/eprints/eprints3.4/blob/master/bin/generate_sitemap

could just be copied to the same location in an EPrints 3.3.x codebase and run as a nightly cron job.  I don't think any changes will need to be made to perl_lib/EPrints/Apache/Rewrite.pm to make this sitemap the one that is served when /sitemap.xml is requested, because the sitemap is written to sitemap.xml in your archive's cfg/static/ directory, so it would be treated like any other static page.

I think in 3.3 the assumption was that you might want your own hand-crafted sitemap at /sitemap.xml, as maybe you have specific non-standard pages you want indexed.  So you were left to your own devices to either write this by hand or write a script that could regenerate it periodically.  The generate_sitemap script was written very much with search indexing by Google (and other companies) in mind, which I think is probably what most repository owners care about.  It only adds the abstract pages of live eprint records to the sitemap.  These are the most metadata-rich pages specifically dedicated to individual publications, so it was deemed best to add these to the sitemap rather than the documents.  However, when indexing services crawl the abstract pages, the links to the documents should be discovered and subsequently indexed.
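
For anyone who would rather write their own script than copy generate_sitemap, here is a rough sketch in Python of the selection logic described above: keep only live records and emit one abstract-page URL for each.  This is not how generate_sitemap itself is implemented (that is a Perl script using the EPrints API).  The record keys (eprintid, eprint_status, lastmod), the "archive" value for live records, the BASE_URL/<eprintid>/ pattern for abstract pages and the example hostname are all assumptions for illustration, so check them against your own repository.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records, base_url):
    """Return sitemap XML listing the abstract page of each live record.

    `records` is an iterable of dicts with hypothetical keys:
    'eprintid', 'eprint_status' and (optionally) 'lastmod'.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rec in records:
        # Assumption: live records are those with status "archive".
        if rec.get("eprint_status") != "archive":
            continue
        url = ET.SubElement(urlset, "url")
        # Assumption: abstract pages live at <base_url>/<eprintid>/.
        ET.SubElement(url, "loc").text = f"{base_url}/{rec['eprintid']}/"
        if rec.get("lastmod"):
            ET.SubElement(url, "lastmod").text = rec["lastmod"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    sample = [
        {"eprintid": 1, "eprint_status": "archive", "lastmod": "2023-01-13"},
        {"eprintid": 2, "eprint_status": "buffer"},  # not live, so skipped
    ]
    print(build_sitemap(sample, "https://repo.example.org"))
```

The output of a script like this would then be written to sitemap.xml in the archive's cfg/static/ directory, as described above, so that it is served like any other static page.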

Regards

David Newman

On 13/01/2023 10:11 am, Yuri via Eprints-tech wrote:

Hi!

   In EPrints 3.3, how can I generate a sitemap.xml file? Is it generated
automatically? What is perl_lib/EPrints/Apache/SiteMap.pm?


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/