
[EP-tech] sitemap.xml



Hi Yuri,

How sitemaps can be generated was partially rewritten in EPrints 3.4 to
make them more compatible with, and more useful when added to, Google
Search Console. I think before that the only sitemap available by
default was /sitemap-sc.xml, which was designed for use with
sitemaps.org. I think the standalone script:

https://github.com/eprints/eprints3.4/blob/master/bin/generate_sitemap

could just be copied to the same location in the EPrints 3.3.x codebase
and run as a nightly cron job. I don't think any changes will need to be
made to perl_lib/EPrints/Apache/Rewrite.pm to make this sitemap the one
that is presented when requesting /sitemap.xml, as the sitemap is written
to sitemap.xml in your archive's cfg/static/ directory, so it would be
treated like any other static page.
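For illustration, the last step of such a nightly job (putting the new file into cfg/static/) could look like the Python sketch below. It is not part of generate_sitemap, which is Perl; the function name and the archive root (e.g. /opt/eprints3/archives/ARCHIVEID on a default install) are assumptions. It writes to a temporary file in the same directory and then renames it over sitemap.xml, so a crawler requesting /sitemap.xml mid-run should not get a half-written file.

```python
import os
import tempfile


def write_static_sitemap(archive_root: str, xml_bytes: bytes) -> str:
    """Write xml_bytes to <archive_root>/cfg/static/sitemap.xml atomically.

    archive_root is an assumption (e.g. /opt/eprints3/archives/ARCHIVEID);
    adjust it for your own installation.
    """
    static_dir = os.path.join(archive_root, "cfg", "static")
    os.makedirs(static_dir, exist_ok=True)
    target = os.path.join(static_dir, "sitemap.xml")

    # Temp file lives in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=static_dir, prefix=".sitemap-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(xml_bytes)
        # mkstemp creates the file as 0600; make it readable by the web server.
        os.chmod(tmp_path, 0o644)
        # Swap the new sitemap into place (an atomic rename on POSIX systems).
        os.replace(tmp_path, target)
    except BaseException:
        # Remove the partial temp file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target
```

A cron-driven script would call something like `write_static_sitemap("/opt/eprints3/archives/myrepo", xml_bytes)` (path hypothetical) once the XML has been generated.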

I think in 3.3 the assumption was that you might want your own
hand-crafted sitemap at /sitemap.xml, as maybe you have specific
non-standard pages you want indexed. So you were left to your own devices
to either write this by hand or write a script that could regenerate it
periodically. The generate_sitemap script was written very much with
Google's (and other companies') search indexing in mind, which I think is
probably what most repository owners care about. It only adds the
abstract pages of live eprint records to the sitemap. These are the most
metadata-rich pages specifically dedicated to individual publications, so
it was deemed best to add these to the sitemap rather than the documents.
However, as indexing services crawl the abstract pages, the links to the
documents should be discovered and subsequently indexed.
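The selection rule above (abstract pages of live records only, no documents) can be sketched in Python using the sitemaps.org urlset format. This is only an illustration, not the generate_sitemap script itself, which is Perl and reads records through the EPrints API. The record dicts are hypothetical, live records are taken to be those whose `eprint_status` is `archive`, and the abstract page URL is assumed to be `<base_url>/<eprintid>/`. Requires Python 3.8+ for `xml_declaration`.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, eprints):
    """Return sitemap XML (UTF-8 bytes) listing abstract pages of live eprints.

    Each record is a hypothetical dict with "eprintid", "eprint_status"
    and optionally "lastmod" (a W3C date such as "2023-01-13").
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for ep in eprints:
        # Skip anything not in the live archive (e.g. "inbox", "buffer", "deletion").
        if ep.get("eprint_status") != "archive":
            continue
        url = ET.SubElement(urlset, "url")
        # Assumed abstract page URL pattern: <base_url>/<eprintid>/
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/{ep['eprintid']}/"
        if ep.get("lastmod"):
            ET.SubElement(url, "lastmod").text = ep["lastmod"]
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

Document (full-text) URLs are deliberately left out; crawlers are expected to reach them through the links on each abstract page.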

Regards

David Newman

On 13/01/2023 10:11 am, Yuri via Eprints-tech wrote:
> Hi!
>
> In EPrints 3.3, how can I generate a sitemap.xml file? Is it generated
> automatically? What is perl_lib/EPrints/Apache/SiteMap.pm?
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/