
[EP-tech] Contents on EPrints repository is not featuring on Google Scholar



Hi,

I'd like to point the community to an SEO change applied to the EPrints 
core.

https://github.com/eprints/eprints/issues/450
---------------------------------------------

*The problem*

A number of repository administrators have noticed that their content is 
not featuring in Google Scholar search results.
We have been in discussion with Google Scholar in regard to how it 
discovers and indexes the contents of EPrints repositories.

While EPrints is designed to present its content to Google in the best 
way, Google Scholar is encountering issues around the initial 
discovery of that content.
Google's crawler processes hundreds of billions of links, and it needs a 
clearer way to identify that a link points to an EPrints repository 
rather than a normal website.
This would then allow Google Scholar to prioritise crawling and 
indexing. Google Scholar already has EPrints-specific rules in its 
crawler, and they are happy to update them.

*The solution*

Google Scholar and I have come up with a plan to increase the 
discoverability of EPrints content.

Currently, records on EPrints have URLs which look like
http://YOUR-REPO/EPRINTID/ e.g. http://irep.ntu.ac.uk/12853/
However, this is not easily identified as EPrints content without 
visiting the actual page, and Google has a lot of pages to visit.

We intend to promote the existing EPrints "URI" form of the links, which 
is easily identified as EPrints content:
http://YOUR-REPO/id/eprint/EPRINTID/ e.g. 
http://irep.ntu.ac.uk/id/eprint/12853/
Currently the longer form of the URL redirects to the shorter version, 
and we would like to swap that around so that the shorter form redirects 
to the longer version.
That way no existing links will stop working, but references to your 
repository and, more importantly, Google's indexer will gradually move 
to the longer, identifiable version.

Document URLs would need to be changed in a similar way. Again, any 
existing links would continue to work, but the promoted version of the 
links would change from
http://irep.ntu.ac.uk/12853/1/185527_3220%20Heasell%20prepublilsher.pdf
to
http://irep.ntu.ac.uk/id/eprint/12853/1/185527_3220%20Heasell%20prepublilsher.pdf
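
For anyone who wants to preview what their own links would look like, the 
mapping described above can be sketched in a few lines of Python. This is 
only an illustration, not the code used in the EPrints core: the function 
name to_long_url is made up for this example, and it assumes that any path 
beginning with a purely numeric segment is a short-form EPrints link, which 
may not hold for every repository.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: short-form paths start with a numeric eprint ID,
# e.g. /12853/ or /12853/1/some-file.pdf
_SHORT_PATH = re.compile(r"^/(\d+)(/.*)?$")


def to_long_url(url: str) -> str:
    """Return the long /id/eprint/ form of a short EPrints URL.

    URLs that do not look like short-form links (including URLs that
    are already in the long form) are returned unchanged.
    """
    parts = urlsplit(url)
    match = _SHORT_PATH.match(parts.path)
    if match is None:
        return url
    eprint_id = match.group(1)
    # Keep whatever follows the ID (document position, filename);
    # fall back to a trailing slash for abstract pages.
    rest = match.group(2) or "/"
    return urlunsplit(parts._replace(path=f"/id/eprint/{eprint_id}{rest}"))


if __name__ == "__main__":
    print(to_long_url("http://irep.ntu.ac.uk/12853/"))
```

The path is rewritten without decoding it, so percent-encoded characters 
such as %20 in document filenames are carried over untouched.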


We have made the changes described above locally and they have proved 
successful.
We have now also applied the changes to the EPrints core.
These changes can be enabled by updating your 20_base_urls.pl to include
$c->{use_long_url_format} = 1;

If you apply these changes and would like Google Scholar to prioritise a 
reindex of your repository, get in touch with us and we'll pass the 
message along to them.


Justin/Jiadi

