[EP-tech] Contents on EPrints repository is not featuring on Google Scholar
- Subject: [EP-tech] Contents on EPrints repository is not featuring on Google Scholar
- From: jy6y13 at soton.ac.uk (Jiadi Yao)
- Date: Wed, 19 Jul 2017 11:41:32 +0100
Hi,
I'd like to point the community to an SEO change applied to the EPrints
core.
https://github.com/eprints/eprints/issues/450
---------------------------------------------
*The problem*
A number of repository administrators have noticed that their content is
not featuring in Google Scholar search results.
We have been in discussion with Google Scholar in regard to how it
discovers and indexes the contents of EPrints repositories.
While EPrints is designed to present its content to Google in the best
way, Google Scholar is encountering issues around the initial discovery
of the content.
Google's crawler processes hundreds of billions of links, and it needs a
clearer way to identify that a link points to an EPrints repository
rather than a normal website.
This would then allow Google Scholar to prioritise the crawling and
indexing. Google Scholar already has EPrints specific rules in its
crawler, and they are happy to update them.
*The solution*
Google Scholar and I have come up with a plan to increase the
discoverability of EPrints content.
Currently, records on EPrints have URLs which look like
http://YOUR-REPO/EPRINTID/ e.g. http://irep.ntu.ac.uk/12853/
However, this is not easily identified as EPrints content without
visiting the actual page, and Google has a lot of pages to visit.
We intend to promote the existing EPrints "URI" form of the links, which
are easily identified as being EPrints content.
http://YOUR-REPO/id/eprint/EPRINTID/ e.g.
http://irep.ntu.ac.uk/id/eprint/12853/
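
To illustrate why the longer form is easier to recognise, here is a
minimal Python sketch that classifies a URL from its path alone, without
fetching the page. The pattern and the function name
looks_like_eprints_uri are assumptions made for illustration only; this
is not a description of how Google Scholar's crawler actually works.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: /id/eprint/<numeric id>, optionally followed by a
# trailing slash or a document path.
_LONG_FORM = re.compile(r"^/id/eprint/\d+(/.*)?$")


def looks_like_eprints_uri(url: str) -> bool:
    """Return True if the URL path alone marks it as an EPrints record."""
    return bool(_LONG_FORM.match(urlparse(url).path))


print(looks_like_eprints_uri("http://irep.ntu.ac.uk/id/eprint/12853/"))  # True
print(looks_like_eprints_uri("http://irep.ntu.ac.uk/12853/"))            # False
```

The short form gives no such signal: a path like /12853/ could belong to
any website.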
Currently the longer form of the URL redirects to the shorter version.
We would like to swap that around, so that the shorter form redirects to
the longer version.
That way no existing links will stop working, but gradually references
to your repository, and more importantly Google's indexer, will use the
longer, identifiable version.
Document URLs would need to be changed in a similar way. Again, any
existing links would continue to work, but the promoted version of the
links would change from
http://irep.ntu.ac.uk/12853/1/185527_3220%20Heasell%20prepublilsher.pdf
to
http://irep.ntu.ac.uk/id/eprint/12853/1/185527_3220%20Heasell%20prepublilsher.pdf
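
The mapping from the short form to the long form, for both record and
document URLs, can be sketched as follows. This is a hedged Python
illustration of the URL scheme described above, not the EPrints
implementation (EPrints itself is written in Perl); the function name
to_long_url and the regular expression are assumptions.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed short form: /<numeric id>, optionally followed by a trailing
# slash or a document path such as /1/file.pdf.
_SHORT_FORM = re.compile(r"^/(\d+)(/.*)?$")


def to_long_url(url: str) -> str:
    """Rewrite a short-form EPrints URL to the /id/eprint/ long form.

    URLs that do not match the assumed short form (including URLs that
    are already in the long form) are returned unchanged.
    """
    parts = urlsplit(url)
    match = _SHORT_FORM.match(parts.path)
    if match is None:
        return url
    eprint_id = match.group(1)
    rest = match.group(2) or "/"  # add the trailing slash if it is missing
    return urlunsplit(parts._replace(path=f"/id/eprint/{eprint_id}{rest}"))


print(to_long_url("http://irep.ntu.ac.uk/12853/"))
# http://irep.ntu.ac.uk/id/eprint/12853/
```

Percent-encoded characters in document file names (such as %20) pass
through untouched, because the path is never decoded.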
We have made the changes described above locally and they have proved
successful.
We have now also applied the changes to the EPrints core.
These changes can be enabled by updating your 20_base_urls.pl to include
$c->{use_long_url_format} = 1;
If you apply these changes and would like Google Scholar to prioritise a
reindex of your repository, get in touch with us and we'll pass the
message along to them.
Justin/Jiadi