Google's Custom Search Engine has produced immediate results in the area of repository search services. Although not the first such service, and not even the first from Google, it seems to have hit the mark where previous attempts to provide search that could be customised and directed at a specified range of repository sites ultimately proved unsatisfactory.
OpenDOAR was the first repository directory to take advantage of the new service, followed by
ROAR Search Engine, searching the sites listed in the Registry of Open Access Repositories
SHERPA Search all UK Repositories
Search SHERPA Repositories, a subset of UK repositories that are part of the SHERPA network
AuseSearch, covering all the OA repositories in Australia and New Zealand.
The sudden proliferation of these services will clearly put more pressure on formal national repository search services such as DAREnet and ARROW, and on OAI search services such as OAIster, Bielefeld Academic Search Engine (BASE) and the new ScientificCommons. All of these would have known from the outset they would be operating in the shadow of Google, but through OAI have some claim to index the 'deep' Web unseen by popular services.
The immediate debate sparked by these developments instead focussed on the role and effectiveness of institutional repositories for 'disclosing' content, and on OAI, the protocol intended to increase the visibility of targeted, mostly academic, repositories to services such as search.
On the first count, each of these latest Google custom services is based on an existing registry or directory of repositories, and the repositories' managed collections provide potentially easy search targets, so questioning the value of repositories on this basis seems unjustified.
The case against OAI may at first appear easier to prosecute, but there are caveats. If Google could support a simple quality control "refereed material" tag then, according to Les Carr, we could get by without OAI and without repositories. "Well, it doesn't," Carr continued, "and so OAI still seems our best hope. However, even with five years of OAI our repositories are not doing a very good job of sharing metadata that helps a service to comprehend the status of the holdings that it harvests (is this a published, refereed journal article or equivalent? Is this a paper from an unrefereed workshop? Is this a chemical data file?) Too much is still down to interpretation and subsequent data mining of the web pages."
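To make Carr's point about status metadata concrete, here is a minimal sketch of how a harvesting service might read the status of a single item from an OAI Dublin Core record. The sample record and the function name `describe_status` are hypothetical, and the `PeerReviewed` / `NonPeerReviewed` values in `dc:type` are an assumed convention that some repositories use, not something OAI requires. Where a repository declares nothing, the service is left to guess, which is the gap Carr describes.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements carried inside an oai_dc record.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record, invented for illustration only.
SAMPLE_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example paper</dc:title>
  <dc:type>Article</dc:type>
  <dc:type>PeerReviewed</dc:type>
  <dc:format>application/pdf</dc:format>
</oai_dc:dc>"""


def describe_status(record_xml):
    """Summarise what a record declares about the kind of item it describes."""
    root = ET.fromstring(record_xml)
    types = [el.text.strip() for el in root.iter(DC + "type") if el.text]
    formats = [el.text.strip() for el in root.iter(DC + "format") if el.text]

    # Assumed convention: the repository states refereed status in dc:type.
    if "PeerReviewed" in types:
        refereed = True
    elif "NonPeerReviewed" in types:
        refereed = False
    else:
        refereed = None  # Nothing declared: the harvester cannot tell.

    return {"types": types, "formats": formats, "refereed": refereed}


if __name__ == "__main__":
    print(describe_status(SAMPLE_RECORD))
```

A record without either value yields `refereed: None`, so a service would have to fall back on interpreting the web pages themselves.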
While Carr highlighted the need for improved metadata standards for repositories, other correspondents placed responsibility for improving services with the repositories and with Google. Andy Powell blogged the results of a rough-and-ready test of OpenDOAR search against native Google:
"Overall, what I conclude from this (once again) is that it is not the act of depositing a paper in a repository that is important for open access, but the act of surfacing the paper on the Web - the repository is just a means to an end in that respect. More fundamentally, I conclude that the way we configure, run and use repositories has to fit in with the way the Web works - not work against it or around it! First and foremost, our 'resource discovery' efforts should centre on exposing the full text of research papers in repositories to search engines like Google and on developing Web-friendly and consistent approaches to creating hypertext links between research papers."
Peter Suber argued that Google will need to do more before OAI becomes redundant:
"Google (and Google Scholar and Google Custom Search) could neutralize some of the remaining advantages of OAI if it would (1) label peer-reviewed articles as peer-reviewed and (2) label OA articles as OA. It could make strides toward the first if it used, instead of discarding, the metadata it found in OA repositories. To make strides toward the second it would have to produce an OA-detecting algorithm that could distinguish an abstract from a full-text article. Authors could help by using machine-readable CC licenses, since the Google advanced search page already has a "usage rights" filter to limit results to CC-licensed content."
Update (2 November 2006). Les Carr has responded to the challenge laid down by Andy Powell to compare the results of Google and OpenDOAR search using real search terms from repository logs: "I have had a closer examination of the queries that people have historically used on our repository (eprints.ecs.soton.ac.uk) and how they now perform. In particular, I looked at whether vanilla Google or the Google Custom Search Engine that implements the OpenDOAR search ranked our repository highest and/or put more ECS results on the first page of the query response (up to 100 results)."
"MY CONCLUSIONS: as a repository manager, OpenDOAR Search provides significantly more prominence for my holdings than Google does. However, as a researcher, I do not know whether the things that are in repositories are intrinsically more interesting than the things that Google rates highly!"
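The kind of comparison Carr describes can be sketched roughly as follows: for one query, take the ordered result URLs from each search service and record where the repository first appears and how many of its pages fall on the first page. This is only an illustration under stated assumptions, not Carr's actual method or code. The function name `host_prominence` and the two result lists are hypothetical, and how the result lists are obtained from either service is left out.

```python
from urllib.parse import urlparse

HOST = "eprints.ecs.soton.ac.uk"

# Hypothetical result lists for a single query, invented for illustration.
VANILLA_RESULTS = [
    "http://example.org/a",
    "http://eprints.ecs.soton.ac.uk/1/",
    "http://example.com/b",
]
CUSTOM_RESULTS = [
    "http://eprints.ecs.soton.ac.uk/1/",
    "http://eprints.ecs.soton.ac.uk/2/",
    "http://example.org/a",
]


def host_prominence(result_urls, host, page_size=100):
    """Return the best rank of `host` and its hit count on the first page."""
    first_page = result_urls[:page_size]
    # Ranks are 1-based, matching how search result positions are usually quoted.
    ranks = [
        rank
        for rank, url in enumerate(first_page, start=1)
        if urlparse(url).hostname == host
    ]
    return {
        "best_rank": ranks[0] if ranks else None,
        "hits_on_first_page": len(ranks),
    }


if __name__ == "__main__":
    print("vanilla:", host_prominence(VANILLA_RESULTS, HOST))
    print("custom: ", host_prominence(CUSTOM_RESULTS, HOST))
```

Repeating this over the queries found in a repository's logs would give the two measures Carr mentions, highest rank and number of first-page results, for each service.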