
[EP-tech] Re: Autoarchive local copy of open access document



Hi Andrew

The OpenAIRE API can be used to identify open versions of a publication, e.g. http://api.openaire.eu/search/publications?doi=10.1109/JSTARS.2010.2067050
If the repository was kind enough to expose not only the URL of the splash page, you even get the link to the full text.
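
A minimal sketch in Python of how such a lookup could be scripted. The endpoint and the doi parameter are taken from the example URL above; everything else is an assumption for illustration. In particular, the helper names are hypothetical, and the sketch assumes the response is XML in which links appear as the text of elements named "url", which should be checked against a real response before relying on it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URL in this message.
API_BASE = "http://api.openaire.eu/search/publications"


def build_query_url(doi):
    """Return the search URL for one DOI, in the same form as the example."""
    # safe="/" keeps the slash inside the DOI unescaped, as in the example.
    return API_BASE + "?" + urlencode({"doi": doi}, safe="/")


def extract_urls(xml_text):
    """Collect the text of every element whose local name is 'url'.

    ASSUMPTION: links are exposed as <url> elements somewhere in the XML.
    Namespaces are ignored so that the exact schema does not matter here.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]
        text = (elem.text or "").strip()
        if local_name == "url" and text and text not in urls:
            urls.append(text)
    return urls


def find_open_copies(doi, timeout=30):
    """Query the API for one DOI and return the candidate links (needs network)."""
    with urlopen(build_query_url(doi), timeout=timeout) as response:
        return extract_urls(response.read())
```

Calling find_open_copies("10.1109/JSTARS.2010.2067050") would then return whatever links the record exposes. Whether a given link is the splash page or the full text still has to be checked by hand.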

Maybe something similar can be achieved with the BASE Search interface: https://www.base-search.net/about/download/base_interface.pdf

If you want more or less structured data from repositories, these are the most comprehensive sources I know. However, you would still miss all author websites and publications on social networks (academia.edu, ResearchGate). If you want to reach those sources too, you would probably have to use a web crawler or build something on top of Google. Unless you have a huge load of publications, I guess it is faster to do this manually, as you probably want to verify that the crawled sources are the right version anyway.

Based on my experience as a repository manager, chances are very small that another repository has been more successful than you in getting a relevant accepted manuscript from "your" authors.
Best regards

Christian

On 08.04.2015 at 15:03, Ian Stuart <Ian.Stuart at ed.ac.uk> wrote:

On 08/04/15 13:12, Andrew Beeken wrote:
Interesting one from a meeting - if an output is linked to an open
access copy that's hosted elsewhere, is there a known plugin or
methodology to get EPrints to autoarchive a local copy of that
document?

Not a generic solution, no.

Different repositories (meaning store of things) have different APIs for
accessing content.

Even screen-scraping isn't a solution, as the GUI varies widely from one repository to the next.

In theory, one could write a script that grabbed a defined URL, and
pulled in all the .pdf / .doc / .xls / etc files and added them to an
eprint specified on the command line - but there's always the risk that
the web page has links to documents & things in the header/footer area;
or there's a side-bar with additional content, or....

--

Ian Stuart.
Developer: ORI, RJ-Broker, and OpenDepot.org
Bibliographics and Multimedia Service Delivery team,
EDINA,
The University of Edinburgh.

http://edina.ac.uk/


______________________________________
Christian Gutknecht
Koordination Informationssysteme Forschungsförderung (CoSi)
Schweizerischer Nationalfonds (SNF)
Wildhainweg 3, Postfach 8232, CH-3001 Bern
Telefon: +41 31 308 24 52
christian.gutknecht at snf.ch | www.snf.ch