EPrints Technical Mailing List Archive

Message: #00701



[EP-tech] Re: GScholar Export Plugin Issue


Hi Nicholas,

   Many thanks for your help; it pointed me straight to the key part of your module code. The critical piece affecting us (and likely others) is the clause that detects the end of the record. In your module, the end of a search result is detected by the phrase "Cached", whereas the original GScholar module looks for "Web Search". I can only guess that the latter no longer appears in the search results. Presumably anything indexed by Scholar is also cached, so this should remain stable at least until Google decides to drop access to cached copies. For anyone else interested, here is a supplemental modification to GScholar.pm:

...
                        if( $links[$i]->text =~ /Web Search/ )
                        {
                                # no "cited by" link - give up
                                last;
                        }
                        # Added extra case: "Web Search" no longer appears to be a reliable
                        # end-of-record marker in the search results.
                        # Taken from http://files.eprints.org/641/. CH 2012-06-08.
                        if( $links[$i]->text =~ /Cached/ )
                        {
                                # no "cited by" link - give up
                                last;
                        }
...
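
If further terminator phrases turn up, the two checks above could be folded into a single test. The following is only a sketch under stated assumptions: @END_PHRASES, $END_OF_RECORD and is_record_end are hypothetical names, not part of EPrints, and the phrase list simply repeats the two phrases mentioned in this thread.

```perl
use strict;
use warnings;

# Phrases assumed to mark the end of one search result. The list is
# illustrative only; extend it if Google Scholar's output changes again.
my @END_PHRASES = ( "Web Search", "Cached" );

# Build a single alternation so the scanning loop needs only one test.
my $END_OF_RECORD = join "|", map { quotemeta($_) } @END_PHRASES;

# Hypothetical helper, not part of EPrints: true if the link text looks
# like the last link of a search result.
sub is_record_end
{
        my( $link_text ) = @_;

        return $link_text =~ /$END_OF_RECORD/ ? 1 : 0;
}

# Made-up link texts, for illustration only.
foreach my $text ( "Cited by 19", "Cached", "Web Search", "Related articles" )
{
        printf "%-18s %s\n", $text,
                is_record_end( $text ) ? "end of record" : "keep scanning";
}
```

Inside the GScholar.pm loop, the test would presumably be applied to $links[$i]->text, with "last" issued when it returns true.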

Cheers,
Casey

-----Original Message-----
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Nicholas Sheppard
Sent: June-08-12 8:16 AM
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: GScholar Export Plugin Issue

Hi Casey.

We made several tweaks to EPrints' GScholar.pm for the version at http://files.eprints.org/641/.  Aside from the adjustment to the query string that you've mentioned, we also made some changes to the code that detects the end of Google Scholar's record.

I've moved to another position now and don't have my notes any more, but I recall that we sometimes got the behaviour that you describe because the original GScholar.pm uses the "All N versions" phrase to detect the end of the record. But not all records have this phrase, and so the plug-in would sometimes continue searching for the "Cited by..." phrase until it hit the record following the one it was actually looking for.
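
The overrun described above can be reproduced with made-up data. This is a minimal sketch, not the EPrints code: scan_record is a hypothetical helper, the link texts are invented, and "Cached" is used as the second terminator only because it is the phrase discussed elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical scanner: returns the first "Cited by N" count found before a
# link text matching $terminator, or undef if the terminator comes first.
sub scan_record
{
        my( $terminator, @link_texts ) = @_;

        foreach my $text ( @link_texts )
        {
                # citation link found before the record ended
                return $1 if $text =~ /^Cited by (\d+)/;

                # end of record reached without a "Cited by" link - give up
                last if $text =~ $terminator;
        }

        return;
}

# Invented link texts: record 1 has neither citations nor an
# "All N versions" link; record 2 has both.
my @links = ( "Record 1 title", "Related articles", "Cached",
              "Record 2 title", "Cited by 19", "All 4 versions", "Cached" );

# Only "All N versions" ends a record: the scan runs on into record 2.
my $loose = scan_record( qr/All \d+ versions/, @links );

# A terminator that every record carries stops the scan at record 1.
my $strict = scan_record( qr/All \d+ versions|Cached/, @links );

print "loose:  ", ( defined $loose  ? $loose  : "no citations" ), "\n";
print "strict: ", ( defined $strict ? $strict : "no citations" ), "\n";
```

With this data the loose scan reports the 19 citations that belong to the second record, while the strict scan reports none for the first.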

It's still not 100% reliable and of course the whole thing will fall over if Google changes the format of its search results.

--
Nicholas Sheppard (nicholas.sheppard@ieee.org)


Quoting rchilliard@mun.ca:

> Hi All,
>
>    I believe I might have uncovered an issue with the GScholar  
> export plugin packaged with EPrints. Under some situations, citation  
> link values are pulled from adjacent articles in the Google Scholar  
> search result in cases where the target article has no citations. As  
> an example within our repository (http://research.library.mun.ca/1/)  
> the citation link indicates 19 citations, however, clicking the link  
> reveals citations relative to a completely different article.  
> Searching scholar using the query string relative to the article, as  
> built by the ~eprints/perl_lib/EPrints/Plugin/Export/GScholar.pm  
> script segment:
>
> Snip----
>          $quri->query_form(
>                  q => "$title author:$creator"
>                  );
> Snip----
>
> ("Demystifying Open Access author:Goddard") reveals that the  
> citation link is drawn from the article immediately following the  
> target within the search results. Modifying the query structure to  
> more rigidly qualify the title search seems to rectify the issue (at  
> least in this case) e.g.:
>
> Snip----
>          $quri->query_form(
>                  q => "intitle:$title author:$creator"
>                  );
> Snip----
>
> -- I believe this is the query form applied in the eprints citation  
> count module in: (http://files.eprints.org/641/), however, I'm not  
> sure whether or not there may be any knock-on effects of including  
> the change inside Eprints' GScholar.pm module -- any in the know  
> able to clarify / confirm?
>
> Cheers,
> Casey
>
> Casey Hilliard
> PC Consultant,
> Health Sciences Library / QE2 Systems,
> Memorial University
> Phone: 709-777-2387 (HSL)
> Phone: 709-864-6267 (QE2)
>
> This communication is intended as a private communication for the  
> sole use of the primary addressee. The information contained herein  
> is private and confidential. If you are not the intended recipient,  
> you are hereby notified that copying, forwarding or other  
> dissemination or distribution of this communication by any means is  
> prohibited. If you are not specifically authorized to receive this  
> communication and you believe that you have received it in error,  
> please notify the original sender immediately.
>
>
> This electronic communication is governed by the terms and conditions at
> http://www.mun.ca/cc/policies/electronic_communications_disclaimer_2012.php



*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
