EPrints Technical Mailing List Archive

Message: #08582



Re: [EP-tech] OAI2 Harvesting Problem


CAUTION: This e-mail originated outside the University of Southampton.
Hi David,

Apologies for the delayed response and thank you for the advice. I THINK what I've done is introduce trouble by not including the subjects in the EPrints database when I uploaded the new Divisions structure. The search page went wonky because it displays the subjects in a list; my assumption is that when it couldn't find them in the database it threw an error. Or I'm totally wrong. I did leave the Subjects in the eprint_fields.pl file, ironically to avoid causing too much trouble.

I managed to alter the OAI config to disregard subjects and that brought it back to life. Then on the next harvest it failed due to the warnings pasted below. It did then work on a subsequent attempt, which is good news!

"Error saving an xml document: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)"

I'm putting the failed attempt down to some sort of random disconnect, as it had been harvesting for 45 minutes before it broke.
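For anyone writing their own harvester, a minimal Python sketch of how a client can survive a mid-harvest disconnect like the one above: it retries the failed request and carries on from the last OAI-PMH resumptionToken instead of starting again. This is not how EBSCO's harvester works (the trace above is from their .NET client). The `fetch` callable, the retry count and the delay are all assumptions for illustration; `fetch` is a hypothetical function you supply that takes the request parameters and returns the response XML.

```python
import time
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_with_retry(fetch, params, attempts=3, delay=5.0):
    """Call fetch(params), retrying when a connection-level error occurs.

    `fetch` is a caller-supplied (hypothetical) function that performs the
    HTTP request and returns the response body as text or bytes.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(params)
        except OSError:
            # ConnectionResetError and urllib.error.URLError are both
            # subclasses of OSError, so dropped connections land here.
            if attempt == attempts:
                raise
            time.sleep(delay)


def harvest(fetch, metadata_prefix="oai_dc", attempts=3, delay=5.0):
    """Yield each ListRecords page in turn, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        page = fetch_with_retry(fetch, params, attempts, delay)
        yield page
        token = ET.fromstring(page).find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one, marks the last page
        # A resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Because the retry happens per request, a reset 45 minutes in only costs the page that was in flight, provided the repository still honours the token when the request is repeated.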

Anyway, once again thanks for your help.

James



On Thu, Apr 8, 2021 at 6:06 PM David R Newman <drn@ecs.soton.ac.uk> wrote:

Hi James,

Subjects (and to a lesser extent divisions) have always been an integral part of EPrints. Generally, removing them from workflows, citations and config files like cfg/cfg.d/eprints_render.pl, cfg/cfg.d/eprint_search_advanced.pl, cfg/cfg.d/views.pl, etc. is sufficient to hide them without breaking EPrints. It is when you start removing them from cfg/cfg.d/eprint_fields.pl that you are likely to hit problems like those you mention, as certain aspects of EPrints expect the subjects or divisions fields to at least be defined even if they are not used.

I think in this particular (/cgi/oai2) situation, you probably needed to make sure you disabled the subjects OAI set in cfg/cfg.d/oai.pl.  Unfortunately, there is a rather complex list of config changes you need to be sure to make if you want to undefine (i.e. comment out / remove) the subjects field in eprint_fields.pl and make sure this does not break anything else.  If I get the opportunity, I will see if there is a suitable place on the wiki to document what this rather complex list of config changes is.
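One way to confirm which sets the repository still advertises after a config change is to read the ListSets response directly. The following is a rough Python sketch, not part of EPrints: it only parses a response you have already saved or fetched from /cgi/oai2?verb=ListSets, and the sample XML and its set names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace map for OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_sets(xml_text):
    """Return (setSpec, setName) pairs from an OAI-PMH ListSets response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root.findall(".//oai:set", OAI_NS):
        spec = node.findtext("oai:setSpec", default="", namespaces=OAI_NS)
        name = node.findtext("oai:setName", default="", namespaces=OAI_NS)
        pairs.append((spec, name))
    return pairs


# Hypothetical response body; real set specs and names will differ.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>example-a</setSpec><setName>Example set A</setName></set>
    <set><setSpec>example-b</setSpec><setName>Example set B</setName></set>
  </ListSets>
</OAI-PMH>
"""

for spec, name in list_sets(SAMPLE):
    print(spec, name)
```

If the subjects set has been disabled successfully, no subject-based entries should appear in the output, and the request itself should return XML rather than an error page.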

Regards

David Newman

On 08/04/2021 17:05, James Kerwin via Eprints-tech wrote:
Hi All,

Update on this after some mooching around. Our "ListSets" option in OAI2 no longer works and sends me to an error page:


When I redid the Divisions/Uni Structure recently in EPrints the Subjects were removed as we haven't used them in years. This caused a couple of issues with the advanced search and some abstract pages. Looking at the ListSets page for another repository, the Subjects are part of this.

I'm going to attempt to correct it by either altering the ListSets (probably not) or by resurrecting the Subjects. I'll update with any success/failure.

Hopefully I can prevent other would-be unintentional wreckers following in my footsteps.

Thanks,
James

On Thu, Apr 8, 2021 at 3:58 PM James Kerwin <jkerwin2101@gmail.com> wrote:
Hi All,

Hope everyone is happy and healthy.

Our repository is harvested by a company named EBSCO. Recently they have started receiving the following warning and failing to harvest:

"Harvest has been aborted by an error "Could not harvest from https://livrepository.liverpool.ac.uk/cgi/oai2: The remote server returned an error: (500) Internal Server Error""

They use this base URL:

https://livrepository.liverpool.ac.uk/cgi/oai2

This whole area is slightly off my radar, so I was hoping there might be some common things I could check. Obviously the repository is up and running. I've asked for the dates of the most recent successful harvest and the first failure, as well as whether it is still happening. I also need to speak with our Computing Services Department to check whether the IPs can get through the firewall.

Is there anything else I can/should check based on all of your collective experience?
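One basic check is to request the standard OAI-PMH verbs that take no extra arguments and see which of them returns an error. Below is a minimal Python sketch of that idea using only the standard library. The verb names come from the OAI-PMH specification and the base URL is the one above; the function names and the timeout are just assumptions for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://livrepository.liverpool.ac.uk/cgi/oai2"  # base URL quoted above

# Standard OAI-PMH verbs that can be requested without further arguments.
VERBS = ["Identify", "ListMetadataFormats", "ListSets"]


def build_url(base_url, verb):
    """Return the request URL for an argument-free OAI-PMH verb."""
    return base_url + "?" + urllib.parse.urlencode({"verb": verb})


def probe(base_url, verb, timeout=30):
    """Return the HTTP status code the endpoint gives for one verb."""
    try:
        with urllib.request.urlopen(build_url(base_url, verb), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 500 for an Internal Server Error


def probe_all(base_url=BASE_URL):
    """Map each verb to the status code it returned."""
    return {verb: probe(base_url, verb) for verb in VERBS}
```

Calling `probe_all()` from a machine outside the university network would show whether one particular verb fails while the others answer normally, which helps separate a repository config problem from a firewall or certificate problem.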

We did recently get a new security certificate, but I can't imagine that is a problem as we do this each year without any issues.

Thanks,
James

*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
