[EP-tech] OAI Harvester broken by new security

CAUTION: This e-mail originated outside the University of Southampton.
Hi James,
I'm guessing the 'security changes' include a WAF (web application firewall) or similar?

The OAI-PMH resumptionToken isn't that complicated: essentially, the parameters that could otherwise be passed to the script directly are URL-encoded into the token.
I can see how this might trigger some WAF rules.
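
As a rough illustration of why this might look suspicious to a WAF, the sketch below URL-encodes a made-up set of harvest parameters into a token. The token layout (`metadataPrefix` plus an `offset`) is an assumption for illustration only and may not match what EPrints actually emits; the point is just that the request ends up carrying query syntax nested inside a parameter value.

```python
from urllib.parse import quote, unquote, urlencode

# Hypothetical harvest state; the real EPrints token layout may differ.
state = {"metadataPrefix": "oai_dc", "offset": "100"}

# Pack the state into one string, then URL-encode it so it can travel
# as the value of the resumptionToken argument.
raw_token = urlencode(state)
encoded_token = quote(raw_token, safe="")

request_query = "verb=ListRecords&resumptionToken=" + encoded_token
print(request_query)

# Decoding the token recovers the original parameters.
assert unquote(encoded_token) == raw_token
```

The encoded `=` and `&` characters (`%3D`, `%26`) inside a single parameter value are the kind of thing generic injection rules tend to flag.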

I think the main approaches are:-
- whitelist the OAI-PMH endpoint in the WAF
- whitelist harvesters in the WAF (you might not know all harvesters that visit your repo though!)
- create a ruleset for the OAI-PMH vocabulary to be included in the WAF
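
For the third option, a ruleset could be built around the fact that OAI-PMH 2.0 only defines six verbs and a small set of arguments. The sketch below expresses that vocabulary check in Python rather than in any particular WAF's rule language; the function name `is_plausible_oai_request` is my own, and a real rule would need translating into whatever syntax your WAF uses.

```python
from urllib.parse import parse_qs

# Arguments each verb accepts, per the OAI-PMH 2.0 specification.
ALLOWED_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "ListIdentifiers": {"from", "until", "metadataPrefix", "set", "resumptionToken"},
    "ListRecords": {"from", "until", "metadataPrefix", "set", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def is_plausible_oai_request(query_string: str) -> bool:
    """Return True if the query uses only OAI-PMH verbs and arguments."""
    params = parse_qs(query_string, keep_blank_values=True)
    verbs = params.pop("verb", [])
    if len(verbs) != 1 or verbs[0] not in ALLOWED_ARGS:
        return False
    # No repeated arguments, and none outside the verb's vocabulary.
    if any(len(values) != 1 for values in params.values()):
        return False
    if not set(params) <= ALLOWED_ARGS[verbs[0]]:
        return False
    # resumptionToken is exclusive: no other argument may accompany it.
    if "resumptionToken" in params and len(params) != 1:
        return False
    return True
```

This only checks the shape of the request, not the contents of the token, so it would complement rather than replace whatever inspection the WAF already does.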

The nature of an OAI-PMH harvest could look very much like a bad actor probing your server.
The nature of the response payload could also mean the harvest creates peaks in server usage, which could make automated tooling connect the OAI-PMH requests to a DoS-style attack.
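
To make that traffic pattern concrete, here is a minimal sketch of what a harvester does: it requests ListRecords, reads the resumptionToken from the response, and repeats until the token comes back empty. The `fetch` callable, the `harvest_pages` name and the `delay_seconds` pause are all assumptions of mine, not how any particular harvester is implemented.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_pages(fetch: Callable[[str], str], base_url: str,
                  metadata_prefix: str = "oai_dc",
                  delay_seconds: float = 1.0) -> Iterator[str]:
    """Yield each ListRecords response, following resumptionTokens."""
    query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        page = fetch(base_url + "?" + urlencode(query))
        yield page
        token = ET.fromstring(page).find(f".//{OAI_NS}resumptionToken")
        # An absent or empty resumptionToken marks the end of the list.
        if token is None or not (token.text or "").strip():
            return
        query = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
        time.sleep(delay_seconds)  # hypothetical pause between requests
```

From the server's side this is a long run of near-identical requests to one script, each carrying an opaque encoded value and each returning a large payload, which is easy to mistake for probing.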

Without knowing exactly what's at play it's difficult to make more refined suggestions.
Happy to have an off-list discussion about this, seeing as it's security-related.


From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of James Kerwin via Eprints-tech
Sent: 09 August 2022 09:57
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] OAI Harvester broken by new security

Hello all,

Hope everyone is doing well.

This isn't a specific EPrints problem, but as you all use EPrints there may be some experience...

We've had some security changes at the uni recently. Some of these result in us clicking buttons in EPrints and then we get taken to our IT Services security page. So far we've handled this by accessing via the university network (e.g. VPN).

This issue has now hit our OAI harvester. Specifically under "ListRecords" when we click the "Resume" button (https://livrepository.liverpool.ac.uk/cgi/oai2?verb=ListRecords&metadataPrefix=oai_dc). Currently the organisations that usually harvest our content are unable to. I have spoken with our IT Services team to find a solution. Has anybody else experienced similar issues at their organisations and are there any steps you think I can take to resolve it?

It doesn't help that I don't know how resumption tokens work. I assume they are stored in a database somewhere? Or a file? The other instances of this in the repository occur when making changes to file metadata, though not EPrint record metadata.
