
[EP-tech] Plan S - Persistent Identifiers



Hi James,

Yes, if you have been on HTTPS for a while and your URIs are already 
showing as HTTPS this is not a problem you need to worry about.

I think there is an expectation that a URI be resolvable, and this is
very much the case when it starts with http:// or https://, which puts
it in the URL subset of URIs. However, URNs (Uniform Resource Names) are
the other subset of URIs and are not expected to be resolvable, at least
not without specialist software. It may in part be my own opinion that a
URL-type URI does not have to be resolvable forever (which raises the
question: does it ever need to be resolvable?), as who can guarantee
that a hostname will forever host a website that returns an appropriate
representation for a particular URI? However, this URI must never be
re-used and must perpetually remain valid as an identifier for the thing
for which it was created.
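
As a rough illustration of the URL/URN distinction above, here is a
minimal Python sketch that classifies a URI by its scheme alone. The
function name uri_kind and the example URIs are my own illustrative
choices; nothing here attempts to resolve anything over the network.

```python
from urllib.parse import urlsplit


def uri_kind(uri: str) -> str:
    """Classify a URI by scheme only; no resolution is attempted."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme in ("http", "https"):
        return "url"   # a locator: people expect it to resolve
    if scheme == "urn":
        return "urn"   # a name: not expected to resolve directly
    return "other"


print(uri_kind("https://eprints.example.org/id/eprint/123"))
print(uri_kind("urn:isbn:0451450523"))
```

Either kind can serve as an identifier; only the first carries the
expectation that a browser can follow it.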

I agree with you that non-technical people do not fully appreciate the
complexities of a hostname change. "You can get everything to redirect,
can't you?" This is true, but it does create a problem, as EPrints by
default will start referencing items by a new URI. This will not make
the old URI invalid, and you would obviously ensure the old hostname
redirects to the new one, so resolvability would not be an issue
either. The problem comes when someone (or more likely a computer) has
the old and new URIs and asks whether these two identifiers are for
the same thing. A human may be able to make the correct assumptive leap,
but a (non-AI-imbued) computer would not be able to make any such leap.
This is the reason I incorporated the uri_url configuration option in
EPrints 3.4.1+. However, this can still cause people to fret as they
ask: "Why is it still using the old hostname (or only HTTP) for the URI?
We need to update that."
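
To make the "same thing?" problem concrete, here is a small Python
sketch. It is not how EPrints implements uri_url; the hostnames, the
KNOWN_ALIASES set and the to_canonical helper are all hypothetical. It
only shows that a computer comparing identifiers as strings sees two
different things unless someone supplies the mapping explicitly.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values for illustration only.
CANONICAL_BASE = "http://eprints.example.org"
KNOWN_ALIASES = {"eprints.example.org", "research.example.org"}


def same_identifier(a: str, b: str) -> bool:
    """Identifiers are opaque strings: equal only if character-for-character equal."""
    return a == b


def to_canonical(uri: str) -> str:
    """Rewrite scheme and host to the canonical base if the host is a known alias."""
    parts = urlsplit(uri)
    if parts.hostname not in KNOWN_ALIASES:
        return uri  # not ours; leave untouched
    base = urlsplit(CANONICAL_BASE)
    return urlunsplit(
        (base.scheme, base.netloc, parts.path, parts.query, parts.fragment)
    )


old = "http://eprints.example.org/id/eprint/123"
new = "https://research.example.org/id/eprint/123"
print(same_identifier(old, new))
print(same_identifier(to_canonical(old), to_canonical(new)))
```

The catch is that only the repository owner knows the alias list, which
is why keeping the URI string fixed in the first place is the safer
route.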

A DOI service does offer the benefit of being able to update what its
DOIs point at, giving them longer persistence than an EPrints URI which,
as we have discussed, can be at the whim of your institution's comms
team. However, even DOIs are still potentially at the whims of such
teams, as I have seen institutions register some representation of their
name as part of the DOI, e.g. 10.12345/UniOfX.6789. However, usually
these are sufficiently tangential that they do not get picked up by
"branding".
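
For anyone unfamiliar with how a DOI is put together, here is a short
Python sketch that splits the example DOI above into its registrant
prefix and its suffix. The split_doi helper is hypothetical and does
only a very light sanity check; it is not a full validator.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return prefix, suffix


prefix, suffix = split_doi("10.12345/UniOfX.6789")
print(prefix)  # the registrant prefix assigned to the institution
print(suffix)  # chosen by the institution, so a name can creep in here
```

The suffix is the part an institution chooses for itself, which is where
a representation of its name can end up baked into the identifier.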

I don't think there is a one-size-fits-all answer to this question. In
a simple world where a repository is created with HTTPS to start with
and never changes its hostname, EPrints URIs are perfect for meeting the
Plan S PID requirements. If changes are needed to move to HTTPS or a new
hostname, then the uri_url configuration option helps manage that
situation. However, in some ways, having a service completely removed
from EPrints, and not at the whims of the non-technical, allows you to
rise above this and avoid the side-effects of ensuring persistence.

Regards

David Newman

On 10/05/2021 09:48, James Kerwin wrote:
> *CAUTION:* This e-mail originated outside the University of Southampton.
> Morning David,
>
> Thank you for the detailed reply. It's given us a lot to think over. 
> Hopefully you don't mind that I passed your email on to my manager to 
> read as he is quite concerned about the PID side of things. We've 
> discussed this topic over the past week or so.
>
> In some ways, this has simplified things a lot. We essentially have a 
> PID in the form of the URI. We've been https for longer than I've been 
> here (July 2018) so I think we're covered with respect to http/https. 
> We could make use of "uri_url" when we upgrade to 3.4, but that's a 
> whole other story that's recently been complicated.
>
> It's the part about a PID not needing to be resolvable that is proving 
> tricky, which is where we were at the start. I think we've got it 
> into our minds that if we're using a URI as an identifier, it should 
> be resolvable. For example, as a user I would want/expect it to be, 
> since it looks like a link. I THINK my interpretation should be "a URI 
> as a PID satisfies the Plan S requirements whether or not it links to 
> anywhere". After that, entirely dependent upon us and what we do, we 
> can ensure it remains resolvable for as long as possible (e.g. 
> maintaining old URLs/redirects etc. in the event of a repository 
> hostname change). The bizarre thing is that we aren't considering a 
> hostname change and I would push against it if we were. It just 
> appears to have come up as this unnecessary impediment (although it is 
> useful to consider this sort of thing and I am the fool that 
> brought it up originally in our team).
>
> Personally, I'd give everything that comes into the repository a DOI. 
> We're already set up on the repository to mint DOIs for our theses 
> when they're moved to the live archive. It would make my life a lot 
> easier if this happened to everything that comes in. That can then 
> handle things such as hostname changes because you can change where 
> the DOI points to by changing the repository URL on the provider's 
> (DataCite) webpage.
>
> The tricky bit is convincing others that we essentially already meet 
> this particular requirement...
>
> Thanks again for your advice David, it's been incredibly helpful.
>
> James
>
> On Wed, Apr 28, 2021 at 10:50 AM David R Newman <drn at ecs.soton.ac.uk> 
> wrote:
>
>     Hi James,
>
>     Fortunately (or unfortunately) I have had quite a few thoughts on
>     the matter. I have done my best to keep them to the point.
>
>     First, I don't think it is possible to account for the same item
>     being in multiple repositories. As an individual institutional
>     repository owner you have no control over other institutional
>     repositories who may have shared authors on publications and have
>     the right to make the same publication available on their
>     institutional repositories. Having a background in the Semantic
>     Web, I know that trying to determine if two things with different
>     unique identifiers are actually the same thing is a near
>     impossible problem to solve definitively. The best you can do is
>     ensure the same unique identifier is not somehow used for two
>     different things and also avoid creating and using more unique
>     identifiers than are absolutely necessary.
>
>     EPrints has always had a unique identifier in the form of a URI
>     (e.g. http://eprints.example.org/id/eprint/123). I would suggest
>     this is the most appropriate unique identifier to use, as every
>     item in your repository will have one but not every item will
>     necessarily have a DOI or similar unique identifier. You could
>     configure your repository to use a DOI minting service (e.g. data
>     repositories often use DataCite) but this rather breaks the rule
>     of not creating more unique identifiers than are absolutely
>     necessary.
>
>     One potential problem I have noted with EPrints URIs is that these
>     were all originally http, but if you modify your HTTPS
>     configuration to ensure HTTPS is used everywhere, then these URIs
>     will likely also be changed to https, making them non-persistent,
>     which is another big no-no. For this reason, early on in EPrints
>     3.4 I introduced a configuration property 'uri_url' so that you
>     could modify a repository's HTTPS configuration but, with this
>     option set, keep the URIs as http. In the context of being a
>     unique identifier, you need to consider the URI as a string of
>     characters, and if this string of characters changes, then it is
>     no longer the same unique identifier, even though it still
>     describes the same thing.
>
>     I think you also identified another potential problem with the
>     structure of an EPrints URI, which is if there is a change to the
>     hostname of the repository itself. Again the uri_url option should
>     allow you to ensure URIs do not change. Unfortunately, this may
>     lead to confusion for users who wonder why the hostname for these
>     URIs is different to the hostname of the repository. Also,
>     depending on what happens to the old hostname's DNS registration,
>     these URIs may become unresolvable. However, there is no
>     requirement for URIs, as with any unique identifier, to be
>     resolvable.
>
>     If an item has a DOI provided by a journal, an ISBN provided by a
>     book publisher, etc. then this would typically be more useful than
>     an institutional repository's URI, as this would be used in a
>     general context (i.e. you would expect a DOI or ISBN to appear in
>     the citation for such an item). However, I think to provide the
>     best possible coverage there is a need for both forms of unique
>     identifier: the one from the original publisher (if that is not
>     the institutional repository, which would likely be the case for
>     theses, etc.) and one from the institutional repository. If you
>     provide export formats that can be ingested by third-party
>     applications and that include both unique identifiers, and
>     therefore build a link between the two, it is possible to build a
>     network of unique identifiers for a particular item. Then when you
>     get a journal article that has authors from multiple institutions,
>     it will be possible to see that a publication from institution A
>     is the same publication as from institution B.
>
>     Regards
>
>     David Newman
>
>
>     On 28/04/2021 10:02, James Kerwin via Eprints-tech wrote:
>
>>     Hi All,
>>
>>     For once I have not broken anything, just looking for opinions
>>     and advice.
>>
>>     As part of Plan S we need to have persistent identifiers for
>>     scholarly publications. I have read this EPrints wiki:
>>
>>     https://wiki.eprints.org/w/Plan_S
>>
>>     At Liverpool we aren't 100% sure about this topic. DOI would be
>>     the obvious choice, but there are some on my team who reasonably
>>     point out that the same item could be in several repositories and
>>     end up having several separate DOIs associated with it. I'm not
>>     sure how much that matters.
>>
>>     Does anybody have any thoughts on this point? We spoke with my
>>     predecessor, Adam, who was really helpful. Unconvinced team
>>     members have suggested using handle.net <http://handle.net/>
>>     which I think is overkill and doesn't necessarily meet the needs
>>     of Plan S in itself.
>>
>>     Also, the URL/EPrints ID for each item, is this not a suitable
>>     persistent identifier? The wiki linked above does mention this.
>>     There's always the possibility a repository URL could change in
>>     the future, but I would expect some sort of redirect to overcome
>>     this.
>>
>>     If there is a more suitable place for this type of discussion
>>     please send me there.
>>
>>     Thanks,
>>     James
>>
>>     *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>>     *** Archive: http://www.eprints.org/tech.php/
>>     *** EPrints community wiki: http://wiki.eprints.org/
>

