EPrints Technical Mailing List Archive

Message: #03815


[EP-tech] Re: ORCiD

Hi Adam, John, all the other ep geeks :)


Adam, I am happy to help get this out there, as it appears to be something quite a lot of people are looking for. Perl isn’t my strongest language, so I’m not sure if I’m going to be clearing things up or making them more confusing, but...


· The relationships between bazaar packages are an important consideration.

We implemented the authorID system with the 3.2.x version we were running (which had mePrints in place).

When we upgraded to 3.3.10 and re-implemented the authorID changes, they had some grief with the mePrints functionality (it’s on my to-do list, and the problem may well have been me).


· The authorID system uses the creators_id field as the linkage point into EPrints, so if it were a bazaar module, everyone could run as usual and only migrate if they wanted the authorID functionality in place.


· Migrating existing authors into the author model is a time-consuming project, but one which was deemed necessary to help address governmental reporting requirements, and to ensure clean data and identifiable uniqueness for authors at a bibliographic level (e.g. two staff with the same name, say John Smith, can now be uniquely identified and their eprints distinguished).

This could have been done with just the creators_id field, storing email addresses etc., but we expanded on this and can now identify institution/faculty/dept affiliations for each author in each paper, to determine internal as well as inter-institution collaborations.


I know there may be differences in the application, but the way we handled the migration was:

· The primary conversion was scripted, taking all creators with a unique creator_id (e.g. email address) and creating an author record for each of them.

Variations of their creator_name were added to the record as ‘alternate names’ during this process.

· At the same time, every author had a ‘default’ author_instance record created (for researchers identified by a USQ email address, it was also populated with the most recent fac/dept data available).

The author_instance_id was stored in the creators_id field, creating the link creator -> author_instance -> author.
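
Purely as an illustration of the conversion steps above (this is not the USQ script, which isn't shown in this thread, and it is Python rather than EPrints Perl), a minimal sketch might look like the following. The record layout, the `migrate_creators` and `affiliation_for` names, and the lower-casing of email addresses before comparison are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional, Set


@dataclass
class Author:
    author_id: int
    email: str
    name: str
    alternate_names: Set[str] = field(default_factory=set)


@dataclass
class AuthorInstance:
    instance_id: int
    author_id: int
    affiliation: Optional[str] = None


def migrate_creators(eprints, affiliation_for=lambda email: None):
    """Create author and default author_instance records from creator rows.

    `eprints` is a list of dicts, each with an "eprintid" and a "creators"
    list of {"name": ..., "id": ...} dicts (a hypothetical layout).
    `affiliation_for` is a hypothetical lookup for fac/dept data.
    """
    authors = {}    # normalised email -> Author
    instances = {}  # author_id -> default AuthorInstance
    skipped = []    # creators with no id, left for manual creation
    author_ids, instance_ids = count(1), count(1)

    for eprint in eprints:
        for creator in eprint["creators"]:
            # Assumption: compare emails case-insensitively.
            email = (creator.get("id") or "").strip().lower()
            if not email:
                skipped.append((eprint["eprintid"], creator["name"]))
                continue
            author = authors.get(email)
            if author is None:
                author = Author(next(author_ids), email, creator["name"])
                authors[email] = author
                instances[author.author_id] = AuthorInstance(
                    next(instance_ids), author.author_id, affiliation_for(email)
                )
            elif creator["name"] != author.name:
                # Record name variations as alternate names.
                author.alternate_names.add(creator["name"])
            # Store the instance id where the email used to be:
            # creator -> author_instance -> author.
            creator["id"] = instances[author.author_id].instance_id
    return authors, instances, skipped
```

Creators without an id end up in `skipped` rather than being guessed at, which mirrors the manual handling described further down.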


There was LOTS of manual checking and, where needed, clean-up to be done to verify that things were working as expected.


Author Cleanup -

· Most of the problems came from some authors being missed due to having different email addresses (personal email, variations of work email, etc.).

· The merge author functionality helps clear this up quickly, migrating mismatched authors into the correct one and re-parenting the associated author_instances.

· Authors without an email specified were not processed and had to be created manually, after checking details of authorship/affiliation within the document etc. (there weren’t many of these, as most creator records contained emails from when they were first submitted).
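
To make the merge step concrete, here is a rough Python sketch of what merging a mismatched author into the correct one and re-parenting its author_instances could involve. It is an assumption-laden illustration, not the actual merge code: the `merge_authors` name and the dict-based record layout are invented for the example.

```python
def merge_authors(authors, instances, keep_id, duplicate_id):
    """Fold one author record into another and re-parent its instances.

    authors:   dict of author_id -> {"name": str, "alternate_names": set}
    instances: list of {"instance_id": int, "author_id": int}
    Returns the number of author_instances that were re-parented.
    (Hypothetical data layout, for illustration only.)
    """
    if keep_id == duplicate_id:
        raise ValueError("cannot merge an author into itself")
    keep = authors[keep_id]
    duplicate = authors.pop(duplicate_id)

    # Keep the duplicate's names as alternate names on the surviving record.
    keep["alternate_names"] |= duplicate["alternate_names"] | {duplicate["name"]}
    keep["alternate_names"].discard(keep["name"])

    # Re-parent every author_instance that pointed at the duplicate.
    moved = 0
    for instance in instances:
        if instance["author_id"] == duplicate_id:
            instance["author_id"] = keep_id
            moved += 1
    return moved
```

Because creators link to author_instances rather than directly to authors, the creators_id values on the eprints would not need to change in a merge like this.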


Author Instance Cleanup - (this doesn’t affect the browse list for unique authors etc., but it can affect faculty/dept browse lists if they use the author affiliation; we are still using the eprint-level dept/subdept fields).

· We built a report that compares the creator_name against the associated author_name and flags any that are mismatched (e.g. both family and given names are different). This catches errors where an incorrect/different email was entered in creator_id.

· For internal USQ users, we can ‘crowd source’ papers back to the authors and ask for departmental affiliations for their work, or use the date published and HR data for automated processing.
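
For anyone wanting to try something similar, the name-mismatch report mentioned above could be approximated along these lines. This is only a sketch under assumptions: the names are taken to be ‘Family, Given’ strings, and the `name_parts` and `mismatched_names` helpers are made up for the example rather than taken from our report.

```python
def name_parts(name):
    """Split a 'Family, Given' string into lower-cased (family, given)."""
    family, _, given = name.partition(",")
    return family.strip().lower(), given.strip().lower()


def mismatched_names(rows):
    """Yield rows where the creator and author names differ in both parts.

    rows: iterable of (eprintid, creator_name, author_name) tuples
    (a hypothetical export of the creator -> author linkage).
    """
    for eprintid, creator_name, author_name in rows:
        c_family, c_given = name_parts(creator_name)
        a_family, a_given = name_parts(author_name)
        # Flag only when BOTH family and given names differ, so that
        # initials or nicknames on their own do not trigger the report.
        if c_family != a_family and c_given != a_given:
            yield eprintid, creator_name, author_name
```

Requiring both parts to differ keeps the report short: a creator recorded as "Smith, J." against an author "Smith, John" passes, while a wrongly entered email that links "Smith, John" to "Jones, Mary" is flagged.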


Let me know if you want any more details and I will try to respond as promptly as time zones permit.






From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of John Salter
Sent: Tuesday, 20 January 2015 10:30 PM
To: 'eprints-tech@ecs.soton.ac.uk'
Subject: [EP-tech] Re: ORCiD


*apologies for butting in*

There’s a couple of aspects to this splitting/bundling that I think we (eprints geeks) need to consider:

1.       Relationship between packages (you can have a researchdata repository, but do you want simple authors, or the *really good* ones?)

2.       Migrating existing data into the new model


I think that we are part way down the route for (1), but I haven’t seen any discussion around (2).

e.g. How would I go from my current random list of names applied to EPrints, to using a new ‘real authors’ package?

Is this something that has been pondered somewhere?





From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Adam Field
Sent: 20 January 2015 12:10
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: ORCiD


Hi Matt,


Developing this into an installable package (ideally a bazaar package) would be the sensible way forward. We're in the process of identifying pieces of functionality like this in EPrints and breaking them out into installable packages, so that we can create releases of EPrints for specific purposes (publications, data, open education, etc).


Would you be interested in working with me to wrap this up into a bazaar package? I have a large supply of extra voodoo that can be applied to this problem.





