How did we decide what to put into the latest version of EPrints?

The New Features in EPrints 3.1 wiki page lists a couple of dozen major and minor changes in EPrints 3.1, but they aren’t everything on the list of things we *wanted* to make by any stretch of the imagination (for example, WebDAV support was a favourite of mine that we missed off). So how did we settle on this particular set of features?

First off, EPrints has a kind of unwritten mission statement that says “repositories have a job to do”. It comes from the early days in 1999 when the formation of the OAI led Stevan Harnad to promise to create a piece of software that would embody OAI-PMH and allow everyone to participate in the world of shared internet resources. It was strengthened by the establishing of the Open Access movement a couple of years later. And it has been tempered by our experience of years of providing open source support for EPrints users and paid services for clients of EPrints Services. Whether the agenda is Open Access or preservation or scholarly collections, acquiring and managing digital material is a serious job that needs all the support possible for users, editors, managers and administrators. So anything that improves the lot of any of the stakeholders in a repository, anything that serves users or depositors or managers better has a high priority for EPrints development.

And it’s not all lovely community-minded altruism at work here. I manage three repositories, I sit on a repository steering committee and I’m responsible for the service provision for paying clients. I want this software to make my job easier. As one of my senior colleagues says, “you’ve got to eat your own dogfood.”

The second influence is our users. Every time we run a training course and every time we have a public meeting we discuss the future direction of EPrints and the kind of facilities that it should adopt. Every time someone comes up with a question we can’t answer easily on the eptech mailing list, that’s a vote for a new feature – or an easier way of providing an existing feature. Every new EPrints Services client gives us input about what is important to them. And so do the users of our own repositories at Southampton – librarians and professors and research staff.

The third pointer to potential directions of development comes from The Repository Community’s discussions and papers and committees and projects, especially the projects that we’re involved with (like SWORD and SWAP and ORE etc).

But with all that input, the decision about what makes it in is still fairly chaotic – not random as such, but creative and agile and reacting to the current situation and pressures. Oh, perhaps a little bit random then!

Ever since we started to get feedback on version 3.0, we heard that quality assurance was becoming a big issue for people. They liked the features we put in to help users get the metadata right in the first place, but they wanted tools to help them deal with the bulk of the material already in their repositories. Watching repository managers deal with the pressures of the RAE in the UK really underlined that message – if you’re delivering an institutional service you have to get it right. 3.1 is only a step in that direction, but we’re really proud to be able to make some contribution to improving the quality of repository holdings. There are still some suggestions that we haven’t got round to implementing from the original EPrints 3.0 pre-rollout meeting in London in December 2006 (e.g. comparing locally-held metadata against authoritative external sources and importing the improvements) but we’ve made a start. And the important thing is that the vast majority of new features are just extra plugins that slot neatly into the existing infrastructure, so we’re not messing around with Core EPrints and making it difficult to upgrade.

Citation tracking features were added because of the very strong steer from our research managers at Southampton, who are in turn responding to national pressures for research assessment and research management.

Some basic support for Web 2.0 facilities was added because of the large number of projects that were trying to put Web 2.0 features (comments, tags, votes, annotations) into EPrints. A meeting of these projects made it clear that there was sufficient (funded) community work going on to experiment with different approaches, and that the role of EPrints HQ would be to support those efforts, rather than to try to take the initiative in this area.

The new database layer was written by Tim Brody just because it was a good idea that improved the software architecture. In fact I didn’t know it was happening until it was finished! But directly from that sprang the metadata schema editor, using the inspired observation that the schema itself could be just another EPrints dataset.

The complex object support was added to try to better align with the semantic web – now everything (all items in all datasets) has a URI that doesn’t change, and everything can have a relationship with everything else. EPrints has always had this three-tier model of eprint – document – files, so we have always been very good at modeling what others call ‘complex objects’. These changes just strengthen that ability – although subsequent versions of the software will build more on this capability and bring it into the mainstream.

One particular source of input has been friend & critic Tony Hey. He used to be my boss and is now a VP at Microsoft with a portfolio that includes e-science and cyberinfrastructure. He repeatedly criticises EPrints and all repository platforms for being too darn difficult to install and use. In partial response, we recently produced a LiveCD version of EPrints to provide a 1-click installation option with no complicated configuration and dependencies. But what really stung me in Tony’s criticism was the amount of system programming skill needed to configure and maintain a repository – even the requirement to log in and be comfortable with a command line seems unreasonable for anyone but a programmer, and then you have to learn how to use an editor. I may live and breathe vi, but is it fair to require librarians to adopt this skill? So I was keen to try and move as much configuration and customisation away from the programmer’s interface to the manager’s (web) interface. The importance of this was underlined by listening to the stories of so many repository managers who are kept at arm’s length from their repositories by the contract with their technical support team.

So how can you affect what is added to the next release? You can talk to us! Email the ep-tech mailing list and make a suggestion.

1 Comment

  1. First of all, I have to say that we are using EPrints; our digital archive contains 3K documents and some hundreds of thousands are foreseen to be added. We are using EPrints 3.05 (after Fez did not work for us). So even though I would like to add my own bit of criticism, we are still grateful for this software – it does the job, with quirks, but it DOES the job.

    So here are my own comments from the point of view of an advanced sysadmin and programmer. It has been pretty difficult to install EPrints – because I already had the Apache webserver and MySQL and just wanted to put EPrints into a chroot jail *without* MySQL – ok, I admit this is special and normal librarians would not be able to do it anyway – no arrogance on my side now. But for anything other than a standard prepared install, EPrints is as difficult as any other program.

    EPrints was not more difficult than FEZ installation though, but definitely much worse than Fedora – well, Fedora is another cup of tea….

    And that was just the installation – it has been a real hell to make customizations. Because of the Perl nature, because the EPrints coding style is "strange", because there is so much implicit, untold knowledge inside – the knowledge that only initiated people possess. For instance, there are calls inside EPrints such as $session->{something} and one absolutely does not know if that is an attribute of the object, or its method, or a method of some other object that was added to this object some time before by something else (and therefore it will not be available from other calls, or will be unusable).

    The mod_perl nature does not make it easy to debug or add new features – and when one tries to Debug::Print some object, it will just dump the whole of EPrints (>10MB) because objects contain references to the whole of EPrints (I think they are references, and not separate copies of the objects).

    I burned my fingers with UTF-8 too. It is such a bizarre thing nowadays that it did not occur to me to suspect EPrints of not being able to handle it – I believed the documentation when it said "UTF8 ready". The details were described here http://www.eprints.org/tech.php/9027.html

    My comments are perhaps too technical, but I want to say that EPrints is too complicated now. Software that does something does not need to be so complicated, and it can still be very flexible. I think that you are taken too much by the nature of Perl and the fact that there are several ways to do things. I believe that "Simple is better than complicated".
