Mechanically derived documents (versions)

EPrints 3.1 has only just been released but work on 3.2 has been progressing for some months now.

One EPrints feature that we’re improving in 3.2 is document thumbnails. First, a bit of background: when you view the abstract (jump-off) page in EPrints, each document is shown with either an icon or a thumbnail. The thumbnail images are generated when a document is uploaded.

Three thumbnails are generated as standard: “small”, “medium” and “preview”. The “small” thumbnail is used as a substitute for the format icon. The “medium” thumbnail isn’t used. The “preview” thumbnail is shown whenever the user hovers the mouse pointer over the icon/thumbnail.

Thumbnails are mechanically derived versions of the uploaded documents – they’re generated by a defined process with no user interaction. A text version is also generated from each document, containing the terms used to index that document. In future we also want to provide video previews of uploaded videos (a YouTube-style interface), lightbox versions of PowerPoint presentations, and so on. Another requirement is the ability to deliver cover-paged versions of documents.

To support a diverse set of derived documents we have implemented relationships between documents. In 3.2, when a thumbnail is generated, it is actually a new document with “isVersionOf”, “isVolatileVersionOf” and “isThumbnailVersionOf” relations to the existing document. So, if you want the “small” thumbnail of a document, you query that document for its “hasSmallThumbnailVersion” relation. If a document is changed, all of the documents it points to through “hasVolatileVersion” relations are removed and regenerated. (Before the expert reader gets too far ahead … relations aren’t implemented using a triple-store, they’re just the metadata fields that first appeared in 3.1.)
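To make the relation model concrete, here is a minimal sketch in Python. It is not EPrints code (EPrints itself is written in Perl), and the Store class, its method names and the “isSmallThumbnailVersionOf” relation name are assumptions made for illustration; only the “is…VersionOf” / “has…Version” naming pattern comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: int
    name: str
    # Each relation is a (relation_type, target_doc_id) pair, stored as
    # plain metadata on the document rather than in a triple-store.
    relations: list = field(default_factory=list)


class Store:
    """Hypothetical in-memory document store (illustration only)."""

    def __init__(self):
        self.docs = {}
        self._next_id = 1

    def add(self, name):
        doc = Document(self._next_id, name)
        self.docs[doc.doc_id] = doc
        self._next_id += 1
        return doc

    def relate(self, derived, source, kind=""):
        # Record both directions: "is<kind>VersionOf" on the derived
        # document and "has<kind>Version" on the source document.
        derived.relations.append((f"is{kind}VersionOf", source.doc_id))
        source.relations.append((f"has{kind}Version", derived.doc_id))

    def related(self, doc, relation):
        # Follow one relation type from doc to the documents it points at.
        return [self.docs[t] for r, t in doc.relations if r == relation]

    def remove_volatile(self, source):
        # When the source changes, drop every volatile derived document
        # and every relation on the source that pointed at one of them.
        volatile_ids = {t for r, t in source.relations
                        if r == "hasVolatileVersion"}
        for doc_id in volatile_ids:
            del self.docs[doc_id]
        source.relations = [(r, t) for r, t in source.relations
                            if t not in volatile_ids]


store = Store()
pdf = store.add("paper.pdf")
thumb = store.add("small.png")
for kind in ("", "Volatile", "SmallThumbnail"):
    store.relate(thumb, pdf, kind)

small = store.related(pdf, "hasSmallThumbnailVersion")
print([d.name for d in small])
```

Regeneration after a change would then amount to calling remove_volatile on the changed document and re-running whatever produced the derived documents in the first place.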

Now, for every document the user uploads, a multitude of derived documents will be created. To avoid overloading the user with these new documents, “volatile” documents (those with an “isVolatileVersionOf” relation) are hidden.
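Hiding volatile documents comes down to a filter on the relation metadata. The following Python sketch shows the idea; the dictionary layout and the visible_documents function are hypothetical and are not how EPrints stores or lists documents.

```python
def visible_documents(documents):
    """Return only the documents that are not volatile derived versions."""
    return [
        doc for doc in documents
        if not any(rel == "isVolatileVersionOf" for rel, _ in doc["relations"])
    ]


# Assumed layout: each document carries a list of (relation, target) pairs.
docs = [
    {"name": "paper.pdf",
     "relations": [("hasVolatileVersion", "small.png")]},
    {"name": "small.png",
     "relations": [("isVersionOf", "paper.pdf"),
                   ("isVolatileVersionOf", "paper.pdf")]},
]

print([d["name"] for d in visible_documents(docs)])  # prints ['paper.pdf']
```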

If the user runs the “conversion” tool during the upload stage, each converted document gets an “isVersionOf” relation to the existing document. This will be used to improve the jump-off page by bundling different formats of the same document together.
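One way the bundling could work is to group every non-volatile “isVersionOf” document under the document it was converted from. The Python sketch below assumes a single level of conversion and a hypothetical dictionary layout; the bundle_formats function is illustrative and not part of EPrints.

```python
def bundle_formats(documents):
    """Group user-visible conversions under the document they derive from."""
    bundles = {}
    # First pass: every document that is not a version of another
    # document starts its own bundle.
    for doc in documents:
        relations = dict(doc["relations"])
        if "isVersionOf" not in relations:
            bundles[doc["name"]] = [doc["name"]]
    # Second pass: attach non-volatile versions to their source's bundle.
    for doc in documents:
        relations = dict(doc["relations"])
        source = relations.get("isVersionOf")
        if source in bundles and "isVolatileVersionOf" not in relations:
            bundles[source].append(doc["name"])
    return bundles


docs = [
    {"name": "paper.doc", "relations": []},
    {"name": "paper.pdf", "relations": [("isVersionOf", "paper.doc")]},
    {"name": "small.png", "relations": [("isVersionOf", "paper.doc"),
                                        ("isVolatileVersionOf", "paper.doc")]},
]

print(bundle_formats(docs))  # prints {'paper.doc': ['paper.doc', 'paper.pdf']}
```

The thumbnail is left out of the bundle because it is volatile, so the jump-off page would list only the formats a reader might actually download.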

All of the mechanically derived documents in EPrints are generated by “Convert” plugins, enabling extensibility of the whole system.
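The extensibility argument can be illustrated with a small plugin registry. This Python sketch is not the EPrints “Convert” plugin interface; the register decorator, the CONVERTERS table and the “index_terms” target name are all assumptions, used only to show how adding a converter adds a new kind of derived document without touching the rest of the system.

```python
# (source MIME type, kind of derived document) -> converter function
CONVERTERS = {}


def register(source_type, target):
    """Decorator that registers a converter for a source type and target."""
    def decorator(func):
        CONVERTERS[(source_type, target)] = func
        return func
    return decorator


@register("text/plain", "index_terms")
def plain_text_terms(data):
    # Derive a text version holding the distinct, lower-cased terms.
    words = sorted(set(data.decode("utf-8").lower().split()))
    return " ".join(words).encode("utf-8")


def derive(source_type, data):
    """Run every registered converter that accepts this source type."""
    return {
        target: convert(data)
        for (src, target), convert in CONVERTERS.items()
        if src == source_type
    }


print(derive("text/plain", b"Thumbnail thumbnail relation"))
```

Supporting a new derived format would then mean registering one more function, which is the property the plugin approach is meant to provide.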