EPrints 2.3 Documentation - The Archive Configuration Files
This section describes all the configuration files in a single archive in the EPrints system.
Once you have created an EPrints archive the information you entered is placed in an XML file in /opt/eprints2/archives/ with the name archiveid.xml. This file is documented later in this section.
The bulk of the archive configuration is copied from /opt/eprints2/defaultcfg/ into the archive's own configuration directory (usually /opt/eprints2/archives/archiveid/cfg/). This directory will usually contain the following files and directories:
This section contains some general information about the XML archive config files: template, phrases, ruler and citations. metadata-types.xml uses XML but these comments do not apply.
These files use XHTML elements (and other elements too). XHTML is a fairly new version of HTML which is backwards compatible with HTML 4 but written using XML, not SGML. This means that it is much stricter but less ambiguous and easier to parse and modify. Assuming you know HTML, the main differences are as follows:
Element and attribute names are written in lowercase, and attribute values are always quoted.
Elements with no closing tag in HTML, such as <br> or <img src="foo">, still must be closed, eg. <img src="foo"></img>. This can be abbreviated to the neater looking: <img src="foo" />
Attributes must always have a value, unlike the noshade in HTML <hr noshade> elements. In XHTML it is represented as <hr noshade="noshade" />
So in summary, the HTML:
<img SRC=someurl> <hr NOSHADE WIDTH=2> <P>Foo bar</P>
should become in XHTML:
<img src="someurl" /> <hr noshade="noshade" width="2" /> <p>Foo bar</p>
And that's more or less it. See http://www.w3c.org/ for a complete description.
phrases, template and citations have one instance per supported language. This allows the system to generate pages and emails in more than one language. Supporting a new language will require translating all the English in the English config files currently shipped. If you do intend to do this (lots of work!) please get in touch with the eprints admin so that we can avoid duplicated effort.
The XML files all use a DTD which defines a few extra entities. Entities are items in XML (or HTML) which start with ``&'' and end with ``;'' like &amp;. These additional entities come from the entities DTD file created by generate_entities. One DTD is created per language, although currently the only variation is the archive name.
The usual XHTML character entities, such as &eacute; and &euro;, are written in the same way.
None of these entities are available in the citations file or the ruler file.
These files contain a mixture of custom tags and XHTML. To keep these distinct the XML files contain a namespace definition in the first element. The practical upshot is that all of EPrints' own tags have the prefix ``ep:''. The namespace information is actually ignored by the current version of the eprints system.
example of mixed tags (and entities):
<ep:phrase ref="lib/session:contact"><p>Feel free to contact <a href="mailto:&adminemail;">&archivename; administration</a> with details.</p></ep:phrase>
eprints elements: phrase; xhtml elements: p, a; eprints entities: adminemail, archivename
This XML file appears in the archives/ directory, usually /opt/eprints2/archives/. It describes the most basic details about the archive. It is generated (and modified) by configure_archive and will not normally need to be edited.
EPrints looks in this directory for XML files and attempts to load them all when starting the webserver.
This file should be chmod'd so that it can not be read by random users as it contains the database password.
The top level element is ``archive'' which has the attribute ``id'' which is the id of the archive. It should be the same as the filename. If this file is foo.xml then the id should be foo.
<archive> contains a list of XML tags enclosing some text, eg.
<host>stoatprints.org</host>
The following tags are expected in no special order:
<archivename language="en">White Lemur</archivename>
<archivename language="fr">La Archive d'Lemur Blanc</archivename>
(apologies to the french, human languages aren't my strong suit)
This module imports the other 5 perl modules. It allows lots of little tweaks to the system, which are all commented in the file.
It includes options to hide various features you may not want and to customise the browse, search and subscription functions.
Also you can customise what each type of user can and can't do, and how they authenticate their passwords.
This configuration file contains perl methods which are called when a session starts and ends, to log things, to generate the entities for the entities file and to control security on non-public files.
The browse views are generated by the script ``generate_views'' and what that script does is configured by the ``browse_views'' item in the config.
It is a reference to a perl array [], each item of which is a hash {}.
The hash has 3 required properties and a number of optional ones.
For example, a view with id=>"foo" would find its title in the phrase ``viewname_eprint_foo''.
Normally the system puts a paragraph tag around each citation, but if you use a custom citation this will not happen.
An index.html is generated in /view/ with a list of all the browse views available. Setting nolink to 1 will hide this item.
An index.html file is also generated in /view/foo/ listing all the values of the view and linking to their respective pages.
The most common view is to browse by subject:
{ id=>"subject", allow_null=>0, fields=>"subjects", order=>"title/authors", hideempty=>1 }
A more complex view generates a view on author & editor ID's which are not advertised but may be captured by some other software to build staff CV pages.
{ id=>"person", allow_null=>0, fields=>"authors.id/editors.id", nohtml=>1, nolink=>1, noindex=>1, include=>1, order=>"-year/title" }
For my example person id ``wh'' this will generate a webpage called /view/person/wh.include (and one for each other value of authors or editors ID's) which can be captured by an external automated system.
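To show how the two views above fit together, here is a minimal sketch of the ``browse_views'' item as a reference to an array of hashes. The variable $c stands in for the archive configuration and is an assumption made for this sketch; check your own ArchiveConfig.pm for the exact form.

```perl
use strict;
use warnings;

# $c stands in for the archive configuration hash. The name is an
# assumption for this sketch, not taken from the text above.
my $c = {};

# A reference to a perl array [], each item of which is a hash {}.
$c->{browse_views} = [
    # Browse by subject, hiding subjects with no records.
    { id=>"subject", allow_null=>0, fields=>"subjects",
      order=>"title/authors", hideempty=>1 },
    # Unadvertised per-person include files for external software.
    { id=>"person", allow_null=>0, fields=>"authors.id/editors.id",
      nohtml=>1, nolink=>1, noindex=>1, include=>1,
      order=>"-year/title" },
];

# List each view with the phrase id its title is looked up under.
foreach my $view ( @{ $c->{browse_views} } )
{
    print "$view->{id}: viewname_eprint_$view->{id}\n";
}
```

Remember to run generate_views after changing this item, as that is the script which actually builds the pages.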
The user permission configuration allows you to set what each type of user can and can't do. The user home page will only show a user the options which they are allowed to use.
New types of user, and which data about themselves they can edit, are set in metadata-types.xml.
Permissions are set by ``type'' of user. By default there are 3 kinds of user: ``user'', ``editor'' and ``admin''.
Admin can, by default, do everything.
Metadata is data about data: the information which we store to describe each record (eprint) in the system. Users also have metadata.
This module is the configuration for the metadata. This is probably the most important part of the system.
See the chapter on metadata for all the configuration options.
This section of the file contains subroutines which are called to set default values for Users, Documents and EPrints.
These functions let you set automatic fields. This allows you to make fields which are updated automatically each time the item (User/EPrints/Document) is committed to the database.
This allows you to create ``compound'' fields. Such fields are created by processing the values of other fields rather than being edited directly.
For example, if you wanted to make an automatic int field which contains the number of authors, you could add the following to set_eprint_automatic_fields:
    # no authors at all will be undef, not [] so check first
    if( $eprint->is_set( "authors" ) )
    {
        my $auths = $eprint->get_value( "authors" );
        $eprint->set_value( "authcount" , scalar @{$auths} );
    }
    else
    {
        $eprint->set_value( "authcount" , 0 );
    }
This module configures how the archive exports its data via the OAI protocol.
For more information on the how and why of OAI see http://www.openarchives.org/
OAI allows a harvester to request the metadata from your archive and other archives to provide a federated search. The next time the harvester harvests your archive it only has to ask for items which have changed or been added since the last time it asked.
The current version of EPrints supports OAI v2.0. OAI version one is no longer supported.
The base URL for your OAI v2.0 interface will be http://archivepath/perl/oai2
If you want to use the OAI system then you need to fill in the blanks, such as policy and the OAI-id of the archive.
You may create OAI sets in a similar manner to ``browse views'' in ArchiveConfig.pm.
If you want to change the way that an EPrint is mapped into Dublin Core then edit the make_metadata_oai_dc function, which returns a DOM XML object.
To add a new metadata type you need to add a new mapping function and add entries to the namespaces, schemas and functions items near the top of the file.
This module contains functions which turn data into XHTML for displaying on the web.
If you want to change the way a user info page, or an eprint ``abstract'' page is rendered then here's the place to do it.
There are also ``full'' versions of these functions which display all the internal variables and things. These are the views which the editors and admin see.
The XHTML is generated using DOM (Document Object Model), but eprints provides some functions for easily generating XHTML DOM. The only method of DOM you should need to use is appendChild - which adds an element to this element.
Note, all text strings should be in UTF-8.
Example:
    my $page = $session->make_doc_fragment();
    my $h1 = $session->make_element( "h1" );
    $h1->appendChild( $session->make_text( "Title" ) );
    $page->appendChild( $h1 );
    $page->appendChild( $session->make_element(
        "img",
        src=>"/images/cheese.gif",
        width=>128,
        height=>53 ) );
$page now contains:
<h1>Title</h1><img src="/images/cheese.gif" width="128" height="53" />
Many of the EPrints modules are now properly(!) documented. For an example try running:
% perldoc /opt/eprints2/perl_lib/EPrints/Archive.pm
The functions most useful for extracting and rendering information are documented here:
$session->make_text( $text )
$session->make_doc_fragment()
$session->make_element( $name, %opts )
To make <h1 class="foo">...</h1> you would call:
$session->make_element( "h1", class=>"foo" );
$session->render_ruler();
$session->render_link( $uri, $target )
Returns <a href="uri"></a>, which you can appendChild stuff into. If $target is specified then a target attribute is included, to make it pop up a new window.
$item->render_value( $fieldname, $showall )
$fieldname is the name of the field you want to render. If $showall is 1 then ALL values are rendered in a multilang field.
$item->render_citation( $style )
If $style is set then it uses the citation with that id instead.
$item->render_citation_link( $style )
$item->render_description()
$session->html_phrase( $phraseid, %opts )
It looks first in the archive phrase file for the current language.
Then in the archive phrase file for English.
Then in the system phrase file for the current language.
Then in the system phrase file for English.
The %opts are a series of DOM elements to place in the ``pin'' items in the phrase file.
$item->get_value( $fieldname, $no_id )
$item->is_set( $fieldname )
$eprint->get_all_documents()
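The four-step lookup order given for $session->html_phrase above can be pictured with the following stand-alone sketch. It is not the EPrints implementation: the phrase tables, their contents and the find_phrase function are all hypothetical, and plain strings are used where eprints would return DOM.

```perl
use strict;
use warnings;

# Hypothetical phrase tables, keyed by language id and then phrase id.
my %archive_phrases = (
    fr => { "greeting" => "Bonjour" },
    en => { "greeting" => "Hello", "farewell" => "Goodbye" },
);
my %system_phrases = (
    fr => { "lib/session:contact" => "Contactez-nous" },
    en => { "lib/session:contact" => "Contact us",
            "lib/session:title"   => "Title" },
);

# Return the first phrase found, trying the four sources in order:
# archive/current language, archive/English, system/current language,
# system/English. Returns undef if the phrase is defined nowhere.
sub find_phrase
{
    my( $langid, $phraseid ) = @_;

    foreach my $source (
        $archive_phrases{$langid},
        $archive_phrases{en},
        $system_phrases{$langid},
        $system_phrases{en} )
    {
        next unless defined $source;
        return $source->{$phraseid} if defined $source->{$phraseid};
    }
    return undef;
}

print find_phrase( "fr", "greeting" ), "\n";
print find_phrase( "fr", "farewell" ), "\n";
print find_phrase( "fr", "lib/session:title" ), "\n";
```

The practical point is that a partly translated archive still works: anything missing for the current language falls back to English.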
This module you probably won't need to change unless you want to modify how eprints does searches for words in strings.
When a record is added to the system eprints uses this module to turn a string into a list of values which are indexed. By default these are words with 3 letters or more, except some predefined stop words. It also turns latin characters with accents into their plain ASCII (no acute/grave) versions.
It then does the same with the search string and looks for these keys.
Example:
The rain in spain falls mainly on the plains.
Is turned (by default) into the keys:
rain spain fall mainly plain
Thus searching for ``rain'' or ``plain'' or ``plains'' or ``MaiNlY'' will all match this string.
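As a rough illustration of the example above, here is a stand-alone sketch which turns a string into index keys. It is not the code from the module: the extract_keys name and the short stop word list are assumptions, and it leaves out the handling of accented characters.

```perl
use strict;
use warnings;

# A tiny stop word list, invented for this sketch.
my %STOP_WORDS = map { $_ => 1 } qw( the and for );

# Turn a string into a list of index keys: lowercase it, split it into
# words, strip a trailing "s", then drop short words and stop words.
sub extract_keys
{
    my( $text ) = @_;

    my @keys;
    foreach my $word ( split /[^a-z]+/, lc $text )
    {
        $word =~ s/s$//;
        next if length( $word ) < 3;
        next if $STOP_WORDS{$word};
        push @keys, $word;
    }
    return @keys;
}

print join( " ", extract_keys( "The rain in spain falls mainly on the plains." ) ), "\n";
```

Because the search string goes through the same function, ``plains'' and ``MaiNlY'' end up as the keys ``plain'' and ``mainly'' and so match the indexed record.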
You may wish to add your own ``stop words''. eg. If you are running an archive about badgers, a search for the word ``badger'' will return almost all the records.
At a more complex level you may wish to add handling for non-european character sets (I have no idea how well the default setting will work on these), or do ``stemming'': removing ``ed'', ``ing'', ``ies'', ``s'' etc. from the end of words so that ``land'' will match ``land'', ``landed'', ``landing'' and ``lands''. (It currently removes 's'.)
Another suggestion is using soundex or similar techniques to match words which sound similar.
Changing the indexing on a live system will require you to regenerate the indexes using the reindex script. (If you don't then some of the search results will be wrong).
This module handles validating data entered by users. Each subroutine is described in more detail in the module itself.
Each subroutine returns a list of DOM elements, each of which describes a single problem. Any problems will prevent the user from continuing with editing until they correct them.
As with the rendering functions, if you don't care about making this work in more than one language then you can just make the DOM items by calling $session->make_text( "problem explanation" )
The eprint & document validation routines have a flag $for_archive which, if true, indicates that the item is being checked before going into the actual archive. You can use this to force an editor to enter fields which the user may leave blank.
The citations file describes how to render an item (eprint/user/whatever) into a short piece of XHTML. Each citation has a ``type''. There are 3 kinds of citation:
The citation file contains a list of citation elements:
<ep:citation type="...">
Each one may contain text and tags. The text may also include the names of fields in the record being rendered. These names should be between @ symbols. eg. @authors@ or @title@. These will be replaced with a rendered version of the value in that field. (if you need an actual @ symbol for some reason two @@ with nothing inside will be rendered as a single @).
Note. The @title@ style was introduced in EPrints 2.2. Before that this file used XML entities such as &title; but this caused problems and didn't solve any. Use of entities is still supported, but deprecated.
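The @field@ substitution described above can be sketched in a few lines of stand-alone perl. This is only an illustration working on plain strings: the expand_citation function is hypothetical, whereas eprints itself renders each value into XHTML.

```perl
use strict;
use warnings;

# Replace each @name@ in the template with the value of that field.
# An empty name (two @@ with nothing inside) becomes a single @, and a
# field with no value is shown as UNSPECIFIED.
sub expand_citation
{
    my( $template, $values ) = @_;

    $template =~ s{\@([A-Za-z0-9_.]*)\@}{
        $1 eq ""                    ? "\@"
      : defined( $values->{$1} )    ? $values->{$1}
      :                               "UNSPECIFIED"
    }ge;

    return $template;
}

my $record = { authors => "Smith, J.", year => 2003 };
print expand_citation(
    '@authors@ (@year@) @title@. Contact admin@@example.org',
    $record ), "\n";
```

Here the title has no value, so it comes out as UNSPECIFIED, and the doubled @@ in the address comes out as a single @.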
In addition you may use XHTML elements and the following elements in the eprints namespace. These elements are always removed but they control whether their contents are kept or not. Conditional elements may be placed inside each other since v2.2.
<ep:linkhere>
For example:
@year@<ep:ifmatch name="year" value="-1949"> (approx)</ep:ifmatch>
This will render (approx) after years before 1950. Neat eh?
This file allows you to configure the types of eprint, user, document and document security level.
When you add a new type you should add its name to the archive phrases file(s). The phraseid is ``dataset_typename_typename'' eg. ``document_typename_pdf'', and you should add a new citation to the citations file. Any fields which are not required but appear in the citation should probably be inside a <ep:ifset> so that you don't see ``UNSPECIFIED'' if they are not, er, specified.
The main element is ``metadatatypes''. This contains a list of ``dataset'' elements each of which has a name attribute.
The ``type'' elements in user and eprint ``dataset''s should contain a list of ``field'' elements. This describes the fields which may be edited for this type and the order that they appear on the form.
You may include system fields in this list, but be careful if you do.
You may optionally add <page name="pagename" /> elements to the field list. These break the submission process into smaller stages. The pagename is used to identify the sub-page, for purposes of validation etc. Pages only have an effect on eprint types, not user, document etc.
See the section on paged metadata.
This is a handy place to define the security levels. The type with no name is special. It is the ``public'' security type. All other types will require a valid username and password. Whether that username is acceptable for a given document is decided by the can_user_view_document subroutine in ArchiveConfig.pm
By default eprints requires at least one of ps, pdf, ascii or html to be uploaded before an eprint is valid. You may change this list in ArchiveConfig.pm - any more complicated conditions will have to be checked in the eprint validation subroutine.
This file contains a list of XML ``phrases''. Everything eprints ``says'' to users is stored in this file and its system-level counterpart. If you want the site to run in more than one language, you need one phrase file per language.
The phrase file is XML and contains a toplevel ``phrases'' element. This contains the list of phrases.
Each phrase has a ``ref'' attribute to identify it and contains text and optionally some XHTML tags. It may also contain eprints entities such as &archivename; and also some phrases should contain ``pin'' elements, described below.
The phrases in the archive phrase file are specific to that archive, the system phrase file contains non-archive specific phrases. The id's of most of the phrases in the archive phrases are generated from the id's of the fields, datasets, types etc.
The archive phrase file contains: names of dataset types, names of metadata fields, help on entering each metadata field, the names of options in ``set'' fields, the description of different search ordering options, names of browse views, phrases used in the render and validation routines, mail which eprints sends out and phrases which override those in the system file.
Some phrases need some ``pin'' elements to show eprints where to insert values. Usually pins don't contain any elements but occasionally they do when they represent what to place a link around.
If you don't like some of the phrases in the main system phrases file you can override them by creating a phrase with the same ``ref'' in the archive file.
Don't edit the system file. If you upgrade eprints to a newer version it will get overwritten.
EPrints sends out emails when a user registers/changes their password, when a user changes their email, when a deposited item is rejected/deleted by an editor and when the system is low on resources. These mails can be customised in the phrase file.
Make sure you wrap your text in paragraph <p> tags. EPrints will automatically word wrap these in the email. <hr /> elements in a mail are turned into a line of dashes.
When eprints sends a mail it will send it as plain ASCII text, unless it contains latin-1 elements, in which case it will be latin-1 encoded. If it contains unicode characters not in the latin-1 charset then it will be utf-8 encoded.
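The choice of encoding described above amounts to finding the smallest character set which can hold the whole message. Here is a stand-alone sketch of that rule; the choose_charset function and the charset names it returns are assumptions for this example, not taken from the eprints source.

```perl
use strict;
use warnings;

# Pick the smallest charset able to represent the (decoded) text:
# plain ASCII if possible, otherwise latin-1, otherwise utf-8.
sub choose_charset
{
    my( $text ) = @_;

    return "utf-8"      if $text =~ /[^\x00-\xFF]/;   # beyond latin-1
    return "iso-8859-1" if $text =~ /[^\x00-\x7F]/;   # beyond ASCII
    return "us-ascii";
}

print choose_charset( "Your eprint was deposited." ), "\n";
print choose_charset( "Caf\x{e9} archive" ), "\n";
print choose_charset( "Price: \x{20ac}5" ), "\n";
```

The first message is plain ASCII, the second contains an e-acute which fits in latin-1, and the third contains a euro sign which does not.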
This file configures the horizontal divider which eprints uses, which is inserted in place of &ruler;
If you have no great dislike of <hr> horizontal rulers then you can leave it alone.
You can't use entities like &frontpage; in ruler.
This directory contains the static pages for the site - the frontpage, the help pages, images, the stylesheet etc.
static/ contains one directory per language, eg. en, plus a general directory which contains files which don't need translating, like images and the stylesheet.
When you run the generate_static command it copies the files for each language, and the general dir, into the static site for that language.
See the generate_static documentation for more details.
This file is not used by the core eprints system. It is used by import_subjects to set up the initial subjects. For more information see the instructions for import_subjects.
This file is the shell of every page in the system. It is more or less a normal XHTML page but you can use the eprints &foo; entities in it and it should contain ``pin'' elements like a phrase. The pins it should contain are:
<ep:pin ref="title" />
<ep:pin ref="head" />
<ep:pin ref="pagetop" />
<ep:pin ref="page" />