Importing Data into an EPrints Archive
EPrints has a bulk data import facility based on XML. To import an XML file in this format, use the import_eprints script.
[eprints@hostname eprints2]$ bin/import_eprints siteid dataset filename
siteid is the archive identifier, given when you create the EPrints archive. filename is the name of the XML file to import. dataset is one of the following four datasets:
- archive contains live eprints in the archive (this is almost certainly the one you want to use);
- buffer contains eprints awaiting approval by an editor;
- deletion contains deleted eprints;
- inbox contains eprints still being worked on.
Further datasets exist for importing other types of data into EPrints; see the Dataset entry in section 4.1 of the EPrints technical documentation. To import the subject list, use the import_subjects tool instead; see section 17.13 of the EPrints documentation for more details.
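When imports are scripted, it can help to check the dataset name before the tool is called. The following is a minimal Python sketch and not part of EPrints: the helper name build_import_command is hypothetical, the site id and file name are placeholders, and it assumes that import_eprints is run from the EPrints installation directory and takes the site id, dataset and XML file name as positional arguments, in that order. It only knows the four eprint datasets listed above, so the list would need extending for the other datasets.

```python
# Hypothetical helper, not part of EPrints: assembles an import_eprints call.
# Only the four eprint datasets described in this section are accepted.
DATASETS = ("archive", "buffer", "deletion", "inbox")


def build_import_command(siteid, dataset, filename, script="bin/import_eprints"):
    """Return the argument list for importing `filename` into `dataset`.

    Assumption: the script takes siteid, dataset and filename as
    positional arguments, in that order.
    """
    if dataset not in DATASETS:
        raise ValueError(
            "unknown dataset %r; expected one of %s" % (dataset, ", ".join(DATASETS))
        )
    return [script, siteid, dataset, filename]


if __name__ == "__main__":
    # "myarchive" and "records.xml" are placeholder values.
    print(" ".join(build_import_command("myarchive", "archive", "records.xml")))
```

The returned list can be handed to subprocess.run(command, check=True) from the Python standard library, which raises an error if the import script exits with a non-zero status.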
XML file format documentation
<eprintsdata>
  <record>
    <field id="cjg" name="authors">
      <part name="family">Gutteridge</part>
      <part name="given">Christopher</part>
    </field>
    <field id="mv" name="authors">
      <part name="honourific">Dr.</part>
      <part name="given">Marvin</part>
      <part name="family">Fenderson</part>
    </field>
    <field name="year">1993</field>
    <field name="subjects">foo</field>
    <field name="subjects">bar</field>
    <field name="subjects">baz</field>
    <field name="title">
      <lang id="en">The Thing</lang>
      <lang id="de">da Thung</lang>
      <lang id="fr">l'Thingu</lang>
    </field>
  </record>
  ...(more records can go here)...
</eprintsdata>
Sample XML interchange format
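To check a file before importing it, the format can be read back with any XML parser. The sketch below is one possible way to do this in Python with the standard xml.etree.ElementTree module. It is not an EPrints tool: the function name read_records is hypothetical, the embedded sample is an abbreviated copy of the one above, and it assumes the file has eprintsdata as its root element. The id attribute on field elements is ignored here for brevity.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample, wrapped in the eprintsdata root element.
SAMPLE = """<eprintsdata>
<record>
<field id="cjg" name="authors">
<part name="family">Gutteridge</part>
<part name="given">Christopher</part>
</field>
<field name="year">1993</field>
<field name="subjects">foo</field>
<field name="subjects">bar</field>
<field name="title">
<lang id="en">The Thing</lang>
</field>
</record>
</eprintsdata>"""


def read_records(xml_text):
    """Return one dict per record, mapping field name to a list of values."""
    records = []
    for record in ET.fromstring(xml_text).findall("record"):
        fields = {}
        for field in record.findall("field"):
            parts = {p.get("name"): p.text or "" for p in field.findall("part")}
            langs = {v.get("id"): v.text or "" for v in field.findall("lang")}
            # A value is a dict of name parts, a dict of language versions,
            # or plain text, depending on which child elements are present.
            value = parts or langs or (field.text or "").strip()
            # Repeated field names accumulate into a single list.
            fields.setdefault(field.get("name"), []).append(value)
        records.append(fields)
    return records


if __name__ == "__main__":
    for number, fields in enumerate(read_records(SAMPLE), start=1):
        print(number, fields)
```

Collecting every field into a list keeps single and multiple fields uniform, so a repeated name such as subjects needs no special handling.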
The top level element is eprintsdata which contains zero or more record elements.
A record element represents a single eprints object and contains zero or more field elements.
A field element has the attribute name which is the name of a field in the dataset. The contents of the field element are the value of this field in this record.
- Some eprints fields may hold multiple values; these are expressed by repeating field elements with the same name attribute within a single record.
- A field may contain nothing, some text, part elements or lang elements. A field element may also have an id attribute, which is the unique id of this value, such as a user id number or an ISBN.
- A part element represents part of a value in a name field. It must have the attribute name which must be set to one of lineage, honourific, family or given. It may contain text or nothing.
- A lang element represents a version of the value of the field in a particular language. It may contain text or nothing. It has the required attribute id, which is the ISO language code.
The example above is a file containing a single record with a multiple name field (with ids) named authors, a multiple field named subjects, a multilang text field named title and a year field named year.
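The rules above can also be applied in the other direction, to generate an import file from existing data. The following Python sketch uses the standard xml.etree.ElementTree module. The helper names add_field, add_multilang_field and build_sample are hypothetical, and the values are taken from the sample. The output has no XML declaration or DOCTYPE; whether import_eprints requires either is not covered here, so treat the result as a starting point only.

```python
import xml.etree.ElementTree as ET

# The part names permitted in a name field, as listed above.
PART_NAMES = ("lineage", "honourific", "family", "given")


def add_field(record, name, value=None, value_id=None):
    """Append a field element; value may be text, a dict of name parts, or None."""
    field = ET.SubElement(record, "field", {"name": name})
    if value_id is not None:
        field.set("id", value_id)
    if isinstance(value, dict):
        for part_name, text in value.items():
            if part_name not in PART_NAMES:
                raise ValueError("invalid part name: %r" % part_name)
            ET.SubElement(field, "part", {"name": part_name}).text = text
    elif value is not None:
        field.text = str(value)
    return field


def add_multilang_field(record, name, versions):
    """Append a field holding one lang element per language code."""
    field = ET.SubElement(record, "field", {"name": name})
    for code, text in versions.items():
        ET.SubElement(field, "lang", {"id": code}).text = text
    return field


def build_sample():
    """Build a one-record eprintsdata tree resembling the sample."""
    root = ET.Element("eprintsdata")
    record = ET.SubElement(root, "record")
    add_field(record, "authors", {"family": "Gutteridge", "given": "Christopher"}, "cjg")
    add_field(record, "year", 1993)
    # A multiple field is written as repeated field elements with one name.
    for subject in ("foo", "bar", "baz"):
        add_field(record, "subjects", subject)
    add_multilang_field(record, "title", {"en": "The Thing"})
    return root


if __name__ == "__main__":
    print(ET.tostring(build_sample(), encoding="unicode"))
```

Rejecting unknown part names at build time catches the most likely mistake, such as writing surname instead of family, before the file reaches the importer.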





