EPrints Technical Mailing List Archive

Message: #06567


Re: [EP-tech] Dissecting the Documents folder

Hi Andrew,

> Do I ... put it in the new <eprints_root>/archives/<myarchive>/documents folder?
I don't know what else would have to be done for that, so below I describe the approach that worked for me in the past:

- Unpack your documents to e.g. /tmp/disc0/00/... (none of the thumbnails or indexcodes is crucial)

- Replace the leading part of each <url> so that it points to the physical directory structure, using a sed call with the following lines:

- Watch out for spaces in file paths: fortunately our file names on Linux contained no spaces, so I have NO experience with that :-)

- Remove all <rev_number> tags with `xmlstarlet ed -d "//_:rev_number" in.xml > /tmp/out.xml` to restart the change history

- Check your import file with `~/Eprints/bin/import yourRepo --parse-only --force archive XML yourInput`

- Start the final run with `~/Eprints/bin/import yourRepo --migration --force archive XML yourInput`

- If anything fails, wipe the imported records with `~/Eprints/bin/epadmin erase_eprints yourRepo` (this erases all eprints in the repository) and start again
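
As a rough sketch of the sed step in the list above: the snippet below only shows the general shape of such a prefix replacement and is not the exact call I used. The old host name `old.example.org` is hypothetical, and using a `file://` URL under /tmp/disc0/ is an assumption; how the rest of each URL maps onto your unpacked directory tree depends on your export, so you may need more than one substitution.

```shell
# Hypothetical values: adjust both to your own export and unpack location.
# Dots are escaped so that sed matches them literally.
OLD_PREFIX='http://old\.example\.org/'
NEW_PREFIX='file:///tmp/disc0/'

rewrite_urls() {
    # Replace the prefix only where it directly follows an opening <url> tag;
    # everything after the prefix is left unchanged.
    sed "s|<url>${OLD_PREFIX}|<url>${NEW_PREFIX}|"
}

# Usage (reads stdin, writes stdout):
#   rewrite_urls < in.xml > /tmp/out.xml
```

Using `|` as the sed delimiter avoids having to escape the slashes in the URLs. Paths containing spaces are not handled here.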

> Which part of the xml needs rewriting to tell the import 
> where to look for the file?
None: the modified <url> already tells the import where to find each file.

The numbering follows the order of the entries in your import file, so any gaps in the old numbering will be gone; this can cause some confusion when comparing old and new records ...
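
To make such a comparison easier, a small helper along these lines could list old and expected new numbers side by side. It is only a sketch under several assumptions: that each old ID sits in an `<eprintid>` element on a line of its own, that new numbers are handed out consecutively, and that the first free number in the target repository is known (the `start` argument, default 1, is hypothetical). An ID that recurs in nested elements is counted only once.

```shell
map_ids() {
    # Print "old_id new_id" pairs in file order.
    # $1: assumed first free ID in the target repository (default 1).
    start="${1:-1}"
    sed -n 's|.*<eprintid>\([0-9][0-9]*\)</eprintid>.*|\1|p' |
        awk -v n="$start" '!seen[$1]++ { print $1, n++ }'
}

# Usage (reads stdin, writes stdout):
#   map_ids < yourInput > /tmp/id_map.txt
```

Check the result against the repository after the import; the real numbers are whatever the import assigned.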