EPrints Technical Mailing List Archive

Message: #06567

To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] Dissecting the Documents folder
From: Thomas Lauke <th.lauke@arcor.de>
Date: Thu, 8 Jun 2017 15:24:44 +0200 (CEST)

Hi Andrew,

> Do I ... put it in the new <eprints_root>/archives/<myarchive>/documents folder?
I don't know what else might need to be done, so below I describe the path that worked for me in the past:

- Unpack your documents to e.g. /tmp/disc0/00/... (none of the thumbnails or indexcodes is crucial)

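- Replace the leading part of each <url> so that it reflects the physical directory structure, using a sed call with the following lines (they are written in vi/ex `:%s` form; for sed itself drop the leading `%`). Within each pair, the two-digit variant must run before the one-digit variant: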
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/0\1\/\2\/\3\/\4/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/0\1\/\2\/\3\/0\4/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/\1\/\2\/\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/\1\/\2\/0\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/0\1\/\2\/\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/0\1\/\2\/0\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/00\/\1\/\2/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/00\/\1\/0\2/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/00\/0\1\/\2/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/00\/0\1\/0\2/

- Take care of spaces in the file paths: fortunately we had file names without any spaces on our Linux system, so I have NO experience with that :-)

- Remove all <rev_number> elements with `xmlstarlet ed -d "//_:rev_number" in.xml > /tmp/out.xml` so that the change history starts afresh

- Check your import file with `~/Eprints/bin/import yourRepo --parse-only --force archive XML yourInput`

- Start the final run with `~/Eprints/bin/import yourRepo --migration --force archive XML yourInput`

- If anything fails, erase the imported eprints with `~/Eprints/bin/epadmin erase_eprints yourRepo` and start again
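
The ten <url> substitutions in the second step all follow one rule: pad the eprint id to eight digits, split it into four two-digit directories, and pad the document position to two digits. As a minimal sketch of that rule, assuming the same host name and the same /tmp/disc0 target as in the lines above (the function name `url_to_path` is only illustrative, and this has not been run against a real export):

```python
import re

# Same host as in the substitution lines above; ids of up to 8 digits,
# document positions of 1 or 2 digits.
URL_RE = re.compile(r"http://eprints\.lincoln\.ac\.uk/(\d{1,8})/(\d{1,2})(?!\d)")


def url_to_path(text, root="/tmp/disc0"):
    """Rewrite <host>/<eprintid>/<pos> to <root>/aa/bb/cc/dd/pp in text."""

    def repl(match):
        # Pad the eprint id to 8 digits and cut it into 2-digit directories.
        eprintid = match.group(1).zfill(8)
        dirs = [eprintid[i:i + 2] for i in range(0, 8, 2)]
        # Pad the document position to 2 digits.
        pos = match.group(2).zfill(2)
        return "/".join([root] + dirs + [pos])

    return URL_RE.sub(repl, text)


print(url_to_path("<url>http://eprints.lincoln.ac.uk/12345/6/thesis.pdf</url>"))
# -> <url>/tmp/disc0/00/01/23/45/06/thesis.pdf</url>
```

Anything after the document position (the file name) is left untouched, just as with the sed lines, so the caveat about spaces in file names applies here as well.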

> Which part of the xml needs rewriting to tell the import 
> where to look for the file?
None, because of your URL modification: the rewritten <url> already tells the import where the file is.

The new numbering follows the order of the entries in your import file, so any gaps will disappear, but this can cause some confusion when comparing old and new records ...
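
To reduce that confusion, one could write down the expected old-to-new id mapping before importing. A rough sketch, assuming each <eprint> element in the export has an <eprintid> child and that the target repository hands out ids sequentially from `first_new_id` (both the function name `id_mapping` and that parameter are made up for illustration; the real starting id depends on the state of the target repository):

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def id_mapping(xml_text, first_new_id=1):
    """Map old eprint ids to the ids expected after a sequential import."""
    root = ET.fromstring(xml_text)
    old_ids = []
    for elem in root.iter():
        if local(elem.tag) != "eprint":
            continue
        # Take the first <eprintid> child of each <eprint>, in file order.
        for child in elem:
            if local(child.tag) == "eprintid" and child.text:
                old_ids.append(int(child.text.strip()))
                break
    # The n-th entry in the file is assumed to get the n-th new id.
    return {old: new for new, old in enumerate(old_ids, start=first_new_id)}
```

Called as `id_mapping(open("in.xml", "rb").read())`, this returns a dictionary such as old id 7 mapped to new id 2 when 7 is the second entry in the file.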

Hth
Thomas
