[EP-tech] Re: Bulk export/import



Thanks for the advice here; I'm not looking at files for the moment as we have far too many on the server and my little local VirtualBox would crumple under their weight!

Okay, so I did an export from the admin on our live box to EP3 XML and tried to import this on my VirtualBox. Unfortunately, this fails at the first hurdle with the following errors:

Invalid XML element: owner

Unhandled exception in Import::XML: Can't use string (" ") as an ARRAY ref while "strict refs" in use at /usr/share/eprints3/perl_lib/EPrints/MetaField.pm line 2106.
 at /usr/lib/perl5/XML/LibXML.pm line 881.
  XML::LibXML::parse_fh('XML::LibXML=HASH(0x7fd6e81f35f0)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 99
  eval {...} called at /usr/lib/perl5/XML/LibXML/SAX.pm line 98
  XML::LibXML::SAX::_parse('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 54
  XML::LibXML::SAX::_parse_bytestream('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2602
  XML::SAX::Base::parse('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)', 'HASH(0x7fd6e9e4f930)') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2631
  XML::SAX::Base::parse_file('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/EPrints/XML/L ...



From: eprints-tech-bounces at ecs.soton.ac.uk on behalf of Adam Field <af05v at ecs.soton.ac.uk>
Reply-To: eprints-tech at ecs.soton.ac.uk
Date: Thursday, 2 July 2015 13:53
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Re: Bulk export/import

If you export without the files, you'll get paths to the files in the file object. Through cunning use of symbolically linked directories, or a global find and replace in the XML file, you can put the files where the XML import will find them. It's a bit hacky, but it works.
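
A minimal sketch of the find-and-replace route, assuming the exported XML stores each file location as a plain string that starts with a common prefix. The function name and both prefixes are hypothetical; look at a few file entries in your own export to see what the paths really look like before running anything like this.

```python
def rewrite_prefix(src, dst, old_prefix, new_prefix):
    """Copy src to dst, replacing every occurrence of old_prefix with new_prefix.

    Works line by line so a large export never has to fit in memory.
    Returns the number of replacements made.
    """
    replaced = 0
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            replaced += line.count(old_prefix)
            fout.write(line.replace(old_prefix, new_prefix))
    return replaced
```

Something like rewrite_prefix("export.xml", "export_local.xml", "/old/prefix/", "/new/prefix/") would then write a second copy of the export and leave the original untouched. A return value of 0 is a hint that the old prefix was guessed wrongly.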


--
Adam Field
Business Relationship Manager and Community Lead
EPrints Services

On 2 Jul 2015, at 13:38, Andrew Beeken <anbeeken at lincoln.ac.uk> wrote:

That seems like a bit of a round-the-houses approach. I'll dig through the
source and see what I can find.

On 02/07/2015 13:16, "eprints-tech-bounces at ecs.soton.ac.uk on behalf of
George Mamalakis" <mamalos at eng.auth.gr> wrote:

From its documentation (perldoc ./bin/export), it doesn't seem to
support something like that. On the other hand, the documentation
mentions the option:

'
dataset:  The name of the dataset to export, such as "archive",
"subject" or "user".
'

You could maybe "exploit" this option by moving some eprints from one
dataset to another and by exporting/importing each dataset separately
(and then moving the appropriate eprints where they really belong).

Haven't checked the source code, though, so maybe there's another
solution hidden somewhere there...:)


On 02/07/2015 02:56 PM, Andrew Beeken wrote:
I wonder... Is it possible to export by type? I could perhaps export
each type separately...

On 02/07/2015 12:18, "eprints-tech-bounces at ecs.soton.ac.uk on behalf of
George Mamalakis" <mamalos at eng.auth.gr> wrote:

Ian and Andrew,

I think that one can import/export specific entries, if I'm not
mistaken, but I'm not exactly sure about the syntax. If it allows for
ranges, the 100,000 entries problem may be addressed by just splitting
the export/import process into several export/import operations. I
have used this syntax to select specific eprints, and my command was
something like the following:

./bin/export archid archive XML 114 115 116 117 > /tmp/export1

which would be very awkward if it had to be used for thousands of
records (I assume the argument list would overflow! :)). Nonetheless,
in the worst case, where ranges are not allowed, this syntax could
still be used successfully within a very carefully written script.
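
The "carefully written script" idea could look roughly like the sketch below. It only reuses the command form shown above (./bin/export archid archive XML followed by eprint ids); the batch size, the function names and the output file naming are assumptions, and the list of ids has to come from somewhere else, for example a database query.

```python
import subprocess


def batches(ids, size):
    """Yield consecutive slices of ids, each holding at most size items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def export_commands(ids, archive_id, dataset="archive", fmt="XML", size=500):
    """Build one ./bin/export argument list per batch of eprint ids."""
    return [
        ["./bin/export", archive_id, dataset, fmt, *map(str, chunk)]
        for chunk in batches(ids, size)
    ]


def run_exports(commands, out_pattern="/tmp/export{n}.xml"):
    """Run each command in turn, writing its output to a numbered file.

    Meant to be run from the EPrints root so that ./bin/export resolves.
    check=True stops at the first batch that fails.
    """
    for n, cmd in enumerate(commands, start=1):
        with open(out_pattern.format(n=n), "wb") as out:
            subprocess.run(cmd, stdout=out, check=True)
```

Keeping each batch to a few hundred ids keeps the command line well below typical argument-length limits. Each output file is a separate XML document, so each one would then be imported on its own.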


On 01/07/2015 06:37 PM, Ian Stuart wrote:
On 01/07/15 15:25, Andrew Beeken wrote:
Hello all!

I'm currently looking at migrating our repository to a fresh install,
mainly because we have a bit of customisation to our live repo and I
want to see how this process would affect the integrity of the data.
Is there an easy way of importing all records from one repository,
say to an XML file and then importing to the new one?
In general (and as George says) the XML-with-files export is the way
to go.

I discovered it falls over with 100,000 records, so I just copied the
database & attached a new eprints to it :D


--
George Mamalakis

IT and Security Officer,
Electrical and Computer Engineer (Aristotle Univ. of Thessaloniki),
PhD (Aristotle Univ. of Thessaloniki),
MSc (Imperial College of London)

School of Electrical and Computer Engineering
Aristotle University of Thessaloniki

phone number : +30 (2310) 994379



*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/






