EPrints Technical Mailing List Archive

Message: #04461



[EP-tech] Re: Bulk export/import


Hi Andrew,
I suspect somewhere you've got a field that is multiple, that isn't represented properly in the XML.
A multiple field should look a bit like this in the XML:
<fieldname>
  <item>value1</item>
  <item>value2</item>
</fieldname>

A multiple compound field is a bit more involved. These may be of use:
http://wiki.eprints.org/w/XML_Export_Format
http://wiki.eprints.org/w/Import_From_URL

I'd start by exporting a record from your archive in the EPrints XML format and comparing that with the file you're trying to import.
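As a rough illustration of that comparison, here is a minimal Python sketch. It assumes the XML has a wrapper root element whose children are records and whose grandchildren are fields; the function names are made up for this example and are not part of EPrints. It reports fields that appear in one shape (for instance, without <item> children) in the file you are importing but only in a different shape in a known-good export.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def field_shapes(xml_text):
    """Map each field name to the set of shapes it takes across records.

    A field is "multiple" when it has child elements and all are <item>,
    "single" when it has no child elements, and "other" otherwise.
    Assumption: the root's children are records, their children are fields.
    """
    shapes = {}
    root = ET.fromstring(xml_text)
    for record in root:            # each direct child is one record
        for field in record:       # each child of a record is one field
            children = [local(c.tag) for c in field]
            if not children:
                shape = "single"
            elif all(name == "item" for name in children):
                shape = "multiple"
            else:
                shape = "other"
            shapes.setdefault(local(field.tag), set()).add(shape)
    return shapes


def mismatched_fields(reference_xml, candidate_xml):
    """Fields whose shapes in the candidate were never seen in the reference."""
    ref = field_shapes(reference_xml)
    cand = field_shapes(candidate_xml)
    return sorted(name for name, seen in cand.items()
                  if name in ref and not seen <= ref[name])
```

Pass in the contents of the two files (reading them in binary mode is fine). Any field name it returns is a candidate for the "multiple field not represented properly" problem, though it is only a hint and not a full validation.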

Cheers,
John

-----Original Message-----
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Andrew Beeken
Sent: 07 July 2015 10:47
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Re: Bulk export/import

Hello all,

Okay, so I’ve fixed the mystery of the missing field definition but I’m
now getting the following error:

Unhandled exception in Import::XML: Can't use string (" ") as an ARRAY ref while "strict refs" in use at /usr/share/eprints3/perl_lib/EPrints/MetaField.pm line 2106. at /usr/lib/perl5/XML/LibXML.pm line 881.
XML::LibXML::parse_fh('XML::LibXML=HASH(0x7f496679d7d0)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 99
eval {...} called at /usr/lib/perl5/XML/LibXML/SAX.pm line 98
XML::LibXML::SAX::_parse('XML::LibXML::SAX=HASH(0x7f496679e8a8)') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 54
XML::LibXML::SAX::_parse_bytestream('XML::LibXML::SAX=HASH(0x7f496679e8a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2602
XML::SAX::Base::parse('XML::LibXML::SAX=HASH(0x7f496679e8a8)', 'HASH(0x7f49667b41c0)') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2631
XML::SAX::Base::parse_file('XML::LibXML::SAX=HASH(0x7f496679e8a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/EPrints/XML/L ...



Any thoughts as to what could be throwing this?

Andrew



On 02/07/2015 15:41, "eprints-tech-bounces@ecs.soton.ac.uk on behalf of
Andrew Beeken" <eprints-tech-bounces@ecs.soton.ac.uk on behalf of
anbeeken@lincoln.ac.uk> wrote:

>Ah, so that’s not a standard field - I came to the Uni after a lot of this
>work was done so I’m still trying to figure out which fields are bespoke
>and which are not. In a way I’m trying to avoid bringing bespoke things
>across with me but I think it’s clear to me now that our data simply will
>not fit a vanilla EPrints. Now I need to start a softly softly approach...
>
>On 02/07/2015 15:20, "eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>George Mamalakis" <eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>mamalos@eng.auth.gr> wrote:
>
>>It seems that on your previous system you had a field called "owner"
>>that doesn't exist in your new EPrints installation.
>>
>>Try to see how you have defined this field and copy your configuration
>>to your new EPrints installation. It'll probably be defined in:
>>
>>./archives/archname/cfg/cfg.d/eprint_fields.pl
>>
>>or in some other -custom- configuration file in
>>./archives/archname/cfg/cfg.d/.
>>
>>If it's not present in eprint_fields.pl, grep in the configuration
>>folder.
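The grep can be sketched in Python as below. This is only an illustration: it assumes field definitions are written in the usual Perl hash style, i.e. a line containing name => 'owner' (or with double quotes), and the function name is made up for this example.

```python
import re
from pathlib import Path


def find_field_definitions(cfg_dir, field_name):
    """Return (file name, line number, line) for lines that look like they define the field.

    Assumption: definitions contain  name => 'field_name'  on a single line.
    """
    pattern = re.compile(r"""name\s*=>\s*["']%s["']""" % re.escape(field_name))
    hits = []
    for path in sorted(Path(cfg_dir).glob("*.pl")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.name, number, line.strip()))
    return hits
```

For example, find_field_definitions("./archives/archname/cfg/cfg.d", "owner") would list the files on the old installation worth copying across, if the definition follows that layout.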
>>
>>Don't forget to reload your repository of your new installation after
>>installing the field (you'll probably also need to add phrases for your
>>new field, but that's another discussion).
>>
>>On 02/07/2015 05:03 μμ, Andrew Beeken wrote:
>>> Thanks for the advice here; I’m not looking at files for the moment as
>>>we have far too many on the server and my little local Virtual Box would
>>>crumple under their weight!
>>>
>>> Okay, so I did an export from the admin on our live box to EP3 XML and
>>>tried to import this on my Virtual Box - unfortunately this fails at the
>>>first hurdle with the following errors:
>>>
>>> Invalid XML element: owner
>>>
>>> Unhandled exception in Import::XML: Can't use string (" ") as an ARRAY ref while "strict refs" in use at /usr/share/eprints3/perl_lib/EPrints/MetaField.pm line 2106. at /usr/lib/perl5/XML/LibXML.pm line 881.
>>> XML::LibXML::parse_fh('XML::LibXML=HASH(0x7fd6e81f35f0)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 99
>>> eval {...} called at /usr/lib/perl5/XML/LibXML/SAX.pm line 98
>>> XML::LibXML::SAX::_parse('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 54
>>> XML::LibXML::SAX::_parse_bytestream('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2602
>>> XML::SAX::Base::parse('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)', 'HASH(0x7fd6e9e4f930)') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2631
>>> XML::SAX::Base::parse_file('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/EPrints/XML/L …
>>>
>>>
>>>
>>> From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Adam Field <af05v@ecs.soton.ac.uk>
>>> Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
>>> Date: Thursday, 2 July 2015 13:53
>>> To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
>>> Subject: [EP-tech] Re: Bulk export/import
>>>
>>> If you export without the files, you'll get paths to files in the file
>>>object.  Through cunning use of symbolic linked directories or global
>>>find and replace in the XML file, you can put the files where the XML
>>>import will find them.  It's a bit hacky, but it works.
>>>
>>>
>>> --
>>> Adam Field
>>> Business Relationship Manager and Community Lead
>>> EPrints Services
>>>
>>> On 2 Jul 2015, at 13:38, Andrew Beeken <anbeeken@lincoln.ac.uk> wrote:
>>>
>>> That seems like a bit of a round the houses approach. I’ll dig through
>>>the
>>> source and see what I can find.
>>>
>>> On 02/07/2015 13:16, "eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>>> George Mamalakis" <eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>>> mamalos@eng.auth.gr> wrote:
>>>
>>> From its documentation (perldoc ./bin/export), the export script doesn't
>>> seem to support something like that. On the other hand, the documentation
>>> mentions the option:
>>>
>>> '
>>> dataset:  The name of the dataset to export, such as "archive",
>>> "subject" or "user".
>>> '
>>>
>>> You could maybe "exploit" this option by moving some eprints from one
>>> dataset to another and by exporting/importing each dataset separately
>>> (and then moving the appropriate eprints where they really belong).
>>>
>>> Haven't checked the source code, though, so maybe there's another
>>> solution hidden somewhere there...:)
>>>
>>>
>>> On 02/07/2015 02:56 μμ, Andrew Beeken wrote:
>>> I wonder... Is it possible to export by type? I could perhaps export
>>> each
>>> type separately...
>>>
>>> On 02/07/2015 12:18, "eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>>> George Mamalakis" <eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>>> mamalos@eng.auth.gr> wrote:
>>>
>>> Ian and Andrew,
>>>
>>> I think that one can import/export specific entries, if I'm not
>>> mistaken, but I'm not exactly sure about the syntax. If it allows for
>>> ranges, the 100,000 entries problem may be addressed by just splitting
>>> the export/import process into more than one export/import operation. I
>>> have used this syntax to select specific eprints, but my syntax was
>>> something like the following:
>>>
>>> ./bin/export archid archive XML 114 115 116 117 > /tmp/export1
>>>
>>> which would be very awkward if it had to be used for thousands
>>> of records (I assume args would overflow!:)). Nonetheless, in the worst
>>> case, where ranges are not allowed, the former syntax could be used
>>> successfully within a very carefully written script.
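A minimal Python sketch of such a script follows. It only builds the command lines in batches, using the ./bin/export syntax quoted above; the function name, the batch size, and the defaults are illustrative assumptions, and whether very long ID lists are accepted depends on the system's argument-length limit.

```python
def batched_export_commands(eprint_ids, batch_size, archive_id="archid",
                            dataset="archive", plugin="XML"):
    """Build one ./bin/export argument list per batch of eprint IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = [str(i) for i in eprint_ids]
    commands = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # Same shape as: ./bin/export archid archive XML 114 115 116 117
        commands.append(["./bin/export", archive_id, dataset, plugin] + batch)
    return commands
```

Each argument list could then be handed to subprocess.run with stdout redirected to a separate file per batch (for example /tmp/export1, /tmp/export2, and so on), which keeps every invocation well below any argument limit.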
>>>
>>>
>>> On 01/07/2015 06:37 μμ, Ian Stuart wrote:
>>> On 01/07/15 15:25, Andrew Beeken wrote:
>>> Hello all!
>>>
>>> I’m currently looking at migrating our repository to a fresh install,
>>> mainly because we have a bit of customisation to our live repo and I
>>> want to see how this process would affect the integrity of the data.
>>> Is there an easy way of importing all records from one repository,
>>> say to an XML file and then importing to the new one?
>>>
>>> In general (and as George says) the XML-with-files export is the way
>>> to go.
>>>
>>> I discovered it falls over with 100,000 records, so I just copied the
>>> database & attached a new eprints to it :D
>>>
>>>
>>> --
>>> George Mamalakis
>>>
>>> IT and Security Officer,
>>> Electrical and Computer Engineer (Aristotle Univ. of Thessaloniki),
>>> PhD (Aristotle Univ. of Thessaloniki),
>>> MSc (Imperial College of London)
>>>
>>> School of Electrical and Computer Engineering
>>> Aristotle University of Thessaloniki
>>>
>>> phone number : +30 (2310) 994379
>>>
>>>
>>>
>>> *** Options:
>>> http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>>> *** Archive: http://www.eprints.org/tech.php/
>>> *** EPrints community wiki: http://wiki.eprints.org/
>>> *** EPrints developers Forum: http://forum.eprints.org/
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>-- 
>>George Mamalakis
>>
>>IT and Security Officer,
>>Electrical and Computer Engineer (Aristotle Univ. of Thessaloniki),
>>PhD (Aristotle Univ. of Thessaloniki),
>>MSc (Imperial College of London)
>>
>>School of Electrical and Computer Engineering
>>Aristotle University of Thessaloniki
>>
>>phone number : +30 (2310) 994379
>>
>>
>>