EPrints Technical Mailing List Archive

Message: #04734



[EP-tech] Re: Best way to import local (or remote) files through EPrints' XML file


On reflection, that may be more confusing than I first thought. I should have explained that I had to deal with manuscripts that had several files per document (main body, cover page, tables, figures, etc.), and the file metadata was already in a database. If you are only working with one file per eprint, you obviously don't need the loop that builds up the series of files, but I thought the XML templates might be useful.
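For the one-file-per-eprint case, the loop collapses to a single <file> inside a single <document>. Since George's converter is written in Python, here is a minimal sketch using only the standard library. The function name is hypothetical; the element names and the format/security values are copied from the PHP template quoted in this thread rather than checked against EPrints defaults, and the MIME type is only guessed from the file extension.

```python
import base64
import hashlib
import mimetypes
import os
import xml.etree.ElementTree as ET


def single_file_documents(path: str) -> ET.Element:
    """Build a <documents> element holding one <document> with one <file>.

    Hypothetical helper: element names follow the PHP template in this
    thread; the format/security values are copied from it, not verified.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    filename = os.path.basename(path)
    # Guess the MIME type from the extension; fall back to a generic type.
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    documents = ET.Element("documents")
    document = ET.SubElement(documents, "document")
    ET.SubElement(document, "mime_type").text = mime_type
    ET.SubElement(document, "format").text = "text"
    ET.SubElement(document, "security").text = "public"
    ET.SubElement(document, "main").text = filename  # the only file is the main file

    file_el = ET.SubElement(ET.SubElement(document, "files"), "file")
    ET.SubElement(file_el, "datasetid").text = "document"
    ET.SubElement(file_el, "filename").text = filename
    ET.SubElement(file_el, "mime_type").text = mime_type
    # Hash and size describe the raw bytes, not the base64 text.
    ET.SubElement(file_el, "hash").text = hashlib.md5(data).hexdigest()
    ET.SubElement(file_el, "hash_type").text = "MD5"
    ET.SubElement(file_el, "filesize").text = str(len(data))
    ET.SubElement(file_el, "data", encoding="base64").text = (
        base64.b64encode(data).decode("ascii"))
    return documents
```

ET.tostring(single_file_documents(path), encoding="unicode") gives the text to splice into the eprint record the script is already writing.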

Andy
>>> "Andy Reid" <Andy.Reid@lshtm.ac.uk> 22 September 2015 16:18 >>>
Hi George,
Here's a chunk of PHP I put together recently to generate the documents/files section of an EPrints XML upload. It puts all the files into one document tag, which may or may not be what you want. I'm not sure how standard the mainfile configuration on our system is, but it seems to allow only one file per document to be downloaded. So either you need to enclose each file in a separate document tag or, as I eventually did, zip all the files and upload that as one file. This version is the 'one document, many files' approach, but I can send you the zip version as well if you like.

<?php
# Builds the <documents> section of an EPrints XML upload for one record.
# $row holds the metadata values for this record. $link is assumed to be an
# open mysqli connection (the old mysql_* functions were removed in PHP 7).
function eprints_xml_OAfiles($row){

global $link;
$docroot=$_SERVER['DOCUMENT_ROOT'];
$filebase="$docroot/publications/administration/....  <where the files live > /";

# Cast to int so the id cannot inject SQL into the query below.
$oaPub_ID = (int)$row['oaPub_ID'];

$files_query = "SELECT
        `oaManuscript_ID`,
        `oaPub_ID`,
        `content_oaManuscript`,   # Manuscript
        `file_oaManuscript`,
        `file_oaManuscript_mimetype`,
        `URL_oaManuscript`,
        `upload_oaManuscript`,
        `notes_oaManuscript`,
        `modified_oaManuscript`
    FROM
        oaManuscript2
    WHERE
        oaPub_ID = $oaPub_ID
        AND upload_oaManuscript = 1";

$files_result = mysqli_query($link, $files_query)
    or die("Query failed: $files_query");

$filesXML = "";
$mainfile = "";   # assumption: the first file returned becomes the main file
$maintype = "";
while ($f = mysqli_fetch_assoc($files_result)) { # build file metadata and base64 data
    # Debug aid: uncomment to dump each row into the output as an XML comment.
    # echo "<!-- " . print_r($f, TRUE) . " -->";
    $filename = $f['file_oaManuscript'];
    $mimetype = $f['file_oaManuscript_mimetype'];
    if (FALSE === ($STUFF = file_get_contents($filebase.$filename))) {
        die("\n\nfailed to get file: $filebase$filename");
    }
    if ($mainfile === "") {
        $mainfile = $filename;
        $maintype = $mimetype;
    }
    $base64 = chunk_split(base64_encode($STUFF));
    $hash = md5($STUFF);          # MD5 of the raw file bytes, not of the base64 text
    $filesize = strlen($STUFF);   # size of the raw file in bytes
    $file_modified = $f['modified_oaManuscript'];
    $xmlname = htmlspecialchars($filename, ENT_XML1);   # escape &, <, > in names

    $filesXML .= "
      <file>
        <datasetid>document</datasetid>
        <filename>$xmlname</filename>
        <mime_type>$mimetype</mime_type>
        <hash>$hash</hash>
        <hash_type>MD5</hash_type>
        <filesize>$filesize</filesize>
        <mtime>$file_modified</mtime>
        <data encoding='base64'>$base64</data>
      </file>";
} # ends while ($f = mysqli_fetch_assoc($files_result))

$mainfile = htmlspecialchars($mainfile, ENT_XML1);

return <<<EOC
<documents>
  <document>
    <mime_type>$maintype</mime_type>
    <format>text</format>
    <language>en</language>
    <security>public</security>
    <license>cc_by</license>
    <main>$mainfile</main>
    <content>accepted</content>
    <files>
      $filesXML
    </files>
  </document>
</documents>
EOC;
}
?>
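Andy mentions that he eventually zipped all of a manuscript's files and uploaded the archive as one file, so that the single downloadable main file carries everything. His zip version is not shown in the thread; the following is only a minimal Python sketch of the same idea under stated assumptions: the function and the zip_name parameter are hypothetical, the archive is built in memory with the standard zipfile module, and the element names are copied from the PHP template above.

```python
import base64
import hashlib
import io
import os
import xml.etree.ElementTree as ET
import zipfile


def zipped_document(paths: list[str], zip_name: str) -> ET.Element:
    """Zip several files in memory and embed the archive as one <document>.

    Hypothetical helper: element names follow the PHP template in this thread.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))  # flat archive layout
    data = buffer.getvalue()

    documents = ET.Element("documents")
    document = ET.SubElement(documents, "document")
    ET.SubElement(document, "mime_type").text = "application/zip"
    ET.SubElement(document, "security").text = "public"
    ET.SubElement(document, "main").text = zip_name  # the archive is the main file

    file_el = ET.SubElement(ET.SubElement(document, "files"), "file")
    ET.SubElement(file_el, "datasetid").text = "document"
    ET.SubElement(file_el, "filename").text = zip_name
    ET.SubElement(file_el, "mime_type").text = "application/zip"
    # Hash and size describe the zip bytes themselves.
    ET.SubElement(file_el, "hash").text = hashlib.md5(data).hexdigest()
    ET.SubElement(file_el, "hash_type").text = "MD5"
    ET.SubElement(file_el, "filesize").text = str(len(data))
    ET.SubElement(file_el, "data", encoding="base64").text = (
        base64.b64encode(data).decode("ascii"))
    return documents
```

The trade-off is that readers must download and unpack the archive to reach any single file; wrapping each file in its own document avoids that, at the cost of more document records per eprint.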
>>> George Mamalakis <mamalos@eng.auth.gr> 22 September 2015 14:40 >>>
Hi everybody!

I'm very close to finishing my EPrints configuration and migration from
DSpace. The main thing that remains is the data migration.

I've written a Python script that generates an EPrints XML file from
a DSpace CSV file, which I'll upload to the EPrints wiki when it's done.

To complete it, I need to add the files, and I don't know what syntax
to use. I have a local folder whose subfolders contain all the DSpace
files, where each subfolder name is the record id. My folder structure
therefore looks like this:

/home/data/dspace/{record_id}

where {record_id} is the DSpace id of the record. What are the minimum
XML elements that must be added to my XML file for EPrints to import
the files? And what would an example XML entry look like for this
folder structure?
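For the layout /home/data/dspace/{record_id}, one option is to walk the record folders and build one <documents> block per record id, which the CSV-driven script can then attach to the matching <eprint>. A minimal Python sketch under several assumptions: the function names are hypothetical; every regular file in a record folder becomes its own <document> (per the one-downloadable-file-per-document behaviour Andy describes), so any non-bitstream files in those folders would need filtering first; and the element names are copied from the PHP template in this thread, not from a verified list of required fields.

```python
import base64
import hashlib
import mimetypes
import os
import xml.etree.ElementTree as ET


def documents_for_record(record_dir: str) -> ET.Element:
    """Build <documents> for one record folder, one <document> per file."""
    documents = ET.Element("documents")
    for name in sorted(os.listdir(record_dir)):
        path = os.path.join(record_dir, name)
        if not os.path.isfile(path):
            continue  # skip subfolders and anything that is not a regular file
        with open(path, "rb") as fh:
            data = fh.read()
        mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
        document = ET.SubElement(documents, "document")
        ET.SubElement(document, "mime_type").text = mime
        ET.SubElement(document, "main").text = name
        file_el = ET.SubElement(ET.SubElement(document, "files"), "file")
        ET.SubElement(file_el, "datasetid").text = "document"
        ET.SubElement(file_el, "filename").text = name
        ET.SubElement(file_el, "mime_type").text = mime
        ET.SubElement(file_el, "hash").text = hashlib.md5(data).hexdigest()
        ET.SubElement(file_el, "hash_type").text = "MD5"
        ET.SubElement(file_el, "filesize").text = str(len(data))
        ET.SubElement(file_el, "data", encoding="base64").text = (
            base64.b64encode(data).decode("ascii"))
    return documents


def documents_by_record(root: str = "/home/data/dspace") -> dict[str, ET.Element]:
    """Map each DSpace record id (subfolder name) to its <documents> element."""
    return {
        record_id: documents_for_record(os.path.join(root, record_id))
        for record_id in sorted(os.listdir(root))
        if os.path.isdir(os.path.join(root, record_id))
    }
```

In the converter, something like eprint_el.append(docs[record_id]) (where eprint_el is a hypothetical <eprint> element built from the CSV row) would attach the files to the right record.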

Thanks all in advance!

--
George Mamalakis

IT and Security Officer,
Electrical and Computer Engineer (Aristotle Univ. of Thessaloniki),
PhD (Aristotle Univ. of Thessaloniki),
MSc (Imperial College of London)

School of Electrical and Computer Engineering
Aristotle University of Thessaloniki

phone number : +30 (2310) 994379


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/