
[EP-tech] Re: Best way to import local (or remote) files through EPrints' XML file



On reflection, that may be more confusing than I first thought. I should have explained that I had to deal with manuscripts that had several files per document (main body, cover page, tables, figures, etc.), and the file metadata was already in a database. If you are just working from one file per eprint, then you obviously don't need the loop that builds up the series of files. But I thought the XML templates might be useful.

Andy
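For the one-file-per-eprint case, the loop over the files table in the PHP quoted below isn't needed. The following is a minimal sketch, not Andy's code: `eprints_single_file_xml` is a hypothetical name, not an EPrints API. It reads one file from disk and returns a `<documents>` section with a single `<document>` and `<file>`, reusing element names from Andy's template and computing the MD5 over the raw file bytes. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of EPrints): build the <documents> section
// for an eprint that has exactly one file, so no loop is needed.
function eprints_single_file_xml(string $path, string $mimetype): string
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("failed to read $path");
    }
    // Escape characters such as & and < so the XML stays well-formed.
    $name = htmlspecialchars(basename($path), ENT_XML1 | ENT_QUOTES);
    $type = htmlspecialchars($mimetype, ENT_XML1 | ENT_QUOTES);
    $hash = md5($bytes);                        // MD5 of the raw file bytes
    $size = strlen($bytes);                     // file size in bytes
    $data = chunk_split(base64_encode($bytes)); // base64 in 76-character lines
    return <<<EOC
<documents>
  <document>
    <mime_type>$type</mime_type>
    <format>text</format>
    <language>en</language>
    <security>public</security>
    <main>$name</main>
    <files>
      <file>
        <datasetid>document</datasetid>
        <filename>$name</filename>
        <mime_type>$type</mime_type>
        <hash>$hash</hash>
        <hash_type>MD5</hash_type>
        <filesize>$size</filesize>
        <data encoding="base64">$data</data>
      </file>
    </files>
  </document>
</documents>
EOC;
}
```

The returned string would then be placed inside the corresponding `<eprint>` element of the import file.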
>>> "Andy Reid" <Andy.Reid at lshtm.ac.uk> 22 September 2015 16:18 >>>
Hi George,
Here's a chunk of PHP I put together recently to generate the documents/files section of an EPrints XML upload. This puts all the files into one document tag, which may or may not be what you want. I'm not sure how standard the mainfile configuration on our system is, but it seems to allow only one file per document (the main file) to be downloaded. So either you need to enclose each file in a separate document tag or, as I eventually did, zip all the files and upload that as one file. This version is the 'one document, many files' approach, but I can send you the Zip version as well if you like.

<?php
# Build the documents/files section of an EPrints XML upload for one record.
# Assumes $link is an open mysqli connection: the old mysql_* functions
# were deprecated in PHP 5.5 and removed in PHP 7.
function eprints_xml_OAfiles($row){    # $row is the metadata values for this record

global $link;
$docroot=$_SERVER['DOCUMENT_ROOT'];
$filebase="$docroot/publications/administration/....  <where the files live > /";

$oaPub_ID = (int)$row['oaPub_ID'];   # cast to int so it is safe to put into the SQL

$files_query = "SELECT
            oaManuscript_ID,
            oaPub_ID,
            content_oaManuscript,   # Manuscript
            file_oaManuscript,
            file_oaManuscript_mimetype,
            URL_oaManuscript,
            upload_oaManuscript,
            notes_oaManuscript,
            modified_oaManuscript
        FROM oaManuscript2
        WHERE oaPub_ID = $oaPub_ID
          AND upload_oaManuscript = 1
        ORDER BY oaManuscript_ID";
$files_result = mysqli_query($link, $files_query)
    or die("Query failed: $files_query");

$filesXML = "";
$mainfile = "";   # the first file returned becomes the document's main file
$maintype = "";
while ($f = mysqli_fetch_assoc($files_result)) { # build file metadata and base64 data
    $filename = $f['file_oaManuscript'];
    $mimetype = $f['file_oaManuscript_mimetype'];
    if (FALSE === ($STUFF = file_get_contents($filebase.$filename))) {
        die("\n\nfailed to get file: $filebase$filename");
    }
    if ($mainfile === "") { $mainfile = $filename; $maintype = $mimetype; }
    $base64 = chunk_split(base64_encode($STUFF));
    $hash = md5($STUFF);          # MD5 of the file itself, not of its base64 text
    $filesize = strlen($STUFF);   # size in bytes
    $file_modified = $f['modified_oaManuscript'];
    $fn = htmlspecialchars($filename, ENT_XML1 | ENT_QUOTES);   # escape &, <, > in names
    $mt = htmlspecialchars($mimetype, ENT_XML1 | ENT_QUOTES);

    $filesXML .= "
    <file>
      <datasetid>document</datasetid>
      <filename>$fn</filename>
      <mime_type>$mt</mime_type>
      <hash>$hash</hash>
      <hash_type>MD5</hash_type>
      <filesize>$filesize</filesize>
      <mtime>$file_modified</mtime>
      <data encoding='base64'>$base64</data>
    </file>";
} # ends while ($f = mysqli_fetch_assoc($files_result))

$mainfile = htmlspecialchars($mainfile, ENT_XML1 | ENT_QUOTES);
$maintype = htmlspecialchars($maintype, ENT_XML1 | ENT_QUOTES);

return <<<EOC
<documents>
  <document>
    <mime_type>$maintype</mime_type>
    <format>text</format>
    <language>en</language>
    <security>public</security>
    <license>cc_by</license>
    <main>$mainfile</main>
    <content>accepted</content>
    <files>
      $filesXML
    </files>
  </document>
</documents>
EOC;
}
?>
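The other option mentioned above, enclosing each file in its own document tag, might look roughly like this. It is a hypothetical sketch, not the Zip version Andy offered: the function name and the `['path' => ..., 'mimetype' => ...]` input shape are assumptions. It makes every file the `<main>` file of its own `<document>`, so on a system that only serves the main file each one should still be downloadable. It assumes PHP 7 or later.

```php
<?php
// Hypothetical sketch: one <document> per file, each file being its
// document's <main> file. $files is a list of
// ['path' => string, 'mimetype' => string] arrays (an assumed shape).
function eprints_documents_per_file(array $files): string
{
    $docs = '';
    foreach ($files as $f) {
        $bytes = file_get_contents($f['path']);
        if ($bytes === false) {
            throw new RuntimeException("failed to read {$f['path']}");
        }
        $name = htmlspecialchars(basename($f['path']), ENT_XML1 | ENT_QUOTES);
        $type = htmlspecialchars($f['mimetype'], ENT_XML1 | ENT_QUOTES);
        $hash = md5($bytes);                        // MD5 of the raw bytes
        $size = strlen($bytes);                     // size in bytes
        $data = chunk_split(base64_encode($bytes));
        $docs .= <<<EOC
  <document>
    <mime_type>$type</mime_type>
    <security>public</security>
    <main>$name</main>
    <files>
      <file>
        <datasetid>document</datasetid>
        <filename>$name</filename>
        <mime_type>$type</mime_type>
        <hash>$hash</hash>
        <hash_type>MD5</hash_type>
        <filesize>$size</filesize>
        <data encoding="base64">$data</data>
      </file>
    </files>
  </document>

EOC;
    }
    return "<documents>\n$docs</documents>\n";
}
```

For example, `eprints_documents_per_file([['path' => '/tmp/paper.pdf', 'mimetype' => 'application/pdf']])` (a made-up path) returns a `<documents>` element with one document. The trade-off against the Zip approach is that readers get each file directly, at the cost of one document record per file.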
>>> George Mamalakis <mamalos at eng.auth.gr> 22 September 2015 14:40 >>>
Hi everybody!

I'm very close to finishing my EPrints configuration and migration from
DSpace. The main thing that remains to be done is the data migration part.

I've written a Python script that generates an EPrints XML file based on
a DSpace CSV file, which I'll upload to the EPrints Wiki when it's done.

In order to complete it, I need to add the files, and I don't know what
syntax I should use. I have a local folder whose subfolders contain all
the DSpace files, where each subfolder name is the record id. My folder
structure therefore looks like this:

/home/data/dspace/{record_id}

where {record_id} is the DSpace id of the specific record. What are the
minimum XML elements that have to be added to my XML file for EPrints to
import the files? And what would an example XML entry look like, given
this folder structure?

Thanks all in advance!

-- 
George Mamalakis

IT and Security Officer,
Electrical and Computer Engineer (Aristotle Univ. of Thessaloniki),
PhD (Aristotle Univ. of Thessaloniki),
MSc (Imperial College of London)

School of Electrical and Computer Engineering
Aristotle University of Thessaloniki

phone number : +30 (2310) 994379
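For the folder layout described above (`/home/data/dspace/{record_id}`), a first step is to list each record's files together with the values the `<file>` elements in Andy's template need: filename, MIME type, size and MD5. This is a hypothetical sketch in the same language as Andy's code: the function name and the extension-to-MIME map are assumptions (PHP's fileinfo extension, via `finfo`, would give better guesses if it is installed), and it only looks at regular files directly inside the record folder. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper for the layout {root}/{record_id}/<files>:
// list the regular files of one record with metadata for <file> elements.
function dspace_record_files(string $root, string $recordId): array
{
    // Assumed extension-to-MIME map; extend it to match your data.
    $mimeByExt = [
        'pdf'  => 'application/pdf',
        'txt'  => 'text/plain',
        'doc'  => 'application/msword',
        'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'jpg'  => 'image/jpeg',
        'png'  => 'image/png',
    ];
    $dir = rtrim($root, '/') . '/' . $recordId;
    $entries = scandir($dir);   // names sorted alphabetically
    if ($entries === false) {
        throw new RuntimeException("cannot read $dir");
    }
    $files = [];
    foreach ($entries as $name) {
        $path = "$dir/$name";
        if (!is_file($path)) {
            continue;   // skips ".", ".." and subdirectories
        }
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        $files[] = [
            'path'     => $path,
            'filename' => $name,
            'mimetype' => $mimeByExt[$ext] ?? 'application/octet-stream',
            'filesize' => filesize($path),
            'md5'      => md5_file($path),   // MD5 of the raw file
        ];
    }
    return $files;
}
```

For example, `dspace_record_files('/home/data/dspace', '1234')` (with a made-up record id) would return one entry per file in that record's folder, ready to be written out as `<file>` elements.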


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/