
[EP-tech] EPrints 3.3.11: OAI import to WorldCat Digital Collection Gateway fails because setSpec values exceed 255 characters



I am attempting to upload the metadata from our EPrints 3.3.11 repository to the WorldCat Digital Collection Gateway using OAI-PMH.

The Digital Collection Gateway reported the following error on the page http://www.worldcat.org/DigitalCollectionGateway/collection_list.jsp:

## Collection contained data too large for Digitial Collection Gateway.
## Collection constraints:
##
##     SetSpec cannot be larger than 255 characters
##    Name cannot be larger than 1000 characters

Searching the output of the OAI-PMH ListSets verb from my repository for long setSpec values, I find several that are longer than 255 characters:

$ curl http://repository.cshl.edu/cgi/oai2?verb=ListSets > ListSets
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  311k    0  311k    0     0  93751      0 --:--:--  0:00:03 --:--:--  225k
$ grep -A 1 -B 0 -E '<setSpec>.{254,}</setSpec>' ListSets
      <setSpec>7375626A656374733D496E76657374696761746976655F546563686E69717565735F616E645F45717569706D656E74:4D6963726F73636F7069635F546563686E69717565735F6F725F65717569706D656E74:6C6173657273:7175616E74697461746976655F6C617365725F7363616E6E696E675F70686F746F7374696D756C6174696F6E</setSpec>
      <setName>Subject = Investigative techniques and equipment: optical devices: lasers: quantitative laser scanning photostimulation</setName>
--
      <setSpec>7375626A656374733D42696F696E666F726D6174696373:47656E6F6D696373:47656E6574696373:444E415F7374727563747572655F616E645F6D6F64696669636174696F6E:67656E65735F7374727563747572655F66756E6374696F6E:67656E65735F74797065:74726974686F7261785F67726F75705F67656E6573</setSpec>
      <setName>Subject = bioinformatics: genomics and proteomics: genetics &amp; nucleic acid processing: DNA, RNA structure, function, modification: genes, structure and function: genes: types: trithorax group genes</setName>
--
      <setSpec>7375626A656374733D42696F696E666F726D6174696373:47656E6F6D696373:47656E6574696373:444E415F7374727563747572655F616E645F6D6F64696669636174696F6E:67656E65735F7374727563747572655F66756E6374696F6E:67656E655F726567756C6174696F6E:68657465726F64696D6572697A6174696F6E</setSpec>
      <setName>Subject = bioinformatics: genomics and proteomics: genetics &amp; nucleic acid processing: DNA, RNA structure, function, modification: genes, structure and function: gene regulation: heterodimerization</setName>
--
      <setSpec>7375626A656374733D42696F696E666F726D6174696373:47656E6F6D696373:47656E6574696373:444E415F7374727563747572655F616E645F6D6F64696669636174696F6E:67656E65735F7374727563747572655F66756E6374696F6E:67656E65735F74797065:6469737275707465645F696E5F736368697A6F706872656E69615F31</setSpec>
      <setName>Subject = bioinformatics: genomics and proteomics: genetics &amp; nucleic acid processing: DNA, RNA structure, function, modification: genes, structure and function: genes: types: disrupted-in-schizophrenia 1</setName>
--
      <setSpec>7375626A656374733D42696F696E666F726D6174696373:47656E6F6D696373:47656E6574696373:444E415F7374727563747572655F616E645F6D6F64696669636174696F6E:67656E65735F7374727563747572655F66756E6374696F6E:67656E65735F74797065:696D6D6564696174655F6561726C795F67656E6573</setSpec>
      <setName>Subject = bioinformatics: genomics and proteomics: genetics &amp; nucleic acid processing: DNA, RNA structure, function, modification: genes, structure and function: genes: types: immediate early genes</setName>
--
      <setSpec>7375626A656374733D42696F696E666F726D6174696373:47656E6F6D696373:47656E6574696373:444E415F7374727563747572655F616E645F6D6F64696669636174696F6E:67656E65735F7374727563747572655F66756E6374696F6E:67656E655F726567756C6174696F6E:68657465726F64696D6572697A6174696F6E:68657465726F64696D6572</setSpec>
      <setName>Subject = bioinformatics: genomics and proteomics: genetics &amp; nucleic acid processing: DNA, RNA structure, function, modification: genes, structure and function: gene regulation: heterodimerization: heterodimer</setName>
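
The grep above matches values of 254 characters or more, so it can include values that are still within the limit. A stricter check could print the exact length of every value over a given limit. This is a minimal bash sketch, assuming (as in the output above) that each element sits on one line of the saved `ListSets` file; the function name `long_elements` is hypothetical, and the same call can be pointed at `setName` to test the 1000-character Name constraint.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<length> <value>" for every <TAG>...</TAG>
# element in FILE whose text is longer than LIMIT characters.
# Assumes each element is on a single line, as in the saved ListSets file.
# The text is measured as it appears in the XML, so an entity such as
# &amp; counts as five characters.
long_elements() {
  local file=$1 tag=${2:-setSpec} limit=${3:-255}
  grep -o "<${tag}>[^<]*</${tag}>" "$file" |
    sed -e "s|^<${tag}>||" -e "s|</${tag}>\$||" |
    awk -v max="$limit" 'length($0) > max { print length($0), $0 }'
}

# Usage, with the file saved by the curl command above:
#   long_elements ListSets setSpec 255
#   long_elements ListSets setName 1000
```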

It appears that the verbosity of the MeSH headings, which we use for subjects, has pushed these values past the WorldCat Digital Collection Gateway limit.
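
Part of the length comes from the encoding rather than from MeSH alone. Each colon-separated component in these values is a hex-encoded subject identifier (the first value starts with `7375626A656374733D`, which is `subjects=`), so every character costs two hex digits. In the first value the components are 47, 35, 6 and 44 characters long: 132 characters become 264 hex digits, and the 3 colons bring the setSpec to 267, while the decoded path is only 135 characters. A minimal bash sketch for decoding a value to check this; the function name `hex_setspec_to_text` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a setSpec whose colon-separated components
# are hex-encoded strings (the form in the ListSets output above) back to
# readable text, keeping ":" between components.
# Needs bash, whose printf '%b' expands \xHH escapes.
hex_setspec_to_text() {
  local part decoded=""
  local -a parts
  IFS=':' read -ra parts <<<"$1"
  for part in "${parts[@]}"; do
    # "6C61..." -> "\x6C\x61..." -> the characters "la..."
    decoded+="${decoded:+:}$(printf '%b' "$(sed 's/../\\x&/g' <<<"$part")")"
  done
  printf '%s\n' "$decoded"
}

# Compare encoded and decoded lengths for one value:
#   spec='...'   # paste a full setSpec here
#   text=$(hex_setspec_to_text "$spec")
#   echo "${#spec} encoded, ${#text} decoded: $text"
```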

Can anyone suggest a workaround?
Has anyone written setSpec encoding and decoding code that keeps the values shorter?
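
One possible direction, offered as an untested sketch rather than a known EPrints option: OAI-PMH only allows URI unreserved characters in a setSpec element, which is why `=` cannot appear as-is, but most subject identifiers already consist of allowed characters. Escaping only the disallowed characters, instead of hex-encoding everything, roughly halves the length. The function names and the `~HH` escape convention below are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical shorter setSpec encoding (NOT what EPrints 3.3.11 emits):
# characters in [A-Za-z0-9_.-] pass through unchanged; every other byte,
# including "~" itself, becomes "~HH" (two uppercase hex digits).
# "~" is a URI unreserved character, so the result is a legal element.
encode_component() {
  local LC_ALL=C            # work byte by byte
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9_.-]) out+=$c ;;
      *) out+=$(printf '~%02X' "'$c") ;;   # "'c" gives the byte value
    esac
  done
  printf '%s\n' "$out"
}

# Join encoded components with ":" to form the hierarchical setSpec.
encode_setspec() {
  local part out=""
  for part in "$@"; do
    out+="${out:+:}$(encode_component "$part")"
  done
  printf '%s\n' "$out"
}

# Inverse of encode_component: "~HH" -> byte. The encoded form never
# contains a backslash, so printf '%b' sees only the \xHH we insert.
decode_component() {
  printf '%b\n' "$(sed 's/~\(..\)/\\x\1/g' <<<"$1")"
}
```

With this scheme the first value above would be 137 characters instead of 267. The repository would have to apply the same encoding when generating ListSets and the inverse when parsing a `set` argument; changing the scheme also changes every existing setSpec, so harvesters that stored the old identifiers would need to re-harvest, and a deep enough subject path could still exceed 255 characters.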

Thanks

Tom

