[EP-tech] digital preservation - indexing errors
- From: Tomasz.Neugebauer at concordia.ca (Tomasz Neugebauer)
- Date: Thu, 30 Jul 2015 15:42:55 +0000
We just completed an upgrade of our repository, which included re-indexing all of its contents.
It was a good opportunity to take note of the errors that come up during indexing.
Here are the most common ones:
1. Error: Illegal entry in bfrange block in ToUnicode CMap
2. Error: Invalid Font Weight
3. Error (##): Illegal character <##> in hex string
4. Error: Can't create transform
5. Error: Couldn't link the profiles
There are also some of these:
6. Use of uninitialized value $data in substr at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 68.
Use of uninitialized value $magic in numeric eq (==) at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 69.
Use of uninitialized value $magic in sprintf at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 69.
This does not seem to be a Word document, but it is pretending to be one: 0 at /opt/eprints3/tools/doc2txt line 68
Error 255 from doc2txt command: [...]
Errors #1 and #3 appear to be the most common.
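For anyone who wants to produce a similar tally from their own indexing run, here is a minimal sketch in Python. It assumes the indexer output was captured to a plain-text log, one message per line. The patterns are taken from the messages quoted above; the category labels and the function name tally_errors are hypothetical names chosen for this sketch, not part of EPrints.

```python
import re
from collections import Counter

# Patterns copied from the error messages listed in this post.
# The labels on the left are hypothetical names used only in this sketch.
PATTERNS = [
    ("bfrange_cmap", re.compile(r"Illegal entry in bfrange block in ToUnicode CMap")),
    ("font_weight", re.compile(r"Invalid Font Weight")),
    ("hex_string", re.compile(r"Illegal character <[^>]*> in hex string")),
    ("transform", re.compile(r"Can't create transform")),
    ("link_profiles", re.compile(r"Couldn't link the profiles")),
    ("doc2txt", re.compile(r"Error \d+ from doc2txt command")),
]


def tally_errors(lines):
    """Count how many lines match each known error pattern.

    Each line is counted at most once, under the first pattern it matches.
    Lines that match no pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS:
            if pattern.search(line):
                counts[label] += 1
                break
    return counts
```

Since a file object iterates line by line, the function can be called on an open log directly, for example tally_errors(open("indexer.log")), where indexer.log is a placeholder file name. Calling .most_common() on the result lists the categories from most to least frequent.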
Have you encountered these types of indexing errors?
How serious are they in terms of digital preservation?
Do you use any specific strategies/workflows for dealing with these?
Do the EPrints preservation plugins (http://files.eprints.org/696/) help with identifying or solving these issues?
Thanks for any comments/suggestions about this.
Tomasz